Do we need a historian? It's pruning time.

Khukuri Monster said:
I would have downloaded all the threads older than 3 years but at some point last night my dad shut off my computer without me knowing and it reset the progress.

Sometimes if I have a long process running I put a post-it note on the screen or a piece of paper saying "don't use or shut off" on the keyboard to avoid this type of problem. You might have to put the sticky over the shutdown button on the computer too, depending on how folks around you usually shut down.


Guys on other forums are wrestling with this same problem of archiving content. If any of you get a workable solution that grabs what we want without excessive bloat, please also post your solution at http://www.bladeforums.com/forums/showthread.php?t=360017 , so the other guys can benefit from the work that you have done here.
 
Howard Wallace said:
Sometimes if I have a long process running I put a post-it note on the screen or a piece of paper saying "don't use or shut off" on the keyboard to avoid this type of problem. You might have to put the sticky over the shutdown button on the computer too, depending on how folks around you usually shut down.

Heh, that wasn't really the problem. I had my computer plugged into an outlet that happens to be controlled by a switch on the wall... that is conveniently located right next to the light switch. That's real nice when you are in the dark and are groping to turn on the lights... Two nights ago I accidentally rebooted my own computer that way.

My fault more than anything, but I figured that nobody else would hit the switch since everyone was going to sleep. Must have been wrong.

I just put one of those switch-locking devices on there anyways... so I don't foresee any more unexpected, localized power outages.
 
HT is doing its thing as I am here at work now...maybe I will be pleasantly surprised when I get home and have a total copy of BF. If it works, it should be easy to have it burned to DVDs.

Whoever gets the data...that's the important thing...to preserve what others might only see as chit-chat.

Yeah Bill...this has everything to do with khukuris...

.
 
HT didn't do what I hoped for. It captures everything, but as soon as I went to the archives, the search function pulls off the net instead of locally.

It falls to another...:(

.
 
Nasty said:
as soon as I went to the archives, the search function pulls off the net instead of locally.

No program will be able to totally replicate Bladeforums, in particular, things like the search function.

The search function is entirely dynamic - it's part of the forum software and sends requests to the bladeforums server. The server runs your search and then generates a web page with the results. So there would be no way to get a working "search page" unless you somehow were able to ask the forum software for every possible combination of search options... then index them somehow. Of course, this is completely impossible.

If you look at the webspider FAQ there is a warning in there that it won't work well on sites with lots of CGI - this stands for "Common Gateway Interface" and basically means "Web pages that are generated by a computer program based on some user input". Of course, practically every page on a forum uses some sort of CGI... so getting a webspider to work is really tricky.

The real question is, did the program download the real content... that is, the threads and posts that will be deleted. If it did, you are OK, you just need to be able to pull out this information and index it somehow.

I would look in the directory that the webspider downloaded to for files with names like "showthread.php?t=" and then some numbers. These will be the important files.
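
Here is a rough Python sketch of that check, in case it helps. It assumes the webspider kept "showthread.php" and "t=<number>" somewhere in each saved filename; some spiders rename files, so the pattern may need adjusting. The function name and the folder you pass in are made up for illustration.

```python
import re
from pathlib import Path

# Assumption: saved thread pages keep "showthread.php" and "t=<digits>" in their names.
# The lookbehind stops "t=" from matching inside other parameters such as "postcount=".
THREAD_NAME = re.compile(r"showthread\.php.*?(?<![A-Za-z])t=(\d+)")

def find_thread_files(mirror_root):
    """Map thread number -> list of saved files that look like that thread."""
    found = {}
    for path in Path(mirror_root).rglob("*"):
        if not path.is_file():
            continue
        match = THREAD_NAME.search(path.name)
        if match:
            found.setdefault(int(match.group(1)), []).append(path)
    return found
```

Calling find_thread_files on the download folder and counting the keys gives a quick idea of how many distinct threads actually came down.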
 
No...it didn't. I grok server side and the database structure...but as a systems guy doing Program Management, I have programmers that I pay to do the grunt work.

I was hoping for a miracle...

.
 
Guys, thanks for trying. You'll get it. There's always a way. Like I said, thanks so much for doing something that most of us don't have a clue how to even begin thinking about doing. You do Uncle Bill's house a service and an honor.

Jake
 
Guys, thanks for trying. You'll get it. There's always a way. Like I said, thanks so much for doing something that most of us don't have a clue how to even begin thinking about doing. You do Uncle Bill's house a service and an honor.
Jake
What Jake said! :)
 
As we're seeing, it may be difficult to do this just by sucking it off the web because of the search functions and CGI scripts. We may need to seek assistance/cooperation from Sparks on this one.
 
Can't we just use our computer's own file content search
to locate the saved threads?
just like we do thru the Bladeforums computer.
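
A minimal sketch of what such a local search could look like in Python. It assumes the saved threads are .html files under one folder; the function name and folder are hypothetical, and it only does a plain case-insensitive keyword match, nothing like the forum's real search.

```python
from pathlib import Path

def search_saved_threads(mirror_root, keyword):
    """Return saved .html files whose text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    hits = []
    for path in sorted(Path(mirror_root).rglob("*.html")):
        # errors="replace" keeps the scan going if a page is not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        if keyword in text.lower():
            hits.append(path)
    return hits
```

It matches against the raw page source, so a keyword that happens to appear in the HTML markup will also count as a hit.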


~
~~~~~~~~~
<> THEY call me
'Dean' :)-fYI-fWiW-iIRC-JMO-M2C-YMMV-TiA-YW-GL-HH-HBd-IBSCUtWS-theWotBGUaDUaDUaD
<> Tips <> Baha'i Prayers Links --A--T--H--D
 
As far as I can figure, the stuff we're most interested in (post content) is all stored in the MySQL database. I took a quick look at the vBulletin forums to see if there is an easy way to back up only one of the forums on a site.

It looks like the official answer is that you'd have to get a backup of the entire site, then remove everything but the one forum you wanted:
http://www.vbulletin.com/forum/showthread.php?t=126881

There is an unofficial opinion that you _might_ be able to do this by writing some custom MySQL:
http://www.vbulletin.com/forum/showthread.php?t=147774

I don't know if this is something that someone can do for us, but possibly we could get a dump of the table that contains only the thread records (and associated post and user records) that are related to the HI forum/HI forum archive, restore the tables somewhere, and hack together a (much simpler) read-only front-end? Possibly? The thought of losing all of Bill Oji-chan's old posts makes me very sad. :(
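
To make the idea concrete, here is a small sketch of the kind of query I have in mind, run against a tiny in-memory SQLite stand-in rather than the real MySQL database. The table and column names (thread, post, user, forumid, pagetext and so on) and the forum numbers are my own guesses for illustration, not a confirmed vBulletin schema.

```python
import sqlite3

def export_forum(conn, forum_id):
    """Return (thread title, username, post text) rows for one forum, in post order."""
    query = """
        SELECT thread.title, user.username, post.pagetext
        FROM post
        JOIN thread ON thread.threadid = post.threadid
        JOIN user ON user.userid = post.userid
        WHERE thread.forumid = ?
        ORDER BY post.postid
    """
    return conn.execute(query, (forum_id,)).fetchall()

# Tiny stand-in database; forum ids 1 and 2 are placeholders, not real forum numbers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread (threadid INTEGER PRIMARY KEY, forumid INTEGER, title TEXT);
    CREATE TABLE user (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER,
                       userid INTEGER, pagetext TEXT);
    INSERT INTO thread VALUES (1, 1, 'Thread in the forum we want');
    INSERT INTO thread VALUES (2, 2, 'Thread in some other forum');
    INSERT INTO user VALUES (1, 'example_user');
    INSERT INTO post VALUES (1, 1, 1, 'keep this post');
    INSERT INTO post VALUES (2, 2, 1, 'leave this post behind');
""")
rows = export_forum(conn, 1)
```

The same join, pointed at a restored copy of the real tables, would be the starting point for a read-only front-end: one query per forum, then one page per thread.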
 
Yvsa, how long did Spark give us? I'm worried about losing this history. As NTS said, that is crucial to what we do here...

Thanks,

Norm
 
Norm?

here's the thread: http://www.bladeforums.com/forums/showthread.php?t=360215

And Spark's specific response.

Unfortunately we don't have a CD dump capability. And with the persistence of the cantina to use the forum for chit chat, many of the hard data posts are intermixed and unidentifiable. So we have to separate the wheat from the chaff.

I don't have a problem waiting for you all though.
__________________
Kevin Jon Schlossberg
Owner, BladeForums.com
Only Sharp Knives are Interesting

So it is a bit vague, but he is trying to work with us, and some other forums.
 
Novadak said:
It looks like the official answer is that you'd have to get a backup of the entire site, then remove everything but the one forum you wanted:
http://www.vbulletin.com/forum/showthread.php?t=126881

This was my original hunch... I don't know MySQL but this is as good a time as any to learn if I try to go that route.

Okay, I have downloaded all threads older than 11-02-2002 from the HI forum onto my computer. I don't know if it is necessary for me to start on the archives - will they be deleted? I would also like to do the ShopTalk forum and the Busse forum. Heck, I would like to do every forum on BF but don't have the bandwidth.

I keep having problems with connection timeouts and bizarre computer restarts. I just don't understand what is going on with my computer here. I'm going to try to put together a version of the script that somebody with a high-speed connection can easily run without too much technical knowledge required.
 
Problem is solved. Somebody else (edit: this was Ted Voorde, see post http://www.bladeforums.com/forums/showpost.php?p=3226804&postcount=19) noticed that the "Archive" actually contains all the threads on BF, except it's a whole lot easier for a computer to access the content. So if you just go to the archive pages with a webspider you will actually download everything.

Okay, so here's how you download the entire bladeforums without any bloat.

Point your webspider to start at: http://www.bladeforums.com/forums/archive/index.php/

Then do one of these things:

1) Check an option in the configuration which says "Only proceed deeper into website" or something similar. This forces the webspider to only download pages which are on the archive, and it won't backtrack and end up at the bladeforums starting page.

2) Add a filter like "http://www.bladeforums.com/forums/archive/index.php/*.html". This will have the exact same effect.

Nasty or ferguson, try running the webspider again. It'll work this time.

Oh, if you only want to download the HI forum and no other, start the webspider at:
http://www.bladeforums.com/forums/archive/index.php/f-739.html
instead, and then add a filter like:
"-http://www.bladeforums.com/forums/archive/index.php/"
"-http://www.bladeforums.com/forums/archive/index.php/f-673.html"
The minuses in front of the address tell the HTTrack program that you don't want to download these pages. It means "Don't go back to the main index page or to the Manufacturers page - just download from this one only."
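
If it helps to see the filter logic spelled out, here is a small Python sketch of the same scoping rule. It only approximates the idea with simple wildcard matching and is not how HTTrack itself evaluates filters; the function name and the split into separate include and exclude lists are my own, for illustration.

```python
from fnmatch import fnmatchcase

ARCHIVE = "http://www.bladeforums.com/forums/archive/index.php/"

def in_scope(url, include, exclude=()):
    """True if the url matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(url, pattern) for pattern in include)

# Whole archive: anything ending in .html under the archive index.
include = [ARCHIVE + "*.html"]

# HI forum only: additionally refuse the main index and the Manufacturers page.
exclude = [ARCHIVE, ARCHIVE + "f-673.html"]

hi_forum_ok = in_scope(ARCHIVE + "f-739.html", include, exclude)
main_index_ok = in_scope(ARCHIVE, include, exclude)
```

The excludes work because the spider never follows the links that lead back up to the other forums, not because the other forums' pages fail the include pattern.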
 
It's doing something...only have time to kick it off...got to get to work. I pointed it at the archives and below.

We'll see...

.
 
Khukuri Monster said:
Okay, I have downloaded all threads older than 11-02-2002 from the HI forum onto my computer. I don't know if it is necessary for me to start on the archives - will they be deleted? I would also like to do the ShopTalk forum and the Busse forum. Heck, I would like to do every forum on BF but don't have the bandwidth.

As I understand it, the HI archives and the Busse forum are also to be pruned. This means the HI archives will disappear, and we need to back them up if we want them. Shop Talk is safe from pruning, as it is generally recognised that the posts there have enduring value.


Khukuri Monster said:
Problem is solved.

This may be a little premature. The problem will be solved when one of us has a full backup of the HI forum from mid 2002 back and the archives. It sounds like you have a promising approach though. Good work!
 
:( I tried to save some content from the archives but it seems like an almost insurmountable undertaking. Sorry I couldn't help, guys!

 
I got almost nothing worth having. Some forums came through, but the HI archives didn't.

Falls to another...

.
 
I think I've got a complete text-only copy
of all three khuk centered forums/archives.
Well under 300Mb total--
so it'll fit on a cd easily.
Took under 3 hr to dl on my dsl connect

Anyone else successful on any front?
It would be nice to know that we have
at least 3 good complete downloads
so nothing falls thru the cracks.

I can burn copies when I get my computer working properly.
[assuming Spark will approve,
which I think he has implicitly,
at least for the deleted material.]
Computer's own file content search works fine to locate
threads containing keywords

anyone wants a copy of my 'httrack' settings;
[based on Kh-Monster's suggestions]
ask, and i'll post them here.
I think I got nothing but thread & specific indexes--
all text only.

Tonight I'll try to re-jig the filters
for the standard forum display with pix.
we'll see.
{add: found a post by Spark saying his bandwidth went thru the roof.
I'll try to figure out a way to avoid dl-ing too much more from the site;
though my text-only shouldn't have been too bad......
& maybe specific times someone can access my copy thru my changing
ip address (but my upload speed is slow compared to dl)
I'd still like to get a few of the remaining pix specific to some threads.
fWiW, most older threads don't have pix still attached.}


btw--
Tried to find the HI-forum archive on the -other- place
but no longer there that I can find.
Some bits and pieces of it available thru the
www.waybackmachine.org
but only a couple of dozen threads.......maybe

has anyone any of -those- forum postings saved?



 