Sciencemadness Discussion Board

Download an open forum backup!

Polverone - 29-4-2005 at 10:41

Sciencemadness.org is now offering for the first time what I hope will become a standard fixture in online chemistry communities: an open, freely available offline archive of messages and files from this discussion board. This is only a first release, and it has a few glitches and rough edges to work out. Here's what it already has to offer:

-An archive of static HTML copies of forum index pages from all sections but Whimsy and Detritus (Whimsy may be added at a later date)

-An archive of static HTML copies of all threads from all sections but Whimsy and Detritus (Whimsy may be added at a later date)

-Modifications to make the static HTML indices refer to the static HTML threads, and to make all static HTML pages use local copies of graphics files and attachments

-An optional media archive containing copies of all attachments and inline images from threads

Here are some of the rough edges that I hope to address in the future:

-User-added links from one thread to another still refer to the online site, not the offline archive; there are also other opportunities to rewrite links for local use

-There's considerable page clutter that I can and should remove before the next release; there's no use for Post New Topic, Today's Posts, etc. in an offline archive

-A disturbingly large fraction of attachments seemed to download with errors; I'm not sure if this is a problem with my archiving software, the board, or the original uploads

-All threads appear as single HTML files; this is taxing to browsers/computers on large threads
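
As a rough illustration of the first item in the list above, links that point at the online board could be rewritten to local files once a thread-id-to-path table exists. This is only a sketch: the `LOCAL_THREADS` table, the seven-digit file naming, and the `../` prefix are assumptions made for the example, not the archive's documented layout.

```python
import re

# Hypothetical table: thread id -> path of that thread inside the archive.
# The real archive's naming scheme is an assumption here.
LOCAL_THREADS = {3295: "energetic_materials/0003295.html"}

# Matches an online thread link, including any trailing parameters.
THREAD_LINK = re.compile(
    r"http://www\.sciencemadness\.org/talk/viewthread\.php\?tid=(\d+)"
    r"(?:&[^\"'\s<>]*)?"
)


def localize_thread_links(html: str, local_threads: dict, prefix: str = "../") -> str:
    """Point links at archived threads to local files; leave unknown ones online."""

    def swap(match: re.Match) -> str:
        local = local_threads.get(int(match.group(1)))
        # Threads missing from the table keep their original online URL.
        return prefix + local if local else match.group(0)

    return THREAD_LINK.sub(swap, html)
```

Leaving unknown threads untouched means a partially populated table never produces dead local links.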

The base archive, containing board icons/graphics and threads, can be found here: http://www.sciencemadness.org/archive/sm_main.zip, 21,840,110 bytes.

The media archive, containing inline images and attachments, can be found here: http://www.sciencemadness.org/archive/sm_media.zip, 92,212,812 bytes.

The media archive is still uploading from my home machine, so I would suggest waiting a couple of hours before attempting to download it. After you have downloaded the main archive or both archives, unzip them and point your web browser at index.html to begin enjoying your offline copy of the forum. Depending on how heavily people download these files, I may make them available all the time, or for only a limited time window near the end of each month.

I will continue to upload encrypted database dumps from time to time, since those are easier to use for board recovery, but this is the archive you want to download if you've ever feared losing something from the forum, or if you want to refer to it even when you're not on the internet, or if you'd like a local copy to search/analyze/whatever.

Please let me know of any glitches you encounter or enhancements you'd like to see in this thread. Enjoy!

chemoleo - 29-4-2005 at 18:03

Thank you very much Polverone! I see all this heavy bandwidth is being put to *good* use!


I tested it out a little.
I guess the major problem I could see is that the links are absolute paths on *your* computer system. For example, when I load up index.html, open a subforum, and click on a thread, it produces a link like this: file:///extra/sciencegrab/chemistry_in_general/0000023.html
This is of course not the directory I installed this into, so clicking this link produces nothing. But it shouldn't be a problem to fix, as the links between index.html and the different subdirectories work fine.
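
One possible way to repair links like the one quoted above is to convert each build-machine path into a path relative to the page that contains it. The sketch below assumes every broken link starts with `file:///extra/sciencegrab/` (the prefix seen in the example) and that the caller knows where the page sat on the build machine; `relativize_links` is a hypothetical helper name.

```python
import posixpath
import re

# Build-machine root taken from the broken link quoted above.
ARCHIVE_ROOT = "/extra/sciencegrab"

# Captures the absolute path that follows the "file://" scheme.
FILE_LINK = re.compile(r"file://(" + re.escape(ARCHIVE_ROOT) + r"/[^\"'\s>]+)")


def relativize_links(html: str, page_path: str) -> str:
    """Rewrite absolute file:// links relative to the page that holds them.

    page_path is the page's absolute path on the build machine,
    e.g. "/extra/sciencegrab/chemistry_in_general/index.html".
    """
    page_dir = posixpath.dirname(page_path)

    def swap(match: re.Match) -> str:
        return posixpath.relpath(match.group(1), start=page_dir)

    return FILE_LINK.sub(swap, html)
```

Relative links keep working wherever the archive is unzipped, which is the property the absolute links lacked.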

Very nice work otherwise, btw. I wonder how you combined several paged threads into a single html.

Polverone - 29-4-2005 at 19:52

Ugh, you're right. I obviously made a typo in preparing that section. I have uploaded a zip file with just the fixed index information:

http://www.sciencemadness.org/archive/fixedindices.zip

I am also uploading a fixed version of the sm_main archive. Edit: the fixed version is now in place. Anyone who downloads sm_main.zip now should get the correct index information.

It was very easy to make the threads into one long piece: I created a new user account named archiver, went to my control panel, and had it show 1000 posts per page (longer than any existing thread). Then I just had my script log in as archiver and grab each of the one-page threads.
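
The grab-every-thread step described above might look roughly like the sketch below. It is not the actual archiving script: the login is left out entirely and represented by a caller-supplied `fetch` function (assumed to return a page's HTML for a URL while logged in as the archiver account), and the seven-digit file names are an assumption based on the `0000023.html` example earlier in the thread.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# URL shape taken from thread links that appear on the board.
THREAD_URL = "http://www.sciencemadness.org/talk/viewthread.php?tid={tid}"


def archive_threads(
    tids: Iterable[int],
    fetch: Callable[[str], str],
    out_dir: Path,
) -> List[Path]:
    """Save each thread as one HTML file and return the written paths.

    fetch is a stand-in for an already logged-in HTTP session whose
    account shows enough posts per page to fit a whole thread.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for tid in tids:
        page = fetch(THREAD_URL.format(tid=tid))
        target = out_dir / f"{tid:07d}.html"  # assumed naming scheme
        target.write_text(page, encoding="utf-8")
        saved.append(target)
    return saved
```

Passing `fetch` in as a parameter keeps the file-writing logic testable without touching the network.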

I must thank everyone who contributed financially to sciencemadness. This sort of project would not have been possible on the old site, due to the much more limited bandwidth and disk space.

[Edited on 4-30-2005 by Polverone]

Ramiel - 30-4-2005 at 06:25

Both backups downloaded.

The_Davster - 30-4-2005 at 09:43

I downloaded both of them, but when I attempt to extract the media zip file I get several errors and the file into which I extracted them is empty. Anyone else have this problem?

Backups are a great idea !

Rosco Bodine - 30-4-2005 at 11:19

In these times of troubling disappearances of websites and data
and discussions , particularly of obscure
or not well known information of the nature which makes such knowledge

" SENSITIVE IN NATURE " ........

then under such circumstances , the free distribution of such information spread far and wide is an effective countermeasure for the censors and " thought police " who
are doing their tyrannical best to keep people ignorant subjects whose extent of
knowledge is limited only to what they are
deemed " authorized " to know .

Any small victory against those Orwellian ,
Machiavellian fascists , is a worthy accomplishment .

And that is the larger matter which should govern us all in these times , seeing what
has happened with the hive , and the direction things seem to be going for E&W also , the priority should be to preserve hard gotten data assembled in such ways
nowhere else on earth , and guarantee that information's continued availability ,
as much so as if it were Winchesters being
passed out to the pioneers , as they circle the wagons and see what the savages are going to do to interfere with progress .

Polverone - 30-4-2005 at 13:29

Hi rogue chemist, I just tested downloading the media file and unzipping it. I had no problems. Are you sure the file you downloaded is exactly 92,212,812 bytes in size? Its md5 checksum is a109745ae876bdff3a9273b1b01ed225.
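
For anyone who wants to run the same check locally, here is a minimal sketch that compares a downloaded file against the size and md5 checksum quoted above. The function name `verify_download` is made up for this example.

```python
import hashlib
from pathlib import Path

# Values quoted in the post above for sm_media.zip.
EXPECTED_SIZE = 92_212_812
EXPECTED_MD5 = "a109745ae876bdff3a9273b1b01ed225"


def verify_download(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5) -> bool:
    """Return True only if the file has the expected size and md5 checksum."""
    path = Path(path)
    # The size check is cheap and catches truncated downloads right away.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so a large archive never sits fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

Calling `verify_download("sm_media.zip")` returning False would suggest downloading the file again before trying to unzip it.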

[Edited on 4-30-2005 by Polverone]

The_Davster - 30-4-2005 at 14:13

I tried re-downloading it. The first time, my wireless connection died on me, which could have caused some corruption. In any case it works fine now. I still get a few errors during unzipping, but the files work now, so all is good.:cool:
Now to save this archive to disk.

Rosco Bodine - 1-5-2005 at 18:52

Downloaded the backup quick and easy ,
and got no errors on decompressing the zip files .

Everything appears to work perfectly ,
navigation and page loading is instantaneous ....

Never seen the forum work so fast :D

Nothing like a data drive for a local file server , and it would probably be quick
even on a CD .

Oh , just a reminder to anybody having any problems , it can be a firewall glitch on your local machine , being spoofed by explorer activity and blocking the unrecognized activity which may be blocked as suspect " traffic " . If you
have any trouble check your firewall allow settings or turn off filtering .

chemoleo - 2-5-2005 at 07:33

Works great here, too, including attachments and pictures!
Even pictures stored elsewhere were grabbed, which is great because once those sites go down this data isn't irretrievably lost!

One minor issue - I noticed that, once browsing in actual threads, trying to go back by clicking Sciencemadness Discussion Board (file:///.../sciencemadness%20html/sciencegrab/energetic_materials/index.php), or Organic chemistry (i.e. the forum, file:///.../sciencemadness%20html/sciencegrab/organic_chemistry/forumdisplay.php?fid=10) or whatever doesn't work - the file is not found, so internal crossreferencing by board-links is seemingly not applied to all internal links.
Essentially this can be avoided by using the back button, of course.
Maybe there's an easy fix for this. Although it's not essential, so all is good.

Polverone - 2-5-2005 at 10:42

It's correct that I made no effort to fix those additional links. I will add that fix in a future release.

Rosco Bodine - 2-5-2005 at 13:58

One feature I would like to see enabled
is the " printable version " view , since
that makes it much easier to capture
and export any text .

Saves a lot of ink when you want to print
something too .

[Edited on 2-5-2005 by Rosco Bodine]

Backups

MadHatter - 2-5-2005 at 22:30

Both backups downloaded. Thanks, Polverone !

Polverone - 13-5-2005 at 00:48

New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time. Links have been improved in this version. The visual appearance has been cleaned up too. Finally, the archive now includes printable versions of the threads.

One oddity that you may notice with this archive or the last is that the index pages can be slightly more up to date than the actual threads. For example, a thread may be listed as having three replies but you only see one when you click on the thread. This is a bit of a wart but not actually a bug; it's due to the way the archiver caches threads but always downloads fresh index pages.

Axt - 14-5-2005 at 02:05

One of the things I think would improve the search capabilities is using the topic title as the "page title".

For example, if one searches through Windows or Google for a word, the page is always named "Sciencemadness Discussion Board - Powered by XMB 1.8 Partagium Final S..". On other forums the topic title becomes part of the "page title", so it's easy to identify threads.

Another example, look at the top title bar of the page here (http://www.sciencemadness.org/talk/viewthread.php?tid=3295), compared to here (http://www.xsorbit2.com/users/apcforum/index.cgi?board=general&action=display&num=1095230413).

Since the search function doesn't work in the archive, this makes it hard to use "3rd party" search engines on the archive, as they only pull up the page title.
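
The page-title suggestion could be sketched roughly as follows, assuming the archiver already knows each thread's topic title (how it would obtain that is not covered here). `set_page_title` is a hypothetical helper, and the "topic - board" ordering is just one reasonable choice.

```python
import html
import re

# Matches the first <title> element, whatever it currently contains.
TITLE = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)


def set_page_title(page: str, topic: str,
                   board: str = "Sciencemadness Discussion Board") -> str:
    """Replace the generic page title with one that leads with the topic."""
    new_title = f"<title>{html.escape(topic)} - {board}</title>"
    # A function replacement avoids backslash surprises in the topic text.
    return TITLE.sub(lambda _match: new_title, page, count=1)
```

Putting the topic first matters because desktop and web search results tend to truncate long titles from the right.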

Other than that .... great :D

Polverone - 26-7-2005 at 21:18

New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time.

Polverone - 3-12-2005 at 18:16

New backups are now ready for download under the same names as before, sm_main.zip and sm_media.zip. There is unfortunately no easy way to offer incremental updates at the present time.

Later I'm going to try to take the forum down for a bit while I upgrade the software, so you might want to take the opportunity to download an offline copy now.

[Edited on 12-4-2005 by Polverone]

Backups

MadHatter - 4-12-2005 at 07:38

Both now in the UPLOAD folder on my FTP.

Nerro - 4-12-2005 at 10:58

This is a quick reply :D

wa gwan - 19-8-2006 at 12:48

Is there another backup coming soon?

Rosco Bodine - 25-1-2007 at 08:31

Yesterday I noticed some error message script superimposed on the image of the main page
and a few glitches otherwise which were a transient
problem ...... and remembering some connectivity
problems not too long ago the two things made
me a bit nervous and caused me to wonder about
how up to date is the present backup .

There's been a lot of interesting information and discussion added since the last known backup which really should be protected , archived data , secured
by an up to date backup .

So please .....at the earliest opportunity ,
let's get an updated backup . It's cheap insurance
and peace of mind .

solo - 25-1-2007 at 08:40

I have a question: do all the articles that have been uploaded become part of the backup? Also, is there a file folder where all of the uploaded articles reside, and can they be accessed by members? The reason for asking is that an awful lot of citations are being uploaded and there is no way to see an index of what's available. At WD I keep a folder for all the references ever requested and fulfilled, with their upload links available for future researchers, also to avoid reinventing the wheel...........solo

Polverone - 25-1-2007 at 20:14

I do like having the cheap insurance of a backup, but doing the sort of transformations that are necessary to make the offline archive presentable and navigable is a bit painful. In case someone with relevant programming experience is reading: I am using the Python module BeautifulSoup to locate elements in each page (e.g. the "New Topic" button) and then I do string replacements to delete or alter elements (operating on an entire page as one large string). The problem is that the strings returned by BeautifulSoup may not be exactly the same sequence of characters that appeared in the web page -- whitespace may be changed. Each one of these discrepancies must have a special case in the code, which is ugly and time-consuming to develop. I did it once, but several months later the forum software was upgraded and the work needed to be done again. I still haven't re-done this work.
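
One way to reduce the number of special cases described above, sketched here with the standard library only, is to match the located fragment against the page with a pattern that tolerates differing runs of whitespace. This is a different technique from the exact string replacement the post describes, not the code actually in use, and it only helps where whitespace exists on both sides but differs in amount or kind. `loose_pattern` and `remove_fragment` are hypothetical names.

```python
import re


def loose_pattern(fragment: str) -> "re.Pattern[str]":
    """Build a regex that matches fragment with any whitespace between tokens."""
    tokens = fragment.split()
    # Each run of whitespace in the fragment may match any run in the page.
    return re.compile(r"\s+".join(re.escape(token) for token in tokens))


def remove_fragment(page: str, fragment: str) -> str:
    """Delete every whitespace-tolerant occurrence of fragment from page."""
    return loose_pattern(fragment).sub("", page)
```

For example, a fragment reported as `<a href="post.php"> New Topic</a>` would still match page text where a line break and several spaces separate the same tokens.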

Solo: yes, attachments are downloaded and stored by the code. You would still need an indexing system to go with them, because the file names may be something uninformative like "068374_methanol.pdf", where the numerical prefix is the number of the post that the attachment was found in.
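
A very small indexing step along those lines might group attachment files by the post number in their prefix. The sketch below assumes every attachment follows the `<post number>_<original name>` pattern from the example above; files that do not match are skipped.

```python
import re
from collections import defaultdict

# "<post number>_<original file name>", as in "068374_methanol.pdf".
ATTACHMENT = re.compile(r"^(\d+)_(.+)$")


def index_attachments(filenames):
    """Map post number -> list of original attachment names."""
    index = defaultdict(list)
    for name in filenames:
        match = ATTACHMENT.match(name)
        if match:
            index[int(match.group(1))].append(match.group(2))
    return dict(index)
```

An index like this could then be joined with thread titles to produce a browsable list of what has been uploaded.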

Waffles - 28-1-2007 at 11:25

Quote:
Originally posted by prica
X > B.D.
I cane Send vaglias,but if ya came doon this parts, you're my guest(x2 p. 2weeks).Augh !


WHY DO THESE PEOPLE THINK THAT WE UNDERSTAND THEM

THIS IS NOT LANGUAGE

gambler - 31-3-2007 at 18:46

Are there plans in the works to prepare a current open forum backup?
Thank you in advance

new backup available

Polverone - 9-4-2007 at 08:54

The first new backup since December 2005 is now available. As usual, it comes in two parts:
sm_main.zip contains the essential elements (HTML and a few frequently used images) and occupies 69,672,886 bytes.

sm_media.zip contains copies of attachments and images that appeared inline within posts. It occupies 444,244,132 bytes.

Some threads appear to have a sort of display glitch at the end, probably because the HTML hasn't been cleanly manipulated. Please let me know if you encounter other problems.

Zinc - 9-4-2007 at 14:08

I can't download any of them.

Polverone - 9-4-2007 at 14:37

The downloads work fine for me. What error do you get, and what browser/OS are you using?

solo - 9-4-2007 at 17:20

Quote:

sm_media.zip contains copies of attachments and images that appeared inline within posts. It occupies 444,244,132 bytes.


.....does this means all the articles uploaded and posted are enclosed? .......solo

[Edited on 9-4-2007 by solo]

Polverone - 9-4-2007 at 17:38

Solo, only attachments that appear in the regular public forums are included. There are no posts or attachments from References, Whimsy, or Detritus included. I might be able to do a backup of References too, but that archive would be distributed separately, probably with a link within References.

Zinc - 11-4-2007 at 01:50

Quote:
Originally posted by Polverone
What error do you get, and what browser/OS are you using?


I am using Internet Explorer. It says: Internet Explorer cannot download sm_media.zip (or sm_main.zip if I click on it). An unexpected error has occurred.

not_important - 11-4-2007 at 04:03

Quote:
Originally posted by Zinc
Quote:
Originally posted by Polverone
What error do you get, and what browser/OS are you using?


I am using Internet Explorer. It says: Internet Explorer cannot download sm_media.zip (or sm_main.zip if I click on it). An unexpected error has occurred.


Don't. Get Opera, or FireFox; both have no trouble downloading the files.

Zinc - 11-4-2007 at 10:36

Quote:
Originally posted by not_important
Don't. Get Opera, or FireFox; both have no trouble downloading the files.


Now it works.


[Edited on 11-4-2007 by Zinc]

Rosco Bodine - 1-3-2008 at 17:30

Any chance of getting a current backup ?
There has been a lot of useful information posted
since last year's backup , and it should definitely
be archived and distributed as insurance against
data loss .

MagicJigPipe - 1-3-2008 at 23:49

Yes, so where are the new ones? No pressure or anything...

You know what would be great? A way to automatically update the archive every month or so... So people can update their archives every so often just in case. That would take up a lot of bandwidth though. I doubt whoever hosts this site gives unlimited bandwidth (ironically, I have unlimited bandwidth on my DSL account... That's a max of 60kb/s upstream though. So, at that speed my BW in a month would be... 160GB, not bad. Downspeed would be much better at a max of 350kb/s. That's nearly 1TB a month downstream). If I had a static IP I would be more than willing to host the archives and maybe even a few articles and whatnot. It would have to be limited to 20kb/s to 2 people at a time though. Anyway, just pipe dreams.
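
The monthly figures above can be reproduced with a few lines of arithmetic, assuming "kb/s" means kilobytes per second, decimal units (1 GB = 10^9 bytes), a 30-day month, and a link saturated around the clock.

```python
SECONDS_PER_DAY = 86_400


def monthly_transfer_gb(rate_kb_per_s: float, days: int = 30) -> float:
    """Total transfer in decimal gigabytes at a constant rate in kB/s."""
    total_bytes = rate_kb_per_s * 1000 * SECONDS_PER_DAY * days
    return total_bytes / 1e9


upstream_gb = monthly_transfer_gb(60)     # the 60 kb/s upstream figure
downstream_gb = monthly_transfer_gb(350)  # the 350 kb/s downstream figure
```

Under those assumptions the upstream total comes to roughly 156 GB and the downstream total to roughly 907 GB, in line with the "160GB" and "nearly 1TB" estimates.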


[Edited on 2-3-2008 by MagicJigPipe]

Polverone - 22-3-2008 at 08:49

The first new backup in almost a year is now available. As usual, it comes in two parts:
sm_main.zip contains the essential elements (HTML and a few frequently used images) and occupies 91,369,901 bytes.

sm_media.zip contains copies of attachments and images that appeared inline within posts. It occupies 721,167,841 bytes.