Ok, to recap what happened:
2300HRS 20MAR04 - I called our server techs to back up our database before I upgraded the forums software, and proceeded once the backup was complete. Unfortunately, our server's MySQL backend wasn't running the version the new software requires, so the upgrade failed. We attempted to restore the backup and found out that the backup (quite frankly) didn't.
0200HRS 21MAR04 - After 3 hours of trying to fix the database, we decide to restore from an older backup (taken when the new RAM was installed).
2000HRS 21MAR04 - Forums come back up. Over the next 2 days, I rebuild styles, reactivate members, get the search engine rebuilt, etc., etc., etc. Start the long road to recovery.
0348HRS 24MAR04 - The forums database server suffers a massive failure, wiping out the entire RAID array and all on-server backups.
0900HRS 24MAR04 - We discover the server went down and start recovery. All available data is transferred to other boxes in the server farm for later analysis and recovery. Replacement server ordered.
1200HRS 25MAR04 - Replacement server arrives and is taken to the datacenter. We begin the recovery process: operating system installed, RAID array built, etc.
1800HRS 25MAR04 - Recovered data is discovered to be entirely corrupted. We start over with our 20FEB04 backup and begin the multi-hour upgrade and rebuild process.
2400HRS 25MAR04 - Success. Forums come online. The forums database is backed up again on a separate server so that if there is another failure, we don't have to waste another 6 hours on upgrading.
Our top priority now is restoring access to the site, which should have happened by the time you read this. At this time, SEARCHES will be disabled until we can fully rebuild the search index. That is a background process and should take no more than 48 hours.
Our next step will be to restore all paying members to their appropriate status levels, which will take until 1800HRS tomorrow (26MAR04) to complete. This will be followed by restoring custom titles.
Following this, we will begin customizing the viewing styles to get the forums looking more "normal" and start work on new forums that will be added to the site. Password-protected forums and higher membership levels will be enacted, and additional features will come online.
This has been an extremely trying time for us. We appreciate your patience through the outage. At this time, here is the current status of the forums:
Forums are online; however, all data created after 20FEB04 has been lost and is irretrievable: posts, memberships, attachments, etc.
The original corrupted database, which we were hoping to recover, was wiped out when the RAID array failed on the database server.
All members who joined after 20FEB04 will have to rejoin AGAIN. As previously stated, we will be reinstating appropriate memberships shortly.
I had a very long talk with our tech support about our backup options, and we now have a much better recovery plan in effect. Our new database server is also better set up for disaster recovery.
You should also notice a massive speed increase: we are running dual 3.0 GHz processors with 4 GB of RAM. Storage is a RAID-5 array on multiple 30 GB, 10,000 RPM SCSI drives. Since our RAM is roughly double the size of our database, things should be smoking.
At this point, I'm pretty much beat. The biggest hurdles have been cleared, though, and we are well on our way to getting back to normal. Thank you for your patience and support throughout this ordeal.