It turns out that those Database Upgrades announced for today for Butcherblock, Nagafen, and Lucan D’Lere were in fact complete replacements of the server hardware with 64-bit servers. This is very exciting news, as it has the potential to solve a lot of lag and performance problems. However, not everything has gone according to plan.
Rothgar posted yesterday:
All three (Butcherblock, Nagafen, Lucan D’Lere) are getting new hardware, latest DB version and upgrade from 32 to 64 bit.
Rothgar reported early today that Nagafen was back in operation:
The server is back up and you have new database hardware. This should help a lot with database heavy tasks such as zoning, broker transactions, etc.
Keep in mind that all we upgraded at this time was the database. This is a big improvement but it’s not going to be a fix across the board for all types of lag. We’ll continue to look at other lag improvements for these issues though.
Gunthore, a player on Butcherblock, pointed out that the Butcherblock server is still down.
The server (Butcherblock) did come up on schedule, but people were having problems logging into some characters, so they locked the server. People who were already logged in were still able to play. I logged out around 11 or so.
Rothgar is currently using this thread to provide updates on the situation:
We’re getting these reports from the other servers as well, so it looks like there is an issue with some characters. We’re looking into it now to determine if it was a data-transfer problem or a code problem related to the new DB version.
As soon as we know more we’ll let you know the course of action. Best case, we make some tweaks on our side and those characters will be accessible. Worst case, we’ll have to take the servers down and re-copy data from the old database.
Continued:
We’re going to go ahead and lock all 3 servers, Nagafen, Butcherblock and Lucan D’Lere, until we have the issue resolved. This is just a precautionary measure should we have to roll back to the data from about an hour ago. We don’t want you to lose more time.
I’ll keep this thread updated when I have more info.
Unfortunately, today’s attempt to upgrade Nagafen, Butcherblock, and Lucan D’Lere was unsuccessful, and SOE are now in the process of switching us back to the old 32-bit hardware in use before the upgrade. Some folks were able to play on the upgraded servers for as much as 4 hours, and any progress or loot acquired during this time will be lost. Here’s Rothgar’s announcement:
Hi everyone. We’ve found the problem and have a plan of action in place.
What was the problem?
The process of copying the data from the old database to the new database had issues with specific characters. Mainly these were older characters with large amounts of data associated with them. Not all of their data copied correctly, which caused them to not be able to load. The same behavior occurred with other database record types, such as housing records, where their associated data was very large. This caused some house zones to fail to load as well. This is why some players received different error messages. You might have been attempting to log in a character that was fine, but he was camped in a house with a problem.
What is the solution?
Our database engineers will need a little more time to find out why the migration process had issues with these specific records. Once that issue is solved, we will perform another migration and test these characters specifically to ensure that problem is fixed. We don’t have an ETA at the moment, but need to be prepared for this to take a couple of days.
What do we do in the meantime?
In order to get everyone back online again, we are going to revert back to using the old database which is still 100% intact. The data in the old database was frozen at the time we went down for maintenance. So this means anything you did early this morning will be rolled back. We’re very sorry that this had to happen. Unfortunately we can’t risk the time or data by copying backwards from the new server to the old server.
How do we keep this from happening in the future?
We are documenting our testing processes to be sure to include characters and houses that meet the criteria that caused this problem in the first place. In the future we will have specific test cases that will keep this problem from occurring again.
How does this affect the Venekor -> Nagafen Merge?
If we can repair the migration process quickly, we hope to not impact that timeline. However we must also plan for the possibility that the delayed database migration might cause us to delay the merge for a few days. Hopefully it will not come to that. We’re all very excited about getting you on the new hardware and completing the merge as soon as possible, so we are doing everything we can to make that happen.
So again, I apologize for the rollback and the downtime associated with this maintenance. We’re starting the process of reactivating the old database servers and hope to have you back up in an hour or two. As soon as we’ve got the issue with the migration process fixed we will post a revised timeline for the new maintenance.
Commentary
There have been some complaints that, with these great 64-bit servers in the pipeline, the rollout was going to be very slow. The first server to be upgraded was to be Nagafen. If that proved successful, Venekor would be merged into Nagafen, and then we’d start to see this technology roll out to other servers, eventually reaching the server that most desperately needs this upgrade: Antonia Bayle.
So it’s very unfortunate that today’s attempt to get 3 servers rolled onto 64-bit was a bust. I remain patient and optimistic that in the next few days we’ll see a successful upgrade to 64-bit.
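For the curious, here is a rough idea of what the "test these characters specifically" step could look like. This is only a sketch under my own assumptions, not SOE's actual tooling: the record names, the in-memory dictionaries standing in for the old and new databases, and the `lossy_copy` helper that simulates a faulty copy are all hypothetical. It compares a checksum of every record before and after the copy and checks the largest records first, since those are the ones Rothgar says failed.

```python
import hashlib


def checksum(blob: bytes) -> str:
    """Fingerprint a record's payload so old and new copies can be compared."""
    return hashlib.sha256(blob).hexdigest()


def verify_migration(old_records: dict, new_records: dict) -> list:
    """Return the IDs of records that are missing or altered in the new store.

    Records are checked largest first, because the reported failures
    involved records with very large associated data.
    """
    failed = []
    by_size = sorted(old_records, key=lambda rid: len(old_records[rid]), reverse=True)
    for record_id in by_size:
        copied = new_records.get(record_id)
        if copied is None or checksum(copied) != checksum(old_records[record_id]):
            failed.append(record_id)
    return failed


def lossy_copy(records: dict, limit: int) -> dict:
    """Hypothetical faulty copy: silently truncates payloads above `limit` bytes."""
    return {rid: blob[:limit] for rid, blob in records.items()}


# Hypothetical sample data: one small record and two oversized ones.
old = {
    "new_character": b"x" * 100,
    "veteran_character": b"y" * 5000,
    "large_house": b"z" * 9000,
}
new = lossy_copy(old, limit=4096)

print(verify_migration(old, new))  # ['large_house', 'veteran_character']
```

A check like this only proves the data arrived intact. It says nothing about whether the game code can actually load those records on the new database version, which is the other possibility Rothgar mentioned.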
Excellent work Mr. Feldon. 🙂
That’s extremely unprofessional work on SOE’s side. As somebody taking care of databases myself, how hard could it have been to grab a backup (which SOE does anyway daily) from one of those servers, and, I dunno, like, effing test the migration? Not with a few select players, but with the entire backup? Jesus Christ.
oh i’m sure it could have been a cinch copying all that data over and trying it on a new server. but how are you going to test that server under heavy load except by taking it live and allowing thousands of users to start using it?
as somebody taking care of databases yourself, you should know. some situations are impossible to test for… you just have to push it to your live servers and go from there.
if you read rothgar’s explanations above, a number of old accounts with deprecated housing data caused the issues. your mentioning of heavy load is entirely nonsensical for this issue because the MIGRATION failed. there wasn’t a single user online during the migration, i.e. there was no load at all.
when is butcherblock coming back up got stuff to do