FriendlyFixer here. I’d like to talk to you about the recent deletion of player mail and I’m hoping you will permit me to get a little technical. I’ll even throw in a joke if I can make it work. With that said the format for this post is a MUST READ section, and a MAYBE READ section. The joke will be somewhere in there.
- Player mail data size has become VERY big due to a bug in the deletion system
- We needed to delete some of it
- Despite our efforts, some of the newest mail messages were deleted as well
- We’ve processed about a third of the messages and will need to complete the rest
First, let me explain some things so we have some shared terms we can use. Rise of Firstborn uses many different technologies but one is a MySQL database. For those not familiar, it is a way to store different bits of data that can be related to each other. Now, in that Database there are a couple of things we need to care about for this story. PLAYER_MAIL and PLAYER_MAIL_CONTENTS. How do these matter? Let’s get Technical.
PLAYER_MAIL is a bunch of data that loosely resembles the envelope for a real world piece of mail. It has some data that says who sent it, what type of mail it is and who it was sent to. It also has a date for when it was sent and a few other pieces. Lastly the envelope MAY have a pointer to a PLAYER_MAIL_CONTENTS, as a form of lookup receipt.
PLAYER_MAIL_CONTENTS are quite simplye collections of data. There is an id number, a binary blob of data, and a number used to identify a link group (url in the mail, not important).
Ok, that seems simple enough. So why did we need to delete some of it? And why is that actually a hard problem? Well… it comes down to HOW you relate data in a database. This is done through KEYS. These are simple number references used to link tables together. However, to make it easy they are INDEXED. Remember school libraries and index cards? No? Maybe I’m old. In any case, they are just a simple way to convert from a bunch of data to a simple number in order to find something quickly. You can think of them as like a filing cabinet that has index cards in it that POINT to a piece of data. Except that data isn’t in the room with you, it’s in a warehouse across the street.
Pretend you are the game server. Someone wants you to get mail for a player? First you get all mail records (envelopes) for the player. This is easy because you have an index for them based on server and player id. Then for each mail you check the envelope to see if there is a content bit for it. That will be a key that tells you which warehouse to go look in, and what shelf and box the contents are in. Great! You can find that content and bundle it together to deliver to the player. Job done.
But what if your warehouse is running out of room? What do you do? In an ideal world you’d simply throw some mail away, right? Well I’m happy to report that is exactly what RFB does! Job done!
Uh oh. The warehouse. We forgot about the warehouse. It still has all those boxes of contents sitting on shelves waiting for someone to come look at them. Too bad the envelopes are all gone and burned!
Ok, let me get back to being serious here for a minute. The reality is there exists a delete mechanism in the game. Whenever you mark a mail for delete, or when the mail has existed for longer than a set amount of time, it is put into an archive. After being in the archive for another set amount of time it is deleted for good. This delete routine is supposed to also delete the contents. Sadly it has not been doing that for an unknown amount of time. As a result all three of the main game databases have HUNDREDS of gigabytes of data for old and orphaned player mail contents.
Why is it orphaned? The problem is that there is NO back reference from the contents warehouse to the envelope. In the warehouse you have rows and rows of boxes with bits of data, but no KEY pointing back at the envelope it came from.
Alright now back to what happened. We need to recover some space on the database discs. The larger they get means the more that the database needs to hold in memory, this can put pressure on other queries and make things slower. So to correct this problem we wrote a script.
The script was designed to do a few things. It scanned through all of the player mail items in large batches (100k at a time) and stored all of the content ids found. It also kept the first and last content id it found. Then it used those content ids as lower and upper bounds for a different search in the contents table, again in large batches. The script finally compared those content ids with the ones found earlier and deleted any that did not appear.
In testing this worked perfectly. We then duplicated one of the live databases and tested it on that data set. This was slightly less perfect, but still really good. And importantly it was able to complete in less than a day and without impacting the game’s performance. So where did it go wrong?
In the script we made an assumption. We assumed the IDs used for player mail always increased consistently inside the player mail envelopes. This is MOSTLY true, but does get broken in a few cases. It is these cases that led to some contents being deleted even though the envelope still existed. In our testing and validation it was less than 1% of all mail, but it was also heavily weighted towards NEW mail.
Alright, what’s next? We have run this script on one of the three databases and will be running it on the other two. Doing this has effectively reduced the one database table from 300 gigabytes of data down to 40. We need to do it on the other two as well. We ask your tolerance for any deleted mail messages. I can only hope that this OVERLY LONG explanation has served to show that even seemingly simple things can be very complicated in reality. I also secretly hope it was mildly entertaining.
A joke. Oh right. My son asked me to put his shoes on. I don’t think they fit me.