Thursday, August 15, 2013

Rebuilding bad bitcask partitions in Riak

We had one of our Riak nodes that in a couple weeks time started eating up all it's disk recently.

Here's what we noticed:

  1. Two of the twenty or so partitions on the node were 5 to 10 x the average size of the other partitions. The average partitions size was between 20 and 30 GB, and yet 2 of the partitions were 160GB and 210GB.
  2. The logs showed that we had run out of open files even though we have the riak user set for max_open_files at 100k.  As it turns out, during the hardware maint I had started riak from a sudo -i session which gave the shell the default 1024 max_open_files setting.
  3. After restarting Riak with the correct max_open_files setting, we noticed a lot of 0 byte bitcask files which we removed, as well as some invalid bitcask hint files which we cleaned up.
  4. Once all the invalid bitcask files were cleaned up, we realized that any merge process against the 2 large partitions always failed, implying there were some corrupt bitcask files, or the merge process was timing out.
  5. Rather than rebuild the whole node, we decided just to rebuild the specific partitions.

Here's the process we used for rebuilding the specific partitions:

- Stop Riak
# riak stop
- Move the bad partitions elsewhere for backup purposes
# mv /{riak_datadir}/riak/bitcask/{partition} /{backup_dir}/
- Start Riak
# riak start
- Wait for riak_kv process to start
# riak-admin wait-for-service riak_kv riak@{riak_node_name}
- Attach to riak and start the repair process
# riak attach
(riak@{node}) 1> Partitions =  [{part 1},{part 2},...{part n}].
(riak@{node}) 2> [riak_kv_vnode:repair(P) 
|| P <- Partitions].

Note: to quit the riak attach shell, use cntl-D, not cntl-C  (otherwise you will stop riak)

- Check status of the repair process
# riak-admin transfers

All in all, Riak recovers quite nicely, and it wasn't terribly difficult to find out what was going on.

On a side note, Basho does a great job if you have the benefit of using their support .

Thursday, July 11, 2013

By all means, learn from my mistakes as a DBA!

Here are 3 recent ' oops... wish I hadn't done that :/ ' mistakes I've made since joining moz that you might as well avoid (I'm sure there will be more, but they better not be the same)

Reviewing config files for MySQL, but not all of the defaults 

    We recently migrated a few MySQL databases to a new datacenter, and took advantage of the migration to upgrade the MySQL version(s) at the same time.  Of course I made sure to have another DBA review all my new config changes that were deprecated or new that we wanted take advantage of.  After things were migrated and the new applications were tested, things seemed fine - just a few bugs here and there, primarily around character set issues ( which, obviously, would be a dev problem, right? ).

Oops...  the default character set for MySQL is not utf8 !

     Lesson Learned:  

Don't just check that the configs have the correct settings; check for settings that are not there (defaults), as they are the ones that may be great for most, but perhaps not so great for you.

Restricting the # connections allowed for a user

    In order to introduce a quick 'throttling' mechanism for a job based system with a MySQL backend we implemented user resource restrictions.  It worked like a charm in staging, so we rolled it out to prod; as soon as the job connections exceeded 30 they were disconnected and had to wait till there other jobs completed, and all the devs had to do was change the user name for the job connection url.  Sweet!

   Later that day we started getting host denied errors...

Oops...  max_connection_errors matters !

     Lesson Learned:  

Make sure to take time to understand secondary effects of changes over time, as often issues don't show up immediately   That's why 'soak' or 'bake time' is not something that should be overlooked when implementing changes.

Upgrading percona toolkit and with Capistrano

    Awesome!  I got my Capistrano scripts working great - it's time to upgrade our percona toolkit on all our servers, so let's start with all our slave instances first ( it is production... ).  What's cool it with cap I can deploy it to all the slaves at once.

   What's not so awesome is I can also upgrade and restart mysql on all the slaves at once too :/

Oops...   'apt-get -y upgrade' percona-toolkit still upgrades everything, including MySQL !

     Lesson Learned:  

Make sure staging / test systems exactly mimic production when verifying deployment changes, and if you're working on a different distribution of Linux or MySQL than you are used to make sure to read the man pages!  ( apt-get install -y percona-toolkit is what I wanted, and what the man page said to do )

Yes, even those of us that have been around the block a few times still make mistakes .... 

Friday, May 17, 2013

MySQL binlogs - Don't forget to do your homework!

Now that I'm back doing just database stuff, I've come to realize I've gotten a little sloppy about doing my homework.  Homework's never been my favorite thing in the world, but it often reduces stress when your under the gun during an outage or upgrade...

We had a MySQL database server that's been slow on DML changes, and based on the slowest statements being 'COMMIT', we had a good mind it was an issue with the binary logs, or the disk subsystem that MySQL was writing the binary logs to.

Investigation with iostat (iostat -xtc 10), we were able to narrow it down to a disk subsystem being fully utilized and queuing.  

Easy fix right?  Just change the location of the binlogs via the startup option in my.cnf and restart.

If I had done my homework, I would have known that restarting MySQL with a new location for binlogs would reset the binary log to mysql-bin.000001, and thus break replication to the slave... Oops.

No need to panic - just check the master's log position and run the following on the slave:

 mysql> change master to master_log_file=mysql-bin.000001, master_log_pos=123
Perfect - the slave caught up and was good to go - except that I neglected to recall the fact we had enabled the application and started writing to the master before I checked the log position....  Oops. (2)

Luckily, replication was up to date when we restarted, so I was able to use the handy dandy mysqldump utility to load all the transactions from the new binary log up to the point in which I checked the master:

mysqlbinlog --stop-position=123 mysql-bin.000001 | mysql   

Moral of the story: Doing your homework is usually a good thing!