
Thursday, August 15, 2013

Rebuilding bad bitcask partitions in Riak


One of our Riak nodes recently started eating up all of its disk space over the course of a couple of weeks.

Here's what we noticed:


  1. Two of the twenty or so partitions on the node were 5 to 10 times the size of the others. The average partition size was between 20 and 30 GB, yet these two partitions were 160 GB and 210 GB.
  2. The logs showed that we had run out of open files, even though the riak user is set to a max_open_files of 100k. As it turns out, during the hardware maintenance I had started Riak from a sudo -i session, which gave the shell the default max_open_files setting of 1024.
  3. After restarting Riak with the correct max_open_files setting, we noticed a lot of zero-byte bitcask files, which we removed, as well as some invalid bitcask hint files, which we cleaned up.
  4. Once all the invalid bitcask files were cleaned up, we found that every merge against the two large partitions still failed, which implied that either some bitcask files were corrupt or the merge process was timing out.
  5. Rather than rebuild the whole node, we decided to rebuild just the affected partitions.
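
The checks in the list above could be sketched as a few small shell helpers. This is only a rough sketch: the function names are hypothetical, and it assumes the bitcask data directory contains one subdirectory per partition. The open-file check only tells you something if it is run from the same shell session that will start Riak.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers. The directory layout (one subdirectory
# per partition under the bitcask data directory) is an assumption.

# Print "<size in KB> <partition dir>" for every partition, largest first,
# so that unusually large partitions show up at the top.
partition_sizes() {
    du -sk "$1"/*/ 2>/dev/null | sort -rn
}

# List zero-byte files anywhere under the given directory.
zero_byte_files() {
    find "$1" -type f -size 0
}

# Print the open-file limit of the current shell. A process started from
# this shell inherits this limit.
open_file_limit() {
    ulimit -n
}
```

For example, `partition_sizes /path/to/bitcask | head -n 5` would show the five largest partitions, and `zero_byte_files /path/to/bitcask` lists candidate files to inspect before removing anything.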

Here's the process we used for rebuilding the specific partitions:

- Stop Riak
# riak stop
- Move the bad partitions elsewhere for backup purposes
# mv /{riak_datadir}/riak/bitcask/{partition} /{backup_dir}/
- Start Riak
# riak start
- Wait for the riak_kv service to start
# riak-admin wait-for-service riak_kv riak@{riak_node_name}
- Attach to riak and start the repair process
# riak attach
(riak@{node}) 1> Partitions = [{part 1},{part 2},...{part n}].
(riak@{node}) 2> [riak_kv_vnode:repair(P) || P <- Partitions].

Note: to quit the riak attach shell, use Ctrl-D, not Ctrl-C (otherwise you will stop Riak).

- Check status of the repair process
# riak-admin transfers
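
The non-interactive steps above could be wrapped in a small script. The following is a sketch under stated assumptions, not a tested tool: the function names, the argument order, and the DRY_RUN switch are all hypothetical, and it assumes the partition IDs passed to the repair call are the same values used as the partition directory names. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps. Set DRY_RUN=1 to print the
# commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: rebuild_prepare <bitcask_dir> <backup_dir> <node> <partition>...
# Stops Riak, moves each listed partition directory to the backup
# directory, restarts Riak, and waits for the riak_kv service.
rebuild_prepare() {
    bitcask_dir=$1
    backup_dir=$2
    node=$3
    shift 3
    run riak stop || return 1
    for p in "$@"; do
        run mv "$bitcask_dir/$p" "$backup_dir/" || return 1
    done
    run riak start || return 1
    run riak-admin wait-for-service riak_kv "$node"
}

# Print the Erlang line to paste into the riak attach shell.
# Assumes the partition IDs are the partition directory names.
erlang_partition_list() {
    list=""
    for p in "$@"; do
        list="${list:+$list,}$p"
    done
    echo "Partitions = [$list]."
}
```

Running it with DRY_RUN=1 first is a cheap way to confirm the paths before anything is moved. The repair call itself is still entered by hand in the riak attach shell.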

All in all, Riak recovers quite nicely, and it wasn't terribly difficult to find out what was going on.

On a side note, Basho does a great job if you have the benefit of using their support.


