Thursday, July 11, 2013

By all means, learn from my mistakes as a DBA!

Here are 3 recent ' oops... wish I hadn't done that :/ ' mistakes I've made since joining moz that you might as well avoid (I'm sure there will be more, but they better not be the same)

Reviewing config files for MySQL, but not all of the defaults 

    We recently migrated a few MySQL databases to a new datacenter, and took advantage of the migration to upgrade the MySQL version(s) at the same time.  Of course I made sure to have another DBA review all my new config changes that were deprecated or new that we wanted take advantage of.  After things were migrated and the new applications were tested, things seemed fine - just a few bugs here and there, primarily around character set issues ( which, obviously, would be a dev problem, right? ).

Oops...  the default character set for MySQL is not utf8 !

     Lesson Learned:  

Don't just check that the configs have the correct settings; check for settings that are not there (defaults), as they are the ones that may be great for most, but perhaps not so great for you.

Restricting the # connections allowed for a user

    In order to introduce a quick 'throttling' mechanism for a job based system with a MySQL backend we implemented user resource restrictions.  It worked like a charm in staging, so we rolled it out to prod; as soon as the job connections exceeded 30 they were disconnected and had to wait till there other jobs completed, and all the devs had to do was change the user name for the job connection url.  Sweet!

   Later that day we started getting host denied errors...

Oops...  max_connection_errors matters !

     Lesson Learned:  

Make sure to take time to understand secondary effects of changes over time, as often issues don't show up immediately   That's why 'soak' or 'bake time' is not something that should be overlooked when implementing changes.

Upgrading percona toolkit and with Capistrano

    Awesome!  I got my Capistrano scripts working great - it's time to upgrade our percona toolkit on all our servers, so let's start with all our slave instances first ( it is production... ).  What's cool it with cap I can deploy it to all the slaves at once.

   What's not so awesome is I can also upgrade and restart mysql on all the slaves at once too :/

Oops...   'apt-get -y upgrade' percona-toolkit still upgrades everything, including MySQL !

     Lesson Learned:  

Make sure staging / test systems exactly mimic production when verifying deployment changes, and if you're working on a different distribution of Linux or MySQL than you are used to make sure to read the man pages!  ( apt-get install -y percona-toolkit is what I wanted, and what the man page said to do )

Yes, even those of us that have been around the block a few times still make mistakes ....