Friday, March 27, 2009

Lost and Found?

Sometimes you just have to laugh at the crazy things that can kill a good evening.

I had this brilliant idea this week to change the replication setup on one of our Master-Master server pairs. I was sick of having to restart MySQL every time we wanted to add a new database and have it included in the list of replicated databases - we were using replicate-do-db in our configs.

So it seemed very straightforward to switch to replicate-ignore-db or replicate-ignore-table instead (because of our cross-database updates).
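
(For anyone curious, the before/after in my.cnf looks roughly like this - just a sketch, and the database names here are made up:)

# Old way - every replicated database listed explicitly,
# and a restart needed whenever a new one is added:
[mysqld]
replicate-do-db = app_main
replicate-do-db = app_reports

# New way - replicate everything, and just skip what we don't want:
[mysqld]
replicate-ignore-db = test
replicate-ignore-db = scratch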

After a few weeks in QA and a few weeks in staging - no problems, no issues, no complaints... let's go for it!


Yeah, as soon as we deploy and restart MySQL to pick up the new configs, replication fails and stops!

And of course, to make it lots of fun, replication on a couple of other servers was failing at the same time for unrelated reasons, and a migration of another application that night was having issues too...

So we look around for a while, check the MySQL Monitor... and, voilà - lost+found - table does not exist errors!

Yeah, ext3 rears its ugly head again. I'm sure it's probably my responsibility to make sure that lost+found directories are cleaned up, etc., etc., but it sure made for a headache this week.

The fix (knock on wood) was straightforward - we just added replicate-wild-ignore-table=lost%.% to the configs.
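
In my.cnf terms that's one more line under [mysqld], plus a quick sanity check that the slave threads come back up after the restart - a rough sketch:

[mysqld]
# Skip anything in a schema whose name starts with "lost" -
# i.e. the ext3 lost+found directory that MySQL mistakes for a database.
replicate-wild-ignore-table = lost%.%

-- After the restart, confirm replication picked back up:
SHOW SLAVE STATUS\G
-- Look for Slave_IO_Running: Yes, Slave_SQL_Running: Yes, and an empty Last_Error.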

Seemed to do the trick. Maybe we should check out ReiserFS or XFS or ZFS.

Either way, if there's a way to break replication, I'm sure I'll find it... ;)


On that note - if you love the filesystem (Linux-based) that you are using for your MySQL servers, I'd love to hear your comments (good, bad, or ugly).

Saturday, March 14, 2009

What just happened to the database?

It's always fun when you come into work and notice that one of your database monitors/graphs has changed from showing almost no row accesses/sec to 40,000 rows/sec in a matter of minutes. And then by lunchtime it's showing 90,000 rows/sec. What's up with that?
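
(These graphs are basically just plotting the delta of the server's row-read counters, so you can sanity-check the number by hand - a rough sketch:)

-- Sample the counter twice, some seconds apart, and divide by the interval:
SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';
-- ... wait ~10 seconds ...
SHOW GLOBAL STATUS LIKE 'Innodb_rows_read';
-- (second value - first value) / 10 gives rows read per second.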

MySQL Enterprise Monitor Row Accesses:

Well, of course the first thing you do is check all your Cacti monitors, because there is obviously something wrong with the monitoring system.

Cacti Innodb Row Accesses:

Hmm... it's not the monitor. Must be a change in the app...

So you check with everyone, and no, nothing's changed in the app. In fact, nothing even appears to be different (same basic app response times, same # requests/sec to the app, etc., etc.).

Database Activity:

So it must be a query cache issue...

Query Cache:

Apparently not... :(
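
(The same counters that query cache graph is built from are easy to pull straight off the server, if you'd rather rule it out from the command line:)

-- Hits, inserts, lowmem prunes, etc. - none of these moved when the row reads jumped:
SHOW GLOBAL STATUS LIKE 'Qcache%';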

So what could cause that kind of dramatic increase in index rows read/sec?

In reality, looking at the data distribution, the optimizer apparently decided to 'explain' it all away....

Current explain plan:

mysql> \e
    -> ;
+-----------------------------------------------------------------------------+-------------------------------+---------+------+------+----------+------------------------------------------+
| possible_keys                                                               | key                           | key_len | ref  | rows | filtered | Extra                                    |
+-----------------------------------------------------------------------------+-------------------------------+---------+------+------+----------+------------------------------------------+
| UIDX_TOTAL_RATING_OWNER_ABOUT,IDX_TOTAL_RATING_ABOUT,IDX_TOTAL_RATING_OWNER | UIDX_TOTAL_RATING_OWNER_ABOUT |     180 | NULL |    5 |    80.00 | Using where; Using index; Using filesort |
+-----------------------------------------------------------------------------+-------------------------------+---------+------+------+----------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

Previous explain plan:

mysql> \e
    -> ;
+-----------------------------------------------------------------------------+------------------------+---------+-------------+------+----------+-------------+
| possible_keys                                                               | key                    | key_len | ref         | rows | filtered | Extra       |
+-----------------------------------------------------------------------------+------------------------+---------+-------------+------+----------+-------------+
| UIDX_TOTAL_RATING_OWNER_ABOUT,IDX_TOTAL_RATING_ABOUT,IDX_TOTAL_RATING_OWNER | IDX_TOTAL_RATING_OWNER |      42 | const,const |    1 |   100.00 | Using where |
+-----------------------------------------------------------------------------+------------------------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.03 sec)

Interesting what a little change in data and a different path through the optimizer can do...
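
If a plan flip like this ever actually causes pain, there are a couple of blunt instruments for nudging the optimizer back - just a sketch, since the real query isn't shown here and the table and column names below are made up from the index names above:

-- Refresh the statistics the optimizer bases its index choice on
-- (table name is hypothetical):
ANALYZE TABLE rating;

-- Or pin the query to the index the old plan used
-- (columns and values are hypothetical too):
SELECT COUNT(*)
FROM rating FORCE INDEX (IDX_TOTAL_RATING_OWNER)
WHERE owner_id = 42 AND about_id = 7;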