
Friday, March 27, 2009

Lost and Found?

Sometimes you just have to laugh at the crazy things that can kill a good evening.

I had this brilliant idea to change the replication setup on one of our master-master server pairs this week. I got sick of having to restart MySQL every time we wanted to add a new database and have it included in the list of replicated databases - we were using replicate-do-db in our configs.

So it seemed very straightforward to change to replicate-ignore-db or replicate-ignore-table rules instead (the table form, really - the db filters are checked against the default database, which doesn't play well with cross-database updates).
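For illustration, here's roughly what that change looks like in my.cnf (the database names here are made up):

    # old approach - every replicated database listed explicitly,
    # so adding one meant a config edit plus a MySQL restart
    replicate-do-db = orders
    replicate-do-db = customers

    # new approach - replicate everything, list only the exceptions
    replicate-ignore-db = test
    replicate-wild-ignore-table = scratch%.%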

After a few weeks in QA and a few weeks in staging - no problems, no issues, no complaints... let's go for it!


Yea, as soon as we deploy and restart MySQL to pick up the configs, replication fails and stops!

And of course, to make it lots of fun, replication on a couple of other servers failed at the same time for unrelated reasons, and then a migration of another application that night had issues too...

So we look around a while, check the MySQL Monitor... and, voila - lost+found "table does not exist" errors!
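(If you want to spot this yourself, SHOW SLAVE STATUS on the failing side is the quickest check - Slave_SQL_Running: No plus a populated Last_Errno/Last_Error is the smoking gun:)

    mysql> SHOW SLAVE STATUS\G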

Yea, ext3 rears its ugly head again. I'm sure it's probably my responsibility to make sure that lost+found directories are cleaned up, etc., etc., but it sure made for a headache this week.

The fix (knock on wood) was straightforward - we just added replicate-wild-ignore-table=lost%.%
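In the config that's one line in the [mysqld] section (note it's the wild variant - the plain replicate-ignore-table option doesn't take % wildcards):

    [mysqld]
    # skip anything in a database whose name starts with "lost",
    # which covers the lost+found directory ext3 drops at the mount point
    replicate-wild-ignore-table = lost%.%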

Seemed to do the trick. Maybe we should check out ReiserFS, XFS, or ZFS.

Either way, if there's a way to break replication, I'm sure I'll find it... ;)


On that note - if you love the (Linux-based) filesystem you're using for your MySQL servers, I'd love to hear your comments (good, bad, or ugly).

4 comments:

Anonymous said...

I ran into this side effect of MySQL mapping database schemas to subdirectories of $DATADIR when testing a new replica (as an LVM backup target) a couple of years ago. The behavior is described in the manual and in at least one closed bug.

What you should really do is move your $DATADIR to a directory below its current location at the ext3 mount point, i.e.,
/mountpoint/db/
rather than
/mountpoint/
as a uniform matter of policy before including a new host in the replication chain.

I think that's cleaner than adding ignore rules to the configuration file.
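A rough sketch of that move (the init script path is an assumption - adjust for your distro, and take a backup first):

    /etc/init.d/mysql stop
    mkdir /mountpoint/db
    # shift the database directories down one level, leaving lost+found behind
    cd /mountpoint
    for d in *; do
        [ "$d" = "db" ] || [ "$d" = "lost+found" ] || mv "$d" db/
    done
    chown -R mysql:mysql /mountpoint/db
    # set datadir = /mountpoint/db in my.cnf, then:
    /etc/init.d/mysql start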

Anonymous said...

The other option is to not put your mysql data directory at the root of a partition. I usually mount my data partition as /data/mysql, and inside of that I keep mysql's data as /data/mysql/mysql-data.

ext3 is a fine filesystem; I personally wouldn't ditch it just because of the lost+found directory.

Phil Hildebrand said...

Yea - that makes sense - and it's fairly straightforward (and yes, much better than my hack workaround).

We normally mount to /data. Mounting to /data/mysql or whatever would be a very simple solution.

I suppose that's going to mean one more mysql restart though ;)

Baron said...

Phil,

I normally mount /dev/something to /data, then create /data/mysql, then symlink /var/lib/mysql to /data/mysql.
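In shell terms, something like this (the device name is a placeholder, and it assumes MySQL is stopped and the usual /var/lib/mysql default datadir):

    mount /dev/something /data
    mkdir /data/mysql
    chown mysql:mysql /data/mysql
    # swap the stock datadir for a symlink into the mounted filesystem
    mv /var/lib/mysql/* /data/mysql/
    rmdir /var/lib/mysql
    ln -s /data/mysql /var/lib/mysql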

There are several benefits. One -- no lost+found. Two -- if the device fails to mount for some reason, you get a broken symlink, instead of mysql starting and thinking its data directory exists but is empty. Or worse, someone who doesn't realize that the device isn't mounted thinks "hey, where did the data go" and re-initializes mysql with new data in /data.

I'm not sure I'm communicating it clearly here, but avoiding putting mysql's data in the root of a filesystem has saved a ton of confusion in the situations I've been involved in.