Pages

Wednesday, October 01, 2008

MySQL Data Woes... or, Making Use of the Information_Schema

To be fair - I highly doubt it has anything to do with MySQL.

Database corruption - that's a different issue, often related to a bug in database code; but this was a data corruption issue, which is always a pain in the you know what to figure out and get fixed.

I spent the last day trying to figure out the weirdness in a couple of our databases, digging around, explaining to developers that there's no way it could be an issue with the database - it must be in their caching layer somewhere.

It never really occurred to me that the issue might be the data itself... duh. Especially when it turned out to be an issue with a few fields in a few rows with a little extra utf8 character in front of the actual data that happened to map to an ASCII zero (end of line char, I think). Nice.

Now we have to go check every stinkin database, every table, every char based column and look for that character.... Ugh.

Oh yea - we have an information_schema now in 5.1.x ... :)

So about an hour later after a quick code review with the developers and some validation tests, we had a cool little data issue checker to quickly run everywhere, that uses the information_schema database. I'm sure there's a slicker way to do it (and who knows, there might even be one up on MySQL Forge already). I suppose I should have checked there first, but I then again, it's always fun to re-invent the wheel...

Here's what we came up with:


DELIMITER /

DROP PROCEDURE IF EXISTS check_invalid_data /

CREATE PROCEDURE check_invalid_data(db_to_check varchar(128)) -- pass in db name
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE exec_str TEXT;
DECLARE sql_str TEXT;
DECLARE exec_cur CURSOR FOR
SELECT check_string FROM validate_data;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

DROP TEMPORARY TABLE IF EXISTS validate_data;

CREATE TEMPORARY TABLE validate_data
(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
check_string TEXT
);

DROP TEMPORARY TABLE IF EXISTS validate_results;

CREATE TEMPORARY TABLE validate_results
(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
table_info VARCHAR (256),
column_info VARCHAR (256),
column_length INT
);

INSERT INTO validate_data (check_string)
SELECT CONCAT ('INSERT INTO validate_results (table_info,column_info, column_length) SELECT ''',table_name,''',''',
column_name, ''',LENGTH(',column_name,') FROM ', db_to_check ,
'.' , table_name, ' WHERE ASCII (', column_name ,' ) = 0 AND LENGTH (', column_name ,') > 0 ;')
FROM information_schema.columns
WHERE table_schema = db_to_check
AND data_type IN ('char','varchar','text','mediumtext','tinytext','long','long varchar','enum','set');

OPEN exec_cur;

REPEAT
FETCH exec_cur INTO sql_str;
set @exec_str = sql_str;
IF NOT done THEN
PREPARE validate_stmt FROM @exec_str;
EXECUTE validate_stmt;
DEALLOCATE PREPARE validate_stmt;
END IF;
UNTIL done END REPEAT;

CLOSE exec_cur;

SELECT * FROM validate_results;
END
/
DELIMITER ;

2 comments:

rpbouman said...

Hi!

it seems to me you need to use CHAR_LENGTH() not LENGTH() for the char, varchar, text, mediumtext, tinytext type.

Also, your REPEAT loop to walk the cursor is suboptimal. You are essentially checking twice each iteration if the loop is done (IF NOT done THEN and UNTIL done). Much nicer to use a simple LOOP...END LOOP

Phil Hildebrand said...

Good point Roland... I suppose that's why code reviews exist, huh?

Thanks