Monday, March 24, 2008

MySQL: Getting Creative with Partitioning

Lately, I've been trying to keep up with at least one of the MySQL Forums: the Partitioning forum.

It's a great way to stay on top of issues surrounding partitioning, and also to get an idea of what people are trying to do with the new 5.1 feature. Richard came up with an interesting problem that I jumped into, only to realize that I hadn't done my homework: my initial suggestion wouldn't work at all due to documented restrictions on partitioning in MySQL 5.1. :(

So, the question is: if a unique index on a partitioned table must include every column of the partitioning key, is there some other way to enforce uniqueness on a subset of those columns?

We could use a trigger on the partitioned table that tries to insert into another, non-partitioned table that carries the unique constraint. First, the partitioned table:


CREATE TABLE part_table
(
id INTEGER NOT NULL,
other INTEGER NOT NULL,
PRIMARY KEY (id,other)
)
PARTITION BY KEY (id)
PARTITIONS 20;


The restriction can be seen by trying to create a unique index on column 'other':

mysql> CREATE UNIQUE INDEX test ON part_table (other);
ERROR 1503 (HY000):
A UNIQUE INDEX must include all columns in the table's partitioning function
mysql>


Next, create a non-partitioned table whose primary key provides the unique constraint:

CREATE TABLE unique_index_table
(
id INTEGER NOT NULL,
PRIMARY KEY (id)
) ;


Then create a trigger on the partitioned table to enforce the uniqueness:

DELIMITER /

CREATE TRIGGER enforce_unique
BEFORE INSERT ON
part_table
FOR EACH ROW BEGIN
INSERT INTO
unique_index_table SET id = NEW.id;
END;
/

DELIMITER ;
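
One gap worth flagging in this scheme: the trigger above only covers INSERT, so rows deleted from part_table would leave stale entries in unique_index_table, and an UPDATE that changes id would go untracked. A sketch of companion triggers, assuming the same table names (the trigger names here are my own, not from the original post):

DELIMITER /

CREATE TRIGGER enforce_unique_on_update
BEFORE UPDATE ON
part_table
FOR EACH ROW BEGIN
IF NEW.id <> OLD.id THEN
-- Claim the new id first; this fails if it is already taken.
INSERT INTO unique_index_table SET id = NEW.id;
-- Then release the old id.
DELETE FROM unique_index_table WHERE id = OLD.id;
END IF;
END;
/

CREATE TRIGGER release_unique_on_delete
AFTER DELETE ON
part_table
FOR EACH ROW BEGIN
DELETE FROM unique_index_table WHERE id = OLD.id;
END;
/

DELIMITER ;

With these in place, the shadow table stays in sync with part_table across the full insert/update/delete lifecycle, not just inserts.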


Now, enforcement of the unique index can be done on the partitioned table:

mysql> INSERT INTO part_table VALUES (1,1);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO part_table VALUES (1,2);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
mysql> INSERT INTO part_table VALUES (1,1);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
mysql> INSERT INTO part_table VALUES (2,1);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO part_table VALUES (3,3);
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO part_table VALUES (3,2);
ERROR 1062 (23000): Duplicate entry '3' for key 'PRIMARY'
mysql>


So, now it really becomes a question of performance. Functionally it works, but is it
really the best workaround?

I'll try some tests comparing the performance of this partitioned table with its
trigger-enforced unique index against a non-partitioned table with a regular unique index,
and see how the results compare.
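
A rough sketch of how such a test might look, assuming a stored procedure for the insert loop (plain_table and bench_part are hypothetical names, not from the setup above; a matching procedure would be needed for the control table):

CREATE TABLE plain_table
(
id INTEGER NOT NULL,
other INTEGER NOT NULL,
PRIMARY KEY (id,other),
UNIQUE KEY uniq_id (id)
);

DELIMITER /

CREATE PROCEDURE bench_part(IN n INT)
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < n DO
-- Each insert fires the enforce_unique trigger.
INSERT INTO part_table VALUES (i, i);
SET i = i + 1;
END WHILE;
END;
/

DELIMITER ;

-- Time the trigger-backed partitioned table; repeat with an
-- equivalent procedure targeting plain_table for comparison.
SET @t0 = NOW();
CALL bench_part(100000);
SELECT TIMEDIFF(NOW(), @t0) AS elapsed;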

I'll post them as soon as I have the results. If anyone has a better way, I'm interested in knowing....

3 comments:

Mikael Ronstrom said...

Seems like a good way of doing a workaround for this limitation. It's very close to how a solution in the MySQL Server would also implement it. The MySQL Server would implement it by adding a separate table which is a global index table (remember that this table should most likely also be partitioned since partitioned tables are usually used for very large data sets) and the partition storage engine would execute the logical equivalent of this trigger.

johnf said...

This solution works great except if you are doing multiple inserts in one statement, e.g.

INSERT INTO a(x) VALUES ('a'), ('b'), ('c'), ...;

If even just one of them fails then the whole insert is rolled back.

Can anyone think of a work around? This is the only hiccup preventing us from moving to partitions. Doing single inserts would mean a 1000x increase in hits to the DB for us.

mvinturis said...

The workaround for the multiple inserts would be:

INSERT IGNORE INTO a(x) VALUES ('a'), ('b'), ('c'), ...;

The only problem is that all errors will be ignored, not just duplicate-key errors.