I have a table that contains several keys into other tables (where each key is comprised of multiple columns). I would like to create a new column for each key that would be an integer such that value i
represents the ith
occurrence of that key (but I want that ordering to be random). I thought perhaps some sort of auto increment would work, but I have multiple keys and thus would need multiple columns (I believe only one auto increment is permitted).
The solution I came up with doesn't seem to be working. First I created a new table to store the columns I am interested in, plus one extra column, rnd
. I would use rnd
later as a way to to ORDER BY rand()
in a context where I need an actual column and not a function. Here's how I accomplished the first bit:
SET @rnd = 0;
INSERT INTO new_table SELECT col1, ..., colN, @rnd := @rnd+1 FROM original_table ORDER BY rand();
Next I would add an auto increment based upon the key, and I would make use of the rnd
column I just created. The ALTER TABLE...ORDER BY
does not permit ordering by a function (like rand()
), so that's why I needed to create the rnd
column first:
ALTER TABLE new_table ADD first_index INT UNSIGNED NOT NULL AUTO_INCREMENT, ADD KEY(col1, col2, first_index), ORDER BY rnd;
I would then remove the auto increment status and repeat for each key:
ALTER TABLE new_table MODIFY COLUMN first_index INT UNSIGNED NOT NULL;
At first this seemed to do the trick. However, upon closer inspection, it appears as though the random ordering was not respected. The index columns appeared to be created in the same order that the original table happened to use. I'm rather frustrated as I went through several very awkward steps to ensure a random ordering, and that did not even work. Why didn't my solution work, and is there a more elegant way to do this?
EDIT: It was apparently not clear what I was looking for, so I'll try a simple example:
Col1 Col2 <other columns>
1997 A
1997 B
2001 B
1997 A
2001 B
1997 A
2001 A
1997 B
The above ordering reflects the original ordering which is neither truly predictable nor truly random. After adding my rnd
column and sorting by it I would have something like this:
Col1 Col2 <other columns> rnd
1997 B 1
2001 A 2
1997 B 3
2001 B 4
1997 A 5
1997 A 6
1997 A 7
2001 B 8
I would then get the first index like this:
Col1 Col2 <other columns> rnd first_index
1997 B 1 1
2001 A 2 1
1997 B 3 2
2001 B 4 1
1997 A 5 1
1997 A 6 2
1997 A 7 3
2001 B 8 2
So 1997 with B appeared 2 times so those values are numbered from 1-2, whereas 1997 with A appeared 3 times so those values rows are numbered 1-3. I used the ORDER BY rnd
so that it was random as to which row of 1997-A got assigned index 1, which one got 2, and which one got 3 (but it would be deterministic after that initial assignment of course).
EDIT: I'll add another example since my goal seems unclear. The MySQL docs show a method to get almost there:
CREATE TABLE animals (
grp ENUM('fish','mammal','bird') NOT NULL,
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (grp,id)
) ENGINE=MyISAM;
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
This creates a table which, although not what I want, gets close:
+--------+----+---------+
| grp | id | name |
+--------+----+---------+
| fish | 1 | lax |
| mammal | 1 | dog |
| mammal | 2 | cat |
| mammal | 3 | whale |
| bird | 1 | penguin |
| bird | 2 | ostrich |
+--------+----+---------+
This close, but I think there are at least two reasons why this doesn't work for me. First of all, I need to do it on an existing table, not on one that I'm creating. Secondly I don't want dog
to show up first in the mammal
group just because it was first in the underlying table. I want to, one time, randomize that order. From then on the ordering will be deterministic, but I want it to be random first.
If you are wondering about my motivation for wanting to do this, here is a question explaining what I'm trying to do with this. Also note that the solution should be in MySQL.
Best Answer
Here's what I would do. I hope I did understand your question correctly. It's a workaround and perhaps not very elegant: