Mysql – Perform group by with random group expression (randomly partitioning rows) MYSQL

MySQLrandom

Let's say I have a table (with millions of rows)

cat1|cat2|value
A   |   B|   10 
...

and I want to group the data by cat1 and also split the grouping into 10 (at least roughly) equal partitions of the data.

The purpose of the random grouping is for the machine learning technique of cross-validation. (It's similar to splitting subjects into a control group and experimental group.)

What I would imagine to use is the following:

select cat1, r, sum(value)
from t 
group by cat1, floor(rand()*10) as r

which doesn't work.

One way which I believe would work would be simply to insert a new column with rand into my table and then group appropriately on that.

Is there any way of doing this grouping otherwise?

Best Answer

Your post piqued my curiosity and I look at the RAND() function - which you can't perform GROUP BYs with. I found this and if your table has a PRIMARY KEY, there is a MySQL pseudo-column (_rowid), much like is found in Oracle, Firebird (and maybe others) then you can issue queries such as this one (see structure and data for comments table (sample) below).

select MOD(_rowid, 3) AS my_field, COUNT(ticket_id) 
FROM comments 
GROUP BY my_field;

+----------+------------------+
| my_field | count(ticket_id) |
+----------+------------------+
|        0 |                6 |
|        1 |                7 |
|        2 |                7 |
+----------+------------------+

Now, if your PK is some sort of AUTO INCREMENT and "random" with respect to your data, you could perhaps use the _rowid pseudo-column (with MOD) to generate a sufficient degree of randomness for your requirements, and perform your aggregate queries?

---- structure and data in comments table---

CREATE TABLE `comments` 
(
  `comment_id` int(11) NOT NULL,
  `ticket_id` int(11) NOT NULL,
  PRIMARY KEY (`comment_id`)
);


mysql> SELECT * FROM comments;
+------------+-----------+
| comment_id | ticket_id |
+------------+-----------+
|          1 |         1 |
|          2 |         1 |
|          3 |         1 |
|          4 |         1 |
|          5 |         2 |
|          6 |         2 |
|          7 |         2 |
|          8 |         2 |
|          9 |         3 |
|         10 |         3 |
|         11 |         3 |
|         12 |         3 |
|         13 |         4 |
|         14 |         4 |
|         15 |         4 |
|         16 |         4 |
|         17 |         5 |
|         18 |         5 |
|         19 |         5 |
|         20 |         5 |
+------------+-----------+