Mysql – Getting Random Sample from large database MYSQL (No Auto Inc Field)

myisamMySQLmysql-5.5stored-procedures

You were so helpful with my last question, I thought id throw another one at you thats stumping me.

Basically I got a table, requirements from company was it was all supposed to be 1 table so I got myself 1000 columns, and 100million rows (… life sucks).

Anyway they want me to generate for them 500,000 row random sample of their data. Problem is my PK is a VARCHAR(50) of their ids and has no increment system to it.

My rig is not overly powerful so a ORDER BY RAND() would be very ill advised!

Currently I am working on a Stored Procedure to maybe make a temp table then export it to csv?

Its not currently done, but if I am going the wrong direction let me know!

CREATE PROCEDURE randomResults()
BEGIN
    DECLARE results CURSOR SCROLL FOR
    FOR
    SELECT * FROM massiveTable;
    DECLARE nth INTEGER;
    SET nth = 0;
    DECLARE hits INTEGER;
    SET hits = 0;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    CREATE TABLE tmp LIKE massiveTable;  
    -- open
    OPEN results;
    -- fetch

    WHILE hits <= 500000 DO
        FETCH NEXT FROM results;
        IF (nth mod 5) THEN
            INSERT INTO 
            SET hits = hits + 1;        
        END IF;
        SET nth = nth + 1;
    END WHILE;
    -- end fetch    
    -- close
    CLOSE results;
END$$
DELIMITER ;

I am sure you guys have a better way of doing this, was just planning to get every 5th record and put it into a tmp table then export it…

My Table:

TABLE massiveTable (

staffhashID VARCHAR(50) PRIMARY KEY,
email VARCHAR (100),
marketCat1 tinyINT(1),
...
1000 of them
) engine = MYISAM;

an example of the PK:
433kfdgsd3gfsd232

Best Answer

This may sound rather evil but you might find this interesting

Suppose your table looked like this

CREATE TABLE massiveTable
(
    PK VARCHAR(50) NOT NULL,
    ... Other Fields ....
    PRIMARY KEY (PK)
) ENGINE=MyISAM;

Step 01 : Create a Empty Copy of the Table and another for Keys

CREATE TABLE randomKeys SELECT PK FROM massiveTable WHERE 1=2;
CREATE TABLE randomTable LIKE massiveTable;

Step 02 : Iteratively load every fifth key

SET @x = 0;
INSERT INTO randomKeys
SELECT PK FROM
(SELECT PK,(@x:=@x+1) num FROM massiveTable) A
WHERE MOD(num,5)=0;

Step 03 : Create the Random Temp Table

ALTER TABLE randomTable DISABLE KEYS;
INSERT INTO randomTable
SELECT B.* FROM randomKeys A LEFT JOIN randomTable B USING (PK);
ALTER TABLE randomTable ENABLE KEYS;

Step 04 : There is no Step 04. That's it !!! No stored procedure needed.

CAVEAT : I cannot make any promises on the running time or query processing.

Give it a Try !!!