You were so helpful with my last question, I thought id throw another one at you thats stumping me.
Basically I got a table, requirements from company was it was all supposed to be 1 table so I got myself 1000 columns, and 100million rows (… life sucks).
Anyway they want me to generate for them 500,000 row random sample of their data. Problem is my PK is a VARCHAR(50) of their ids and has no increment system to it.
My rig is not overly powerful so a ORDER BY RAND() would be very ill advised!
Currently I am working on a Stored Procedure to maybe make a temp table then export it to csv?
Its not currently done, but if I am going the wrong direction let me know!
CREATE PROCEDURE randomResults()
BEGIN
DECLARE results CURSOR SCROLL FOR
FOR
SELECT * FROM massiveTable;
DECLARE nth INTEGER;
SET nth = 0;
DECLARE hits INTEGER;
SET hits = 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
CREATE TABLE tmp LIKE massiveTable;
-- open
OPEN results;
-- fetch
WHILE hits <= 500000 DO
FETCH NEXT FROM results;
IF (nth mod 5) THEN
INSERT INTO
SET hits = hits + 1;
END IF;
SET nth = nth + 1;
END WHILE;
-- end fetch
-- close
CLOSE results;
END$$
DELIMITER ;
I am sure you guys have a better way of doing this, was just planning to get every 5th record and put it into a tmp table then export it…
My Table:
TABLE massiveTable (
staffhashID VARCHAR(50) PRIMARY KEY,
email VARCHAR (100),
marketCat1 tinyINT(1),
...
1000 of them
) engine = MYISAM;
an example of the PK:
433kfdgsd3gfsd232
Best Answer
This may sound rather evil but you might find this interesting
Suppose your table looked like this
Step 01 : Create a Empty Copy of the Table and another for Keys
Step 02 : Iteratively load every fifth key
Step 03 : Create the Random Temp Table
Step 04 : There is no Step 04. That's it !!! No stored procedure needed.
CAVEAT : I cannot make any promises on the running time or query processing.
Give it a Try !!!