Mysql – Pagination, check number of selected records

cacheinnodbMySQLpaging

I am working on pagination for a complex join query and unfortunately I am not sure what's the best way to go… Right now I am thinking about the following solution.

Imagine that SELECT id... is the complex join query. I copy all id's to the temporary cache.

CREATE TEMPORARY TABLE tmp SELECT id FROM comments WHERE user_id = 123;

Now I will check number of selected records.

SELECT COUNT(*) FROM tmp;

If a number of records is let's say greater than 1000 all records will be moved to a cache table and then selected by e.g 100's (completely all records from a cache are going to be selected ,so it is not as a classic pagination…).

INSERT INTO cache_table SELECT * FROM tmp;
DROP TEMPORARY TABLE tmp; #Temp is not needed anymore

however if number of records is smaller than 1000 then they don't have to be cached and can be immediately selected:

SELECT column_names FROM comments c JOIN tmp t ON c.id = t.id;
DROP TEMPORARY TABLE tmp; #Temp is not needed anymore

OK, now if there are some records in cache, they will be selected by 100 in this case. Records don't have to be selected all at once (10*100), a connection may close and client will ask data later. So we have got 1000 records in cache. To select a data I execute:

SELECT column_names FROM comments c JOIN cache_table ct ON c.id = ct.id LIMIT 100;
DELETE FROM cache_table LIMIT 100; #Records are selected so they can be deleted

This process is repeated until it return 0 or less than 100 records (If there is e.g 1001 rec).

Best Answer

Why generate a temporary that drops on disconnect ? You would have to reload the table of ids, which requires extra code to manage the emptying and reloading. If you create a subquery that limits to 100 ids at a time, you could join to the cache_table:

Just create an inline query for the first 1000 rows, then join

SELECT ct.column_names FROM
(SELECT id FROM comments WHERE user_id = 123 LIMIT 1000) c
LEFT JOIN cache_table ct ON c.id = ct.id;

If you have to paginate, adjust in the inline query to paginate on the IDs then join last

SELECT ct.column_names FROM
(SELECT id FROM comments WHERE user_id = 123 LIMIT 0,100) c
LEFT JOIN cache_table ct ON c.id = ct.id;

SIDE NOTE: I highly advise against creating temporary InnoDB tables. They are much harder to cleanup after a crash because of leftover entries in the ibdata1 file (home of the data dictionary). MyISAM temporary tables are easy to cleanup just by simply deleting the .MYD and .MYI files in the /tmp folder.

UPDATE 2014-12-22 18:19 EST

It sounds like you want Dynamic SQL based on the row count of tmp

CREATE TEMPORARY TABLE IF NOT EXISTS tmp ENGINE=MyISAM
SELECT id FROM comments WHERE user_id = 123;
SELECT COUNT(1) INTO @reccount FROM tmp;

SET @pg_origin = ???;
SET @pg_offset = 100;

SET @q1 = 'SELECT column_names FROM comments c JOIN tmp t ON c.id = t.id';
SET @q2 = CONCAT('SELECT column_names FROM comments c JOIN cache_table ct ON c.id = ct.id LIMIT ',@pg_offset,',',@pg_origin,';');
SET @sql = IF(@reccount < 1000,@q1,@q2);

PREPARE s FROm @sql; EXECUTE s; DEALLOCATE PREPARE s;

I still strongly recommend using MyISAM for the temp table since the row count is written in the .MYD header of the file (O(1) execution time to get the count) whereas an InnoDB temp table required physically counting the rows (O(n) execution time to get the count).

As for the pagination, all you would need is to set @pg_origin to set where to begin retrieval.

As for InnoDB temp tables, sometimes they do not properly drop on disconnect and leave pigeon holes in ibdata1. I wrote about this problem about 2.5 years ago

UPDATE 2014-12-23 13:08 EST

Your last comment

Can you think about some other suggestions except UPDATE 2014-12-22 18:19 EST? Is it possible to somehow eliminate the use of a temporary table?

If you look at the UPDATE 2014-12-22 18:19 EST, you'll notice that it only creates and loads the table once because I use CREATE TEMPORARY TABLE IF NOT EXISTS tmp instead of CREATE TEMPORARY TABLE tmp.

If you know user_id 123 has new comments you would drop, create and load the temp table.

Even if you don't want the temp table at all, you need to get the count anyway. So, go get count, move it around in a session variable from page to page. Then, apply the Dynamic SQL against that session count rather than having the table.