Mysql – how to cache a subset for cascading select queries in thesql

myisamMySQLselect

Here is another database problem I stumbled upon.

I have a date-range partitioned MyIsam look-up table with 200M records and ~150 columns. On this Table I need to perform cascading SELECT-Statements to filter the data. Output:

filter 126M    
filter 110M    
filter 40M    
filter 5M    
filter 100k

Every single SELECT is highly complex with regex (=no index possible) and multiple comparisons, which is why I want them to query the least amount of rows possible.

There are about 500 unique filters and around 200 constant users. Every filter needs to be run for each user, in total around 100k combinations.

Big question: Is there a way for each subsequent SELECT statement to query only the previous subset?

Example: Filter #5 should only have to query the 5M rows out of query 4 to get those 100k results. At the moment it has to scan through all 200M records.

EDIT
current approach:
cache table

CREATE TABLE IF NOT EXISTS `cache` (
  `filter_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `lookup_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

ALTER TABLE `cache`
 ADD PRIMARY KEY (`filter_id`,`user_id`);

This would contain the relation between individual data-rows from the lookup table and the filters. PLUS I'd be able to use the rrimary index to get all of the lookup_ids from the previous filter.

Query for subsequent filters:

SELECT SUM( column), COUNT(*)
    FROM cache c
    LEFT JOIN lookup_table l ON c.lookup_id= l.id
    WHERE
    c.filter_id = 1
    AND c. user_id= x
    AND l.regex_column = preg_rlike...

Best Answer

If filter 110M is exactly a subset of filter 126M, then appending more AND with WHERE would have done the job.

$sql1 = "SELECT ..... WHERE ...";
$sql2 =  $sql1 . "AND column-name  = ....";
$sql3 =  $sql2 . "AND column-name  = ....";

If that is complex to accomplish try to CREATE VIEW of the previous SELECT statement and the next SELECT statement should query from the view.