MySQL – How to Optimize Slow CREATE TEMPORARY TABLE from SELECT

MySQLoptimizationtemporary-tables

I asked this on StackOverflow, but there are not many hints. So:

SHOW CREATE TABLE cashadv_txn:

CREATE TABLE `cashadv_txn` (
  `txn_id` int(11) NOT NULL AUTO_INCREMENT,
  `cashadv_id` int(11) NOT NULL,
  `txn_type` enum('LI','MI','CI','LD','MD','LDA','SRDA','SRD','NRF') DEFAULT NULL,
  `datetime` datetime NOT NULL,
  `amount` float(20,2) NOT NULL,
  `status` varchar(16) DEFAULT NULL,
  `usaepay_invoice_id` int(11) DEFAULT NULL,
  `link_uid` int(11) DEFAULT NULL,
  `link_date` date DEFAULT NULL,
  `link_txn_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`txn_id`),
  KEY `link_txn_id` (`link_txn_id`),
  KEY `type_date_uid` (`cashadv_id`,`txn_type`,`link_date`,`link_uid`),
  KEY `usaepay_invoice_id` (`usaepay_invoice_id`),
  KEY `idx1` (`cashadv_id`,`link_uid`)
) ENGINE=InnoDB AUTO_INCREMENT=4586015 DEFAULT CHARSET=utf8

relevant temporary table variables. I know this is an overkill, but I can reduce it later. This is just to show that they are high enough:

SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';

enter image description here

SHOW GLOBAL VARIABLES LIKE 'tmp_table_size' ;

enter image description here

This query takes around 0.8-1 seconds:

SELECT SQL_NO_CACHE
            link_uid AS lender_uid,
            cashadv_id,
            SUM(amount) AS lenderTotalCredit      
         FROM
            cashadv_txn      
         WHERE
            cashadv_txn.txn_type='LDA'      
         GROUP BY
            cashadv_id,
            link_uid;

However, this query (which is in fact the previous one saving results into a temporary table) takes 40 seconds:

CREATE TEMPORARY TABLE IF NOT EXISTS tLenderTotalCredits ENGINE=MEMORY  AS (SELECT
            link_uid AS lender_uid,
            cashadv_id,
            SUM(amount) AS lenderTotalCredit      
         FROM
            cashadv_txn      
         WHERE
            cashadv_txn.txn_type='LDA'      
         GROUP BY
            cashadv_id,
            link_uid
            );

SELECT @@version : 5.6.22-1+deb.sury.org~precise+1

UPDATE: EXPLAIN of the SELECT statement:

enter image description here

You can see all the indexes above in SHOW CREATE TABLE

UPDATE: It was my mistake in the measurements, actually. I'm using SQLyog for query debugging and it limits by default by 1000 records. I'm aware of it and tested it with limit rows option unchecked. The problem is: on the first attempt, once I uncheck the option it still says it takes about 2 seconds (in reality it takes good 40 s):

enter image description here

However, if i restart the query with the option already unchecked, then it reports the time right. My mistake was that I thought the time on option uncheck was right and the rest ~30 s it was just rendering all the rows in a grid.

As another question, I'm trying to optimize this query (which is crazy, I know) and which is in turn a subquery in a bigger one. But I think if I optimize this, the whole bigger query will run really faster. At the moment this query takes 1.5 min to execute:

SELECT
         cashadv_id,
         link_date AS DATE,
         link_uid AS lender_uid,
         amount,
         'Approved' AS STATUS   
      FROM
         `cashadv_txn`   
      WHERE
         txn_type='LD'     
         AND STATUS='Approved'   
      UNION
      ALL SELECT
         tLenderTotalCredits.cashadv_id,
         DATE(NOW()) AS DATE,
         tLenderTotalCredits.lender_uid,
         IFNULL(tLenderTotalCredits.lenderTotalCredit,0) - IFNULL(tLenderTotalPay.lenderTotalPay,0) AS amount,
         'Ready' AS STATUS   
      FROM
         ( SELECT
            link_uid AS lender_uid,
            cashadv_id,
            SUM(amount) AS lenderTotalCredit      
         FROM
            cashadv_txn      
         WHERE
            cashadv_txn.txn_type='LDA'      
         GROUP BY
            link_uid,
            cashadv_id ) tLenderTotalCredits   
      LEFT JOIN
         (
            SELECT
               cashadv_id,
               link_uid AS lender_uid,
               SUM(amount) AS lenderTotalPay      
            FROM
               cashadv_txn      
            WHERE
               txn_type='LD'        
               AND STATUS='Approved'      
            GROUP BY
               cashadv_id,
               link_uid 
         ) tLenderTotalPay 
            ON tLenderTotalPay.cashadv_id=tLenderTotalCredits.cashadv_id   
            AND tLenderTotalPay.lender_uid=tLenderTotalCredits.lender_uid -- check if there is current day payment     
      LEFT JOIN
         cashadv_txn cashadv_txn3 
            ON cashadv_txn3.txn_type='LD'   
            AND cashadv_txn3.cashadv_id=tLenderTotalCredits.cashadv_id   
            AND cashadv_txn3.link_uid=tLenderTotalCredits.lender_uid   
            AND cashadv_txn3.link_date = DATE(NOW())   
            AND cashadv_txn3.status='Approved'   
      WHERE
         cashadv_txn3.cashadv_id IS NULL

As you see, there are two subquery joins that I was hoping to speed up by first getting their resultsets into an in-memory temp table and then join on them.

I have to say, I'm not yet 100% faimilar with the app logic, so I'm afraid of breaking something, I don't know if we really need all 23k rows in these two sub-joins or we can limit them. But apart of that, do you have any ideas on how to optimize this? The cashadv_txn structure is above.

Thanks

Best Answer

How much RAM do you have? Do you have 2GB to spare? That is what you are allowing for that MEMORY table.

Remove the parentheses CREATE ... AS ( SELECT ... ).

Add

INDEX(txn_type, cashadv_id, link_uid) -- in that order
INDEX(txn_type, status, cashadv_id, link_uid) -- in that order

It is dangerous to have tmp_table_size = 2G. If multiple connections each needed a tmp table, you could quickly run out of RAM. At that point, mysql gets veeery slow.

This construct optimizes poorly, hence runs slowly:

FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...

DATE(NOW()) --> CURRENT_DATE()

float(20,2) --> FLOAT or DECIMAL(20,2) (probably 20 is excessive)

Consider changing status varchar(16) DEFAULT NULL, to an ENUM.