Innodb – How to optimize SQL query which involves GROUP_CONCAT function with ORDER BY

group bygroup-concatenationinnodbmariadboptimization

I wrote the following SQL query to group results and then sort results in each group by date:

SELECT 
origin,
provider,
SUBSTRING_INDEX(GROUP_CONCAT(price
            ORDER BY departure_date
            SEPARATOR ','),
        ',',
        10)
FROM
    `flights`
WHERE
    origin = 'LHR' AND destination IN ('FCO', 'CDG', 'AMS')
        AND cabin_type = 'ECONOMY'
GROUP BY origin , provider;

Second query:

SELECT 
destination,
provider,
SUBSTRING_INDEX(GROUP_CONCAT(price
            ORDER BY departure_date
            SEPARATOR ','),
        ',',
        10)
FROM
    `flights`
WHERE
    origin = 'LHR' AND destination IN ('FCO', 'CDG', 'AMS')
        AND cabin_type = 'ECONOMY'
GROUP BY destination, provider;

InnoDB table flights is quite simple:

+----------------+---------------+------+-----+-----------+----------------+
| Field          | Type          | Null | Key | Default   | Extra          |
+----------------+---------------+------+-----+-----------+----------------+
| id             | int(11)       | NO   | PRI | NULL      | auto_increment |
| origin         | varchar(255)  | NO   | MUL | NULL      |                |
| destination    | varchar(255)  | NO   | MUL | NULL      |                |
| departure_date | date          | NO   | MUL | NULL      |                |
| price          | decimal(14,2) | NO   |     | NULL      |                |
| provider       | varchar(255)  | NO   |     | airnet    |                |
| cabin_type     | varchar(255)  | YES  |     | ECONOMY   |                |
| created_at     | datetime      | YES  |     | NULL      |                |
| updated_at     | datetime      | YES  |     | NULL      |                |
+----------------+---------------+------+-----+-----------+----------------+

I have indexes on columns: origin, destination, departure_date. And approx. 10 million records in the table.

Above-mentioned query takes about 200 to 600 milliseconds. I tried to use profiler and found out that most of the time is spent on Creating sort index (approx. 98% of whole query time).

Is there any way to optimize this query. I tried compund index on origin, provider, departure_date, but that did not seem to help. Maybe configuration should be tweaked more and if so which configuration variables must be considered?

Machine has 4 GB of RAM, 2 vCPUs and dedicated for DB only (10.3.13-Maria DB).

Best Answer

Test

WITH cte AS ( SELECT origin,
                     provider,
                     price,
                     departure_date,
                     ROW_NUMBER() OVER (PARTITION BY /* origin, */ provider
                                        ORDER BY departure_date) rn 
              FROM flights
              WHERE origin = 'LHR' 
                AND destination IN ('FCO', 'CDG', 'AMS')
                AND cabin_type = 'ECONOMY' )
SELECT origin,
       provider,
       GROUP_CONCAT(price
                    ORDER BY departure_date
                    SEPARATOR ',')
FROM cte
WHERE rn <= 10
GROUP BY origin, 
         provider;

UPDATE: (draft code, see comments)

WITH 
cte1 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date,
                 ROW_NUMBER() OVER (PARTITION BY provider
                                    ORDER BY departure_date) rn 
          FROM flights
          WHERE origin = 'LHR' 
            AND destination ='FCO'
            AND cabin_type = 'ECONOMY' ),
cte2 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date
          from cte1
          WHERE rn <= 10),
cte3 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date,
                 ROW_NUMBER() OVER (PARTITION BY provider
                                    ORDER BY departure_date) rn 
          FROM flights
          WHERE origin = 'LHR' 
            AND destination = 'CDG'
            AND cabin_type = 'ECONOMY' ),
cte4 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date
          from cte3
          WHERE rn <= 10),
cte5 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date,
                 ROW_NUMBER() OVER (PARTITION BY provider
                                    ORDER BY departure_date) rn 
          FROM flights
          WHERE origin = 'LHR' 
            AND destination = 'AMS'
            AND cabin_type = 'ECONOMY' ),
cte6 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date
          from cte5
          WHERE rn <= 10),
cte7 AS ( SELECT origin,
                 provider,
                 price,
                 departure_date
          FROM cte2
          UNION ALL
          SELECT origin,
                 provider,
                 price,
                 departure_date
          FROM cte4
          UNION ALL
          SELECT origin,
                 provider,
                 price,
                 departure_date
          FROM cte6 )
cte8 as ( SELECT origin,
                 provider,
                 price,
                 departure_date,
                 ROW_NUMBER() OVER (PARTITION BY provider
                                        ORDER BY departure_date) rn 
           FROM cte7)
SELECT origin,
       provider,
       GROUP_CONCAT(price
                    ORDER BY departure_date
                    SEPARATOR ',')
FROM cte8
WHERE rn <= 10
GROUP BY origin, 
         provider;