MySQL – Fix Extremely Long Query Time with Nested WHEN-INs

join;MySQLnormalizationperformance

For the first time, I am running up against a horrifically long execution time of a MySQL query (~5 minutes).

The data in the database is highly (and not arbitrarily) normalized. It very efficient at organizing and shuffling data to be displayed in a lot of very helpful ways for different purposes, except this one particular query is throwing a wrench into it.

I can't understand the reason for it. However, some background information that may shed some light onto otherwise arbitrarily convoluted queries.


The company has divided the world up into many many teams (macroregions). Everybody belongs to one or two teams, based on their expertise.

  • For example, there are many diverse teams. A few examples are Spanish, Sahara, Iberia, Portuguese, Jungle teams. Each of the teams has considerable overlap with other teams, but is in some senses independent.

  • The Arabic team works quite closely with the Sahara team, by virtue of the fact that the database tells them they have to work together on certain assignments, because of overlapping geographic locations. The Spanish and the Portuguese team also work closely together, they both work with the Americas and Europe teams and the Portuguese team also works with the Africa team, as does the Arabic team.

  • Each team has a given set of regions, which are also not unique to that particular team. For example, the Mediterranean region belongs to around 12 teams, and when an event occurs there, they all work on it together.

  • Each country belongs to one or more regions. Turkey belongs to Central Asia, Europe and even Mediterranean, and a few others.

Given all that, it is necessary to show each individual what other people on their teams are doing, as well as people who are not on their team, but have overlapping regions.

Query 1 accomplishes this perfectly, and very quickly less than .09 seconds.


                SELECT report_name 
                FROM reports 
                WHERE region IN (
                    SELECT distinct region 
                    FROM macroregions
                    WHERE macroregion IN (
                        SELECT distinct macroregion
                        FROM users 
                        WHERE callsign = '$thisuser'
                    )
                ) 

It is also important people can see every occurrence of the report_name, when it Query 1 deems that the individual logged in should know about it. Reports_names are automatically generated and reused for each geographic location

  • If I am on the East Asia team and someone in my area is working with the Spanish team, there would be two entries in the database:

    • 20120210JOMX01 Japan Okinawa

    • 20120210JOMX01 Mexico Nuevo Leon

Query 2 takes Query 1 and wraps one layer of WHEN-INs around it. It allows the East Asia team members know that someone on their team is working in Mexico. But the performance delay is unacceptable and totally unusable; it takes nearly 5 minutes to complete a single query!


        SELECT * 
        FROM reports
        WHERE report_name IN (
            SELECT report_name 
            FROM reports 
            WHERE region IN (
                SELECT distinct region 
                FROM macroregions
                WHERE macroregion IN (
                    SELECT distinct macroregion
                    FROM users 
                    WHERE callsign = '$thisuser'
                )
            )    
        )

Despite the fact that the query works, it takes too long to be effective. Again, it is startling how long this particular query takes. Whereas nothing else in the interface has given me any sort of performance issue (i.e. they all take less than one second).

What steps can I take to fix this?

Best Answer

Let's start with your original query

   SELECT *  
    FROM reports 
    WHERE report_name IN ( 
        SELECT report_name  
        FROM reports  
        WHERE region IN ( 
            SELECT distinct region  
            FROM macroregions 
            WHERE macroregion IN ( 
                SELECT distinct macroregion 
                FROM users  
                WHERE callsign = '$thisuser' 
            ) 
        )     
    ) 

You can gather keys only (in stages), then join the keys with the reports table. Here is my new proposed query

SELECT reports.* FROM
(
    SELECT rpts.report_name FROM
    (
        SELECT DISTINCT regions.region FROM
        (SELECT DISTINCT macroregion
        FROM users WHERE callsign = '$thisuser') users
        INNER JOIN
        (SELECT DISTINCT macroregion,region FROM regions) regions
        USING (macroregion)
    ) regionkeys
    INNER JOIN
    (SELECT region,report_name FROM reports) rpts
    USING (region)
) reportnamekeys
INNER JOIN reports
USING (report_name);

If you are not comfortable with the USING clause here is my new propsed query without using the USING clause:

SELECT reports.* FROM
(
    SELECT rpts.report_name FROM
    (
        SELECT DISTINCT regions.region FROM
        (SELECT DISTINCT macroregion
        FROM users WHERE callsign = '$thisuser') users
        INNER JOIN
        (SELECT DISTINCT macroregion,region FROM regions) regions
        ON users.macroregion = regions.macroregion
    ) regionkeys
    INNER JOIN
    (SELECT region,report_name FROM reports) rpts
    ON regionkeys.region = rpts.region
) reportnamekeys
INNER JOIN reports
ON reportkeys.report_name = reports.report_name;

You will need some indexes to support the subqueries

ALTER TABLE users   ADD INDEX callsign_macroregion_ndx (callsign,macroregion);
ALTER TABLE regions ADD INDEX macroregion_region_ndx (macroregion,region);
ALTER TABLE reports ADD INDEX region_report_name_ndx (region,report_name);
ALTER TABLE reports ADD INDEX report_name_ndx (report_name);

The first 3 indexes are called covering indexes. They are called such because the subqueries calls for only those exact columns. Therefore, no need to read from the table. The data is only fetched from the index.

You should also study how to refactor queries to gather keys, perform WHERE clauses early, and perform JOINs last. Here is great YouTube Video on how to do this: http://youtu.be/ZVisY-fEoMw (Based on this book : Refactoring SQL Applications)

Give it a Try !!!