I notice that when I am executing queries the first time (i.e., just after mysqld_safe --user=mysql &
), the queries are slower than when they are executed second time. When the query (source) is:
select
s_name, s_address
from
SUPPLIER, NATION
where
s_suppkey in (
select
ps_suppkey
from
PARTSUPP
where
ps_partkey in (
select
p_partkey
from
PART
where
p_name
like
'green%'
) and
ps_availqty > (
select
0.5 * sum(l_quantity)
from
LINEITEM
where
l_partkey = ps_partkey and
l_suppkey = ps_suppkey and
l_shipdate >= date '1993-01-01' and
l_shipdate < date '1993-01-01' + interval '1' year
)
) and
s_nationkey = n_nationkey and
n_name = 'ALGERIA'
order by
s_name;
And I got:
> SELECT .....;
> 41.255s -------- first time after startup
> SELECT .....;
> 6.242s -------- second time after startup
> SELECT .....;
> 6.104s -------- third time after startup
However, in some other times, the difference does not exist. For example, this query(source):
SELECT
s_name, count(*) as numwait
FROM
SUPPLIER, LINEITEM l1, ORDERS, NATION
WHERE
s_suppkey = l1.l_suppkey and
o_orderkey = l1.l_orderkey and
o_orderstatus = 'F' and
l1.l_receiptdate > l1.l_commitdate and
exists (
SELECT *
FROM LINEITEM l2
WHERE
l2.l_orderkey = l1.l_orderkey and
l2.l_suppkey <> l1.l_suppkey
) and
not exists (
SELECT *
FROM LINEITEM l3
WHERE
l3.l_orderkey = l1.l_orderkey and
l3.l_suppkey <> l1.l_suppkey and
l3.l_receiptdate > l3.l_commitdate
) and
s_nationkey = n_nationkey and
n_name = 'EGYPT'
GROUP BY
s_name
ORDER BY
numwait desc, s_name
LIMIT 100;
I got:
> SELECT .....;
> 3m9.264s -------- first time after startup
> SELECT .....;
> 3m9.377s -------- second time after startup
> SELECT .....;
> 3m7.287s -------- third time after startup
>
> ------ This time, the difference do not exist.
MySQL version is 8.0.22
All configurations are default.
Why it is the case?
Thanks.
Best Answer
As others have mentioned in the comments, likely the performance improvement on subsequent runs that you saw in your first example is due to the data being cached in memory after your first run. There are multiple types of caches in MySQL, so even if you're not using the Query Cache, your MySQL instance uses the Table Cache and the
buffer_pool
which means the data from your query is actually cached in memory so that subsequent runs of that query don't need to fetch the data from disk again. This is beneficial if the size of the tables in your query are large.The reason your second query didn't see much of a difference in performance on subsequent runs (though I would assume the raw table data was still cached) was because most of the runtime you're experiencing is probably due to the complexity of the query itself. Some of those complexities are from the
EXISTS
&NOT EXISTS
that can be written in a more relationally efficient way, multiple inequality predicates, and the fact you're doing a COUNT() which may result in scans instead of seek operations, for example. TheEXPLAIN ANALYZE
/ query plan for your second query would probably give more insight on specifically which part of it is the complexity that is causing most of the runtime.