Mysql – How to benchmark Switchable Optimizations in MySQL

benchmarkMySQLoptimizationperformance

I want to benchmark the performance of Switchable Optimizations in MySQL, i.e., comparing the query latency and the result of EXPLAIN with/without using these optimization switches.

I have already tried to use sysbench and MySQL official test-suite. But I fond that they are too general and mainly for functional testing, i.e., runing these tests with/without the optimization switches, the performance are almost identical. What I want is sth like a performance test suite.

Could you please suggest me some good tools/suites to fulfill my demands?
Thanks!

Best Answer

(I realize that this does not "answer" the Question. But it is too long for a Comment.)

Good luck. re "Almost identical" -- Probably because they were "identical" except for random noise in the timing. The switchable optimizations are, in general, "binary" and "optional".

Consider an obvious situation: You have an index that could be used in a query. Still, the Optimizer will deliberately consider whether to ignore the index and simply scan the table. It may be more costly to bounce between the index and the table, hence using the index would slow down execution. This is a "binary" decision -- use vs ignore index. The Optimizer cannot know the exact cutoff between the choices. It is influenced by HDD vs SDD, buffer_pool_size, other activity, statistics, etc, etc.

That is, a benchmark would be complicated by having more than 2 choices (an index exists or not):

FORCE INDEX vs IGNORE INDEX vs 'allowed' the Optimizer to choose
for "allowed" the Optimizer picks the better choice vs fails to

Any of these could change whether the Optimizer would pick a particular optimization. Also, the timings would change by a relatively unpredictable amount in cases such as these:

Adding more rows to a table
Changing the size of a 'range'
ANALYZE TABLE will sometimes lead to changing things significantly.
OPTIMIZE TABLE -- ditto.

Over the years, I occasionally see "index merge intersect". (INDEX(a), INDEX(b) with WHERE a=1 AND b=2) In every case, a composite index would be better (INDEX(a,b)).

Change that AND to OR and no index is useful -- except when "index merge union" is used. This almost never happens. It takes two index range scans, a merge, and then something like a "join".

EXPLAIN does not show much info. EXPLAIN FORMAT=JSON is better. The Optimizer Trace shows a different set of info.

I don't know of a test set like what you seek. I see the writing of even one test to be quite a challenge (see above). Furthermore, interpreting the results would be challenging.

If you do succeed with your goal, please blog about it.

ALTERNATIVE

What you need is to simulate the BENCHMARK function yourself.

Here is some sample code for you to try

drop database if exists mybmark;
create database mybmark;
use mybmark
DELIMITER $$
DROP PROCEDURE IF EXISTS `mybmark`.`testproc` $$
CREATE PROCEDURE `mybmark`.`testproc` ()
BEGIN

    DECLARE answer INT;

    SELECT 1+2 INTO answer;

END $$
DELIMITER ;
DELIMITER $$
DROP PROCEDURE IF EXISTS `mybmark`.`mybenchmark` $$
CREATE PROCEDURE `mybmark`.`mybenchmark` (loop_count INT,expr varchar(128))
BEGIN
    DECLARE dt1,dt2,dtdiff,ndx INT;

    SET dt1 = UNIX_TIMESTAMP();
    SET ndx = loop_count;
    SET @sql = expr;
    PREPARE stmt FROM @sql;
    WHILE ndx > 0 DO
        EXECUTE stmt;
        SET ndx = ndx - 1;
    END WHILE;
    DEALLOCATE PREPARE stmt;
    SET dt2 = UNIX_TIMESTAMP();
    SET dtdiff = dt2 - dt1;

    SELECT dt1,dt2,dtdiff;

END $$
DELIMITER ;

To benchmark the testproc procedure, just pass the call to the testproc procedure as a parameter. For example to call the testproc procedure 1,000,000 times, do this:

mysql> call mybmark.mybenchmark(1000000,'CALL mybmark.testproc()');
+------------+------------+--------+
| dt1        | dt2        | dtdiff |
+------------+------------+--------+
| 1333474085 | 1333474102 |     17 |
+------------+------------+--------+
1 row in set (16.80 sec)

Query OK, 0 rows affected (16.80 sec)

mysql>

Give it a Try !!!

PostgreSQL Benchmarking – How to Benchmark a PostgreSQL Query

The widely used tool is the SQL command EXPLAIN ANALYZE, possibly with more options for more details in the answer. That outputs the query plan with the planner estimates plus actual execution times.

Why would you want to clear the cache? The generally more likely use case is that the cache is populated. If you still want to go that route, here is a related answer on SO.

Not resetting the cache, here are two simple ways to test with many iterations:

Simple UDF

EXPLAIN ANALYZE
SELECT f_myfunc(g) FROM generate_series (1,1000) AS t(g);

Or with random input - random numbers between 0 and 5000 in the example:

EXPLAIN ANALYZE
SELECT f_myfunc((random()*5000)::int) FROM generate_series (1,1000) AS t(g);

Or with a real life table:

EXPLAIN ANALYZE
SELECT f_myfunc(my_column) FROM my_tbl;   -- LIMIT n

More complex functions / queries

CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
   i int;
BEGIN

FOR i IN 1 .. $1 LOOP
    EXECUTE sql;  -- not safe against SQLi!
END LOOP;

END
$func$ LANGUAGE plpgsql

Call:

EXPLAIN ANALYZE
SELECT f_test(100, $x$SELECT * from MADLIB.gp('mock3', '{x1, x2, x3}', '{y1}', 100,20, 3)$x$

Careful: The query is actually executed!
Careful: Not fit for public use. Possible SQL injection.

Again, you can use random parameters if needed. Possibly with the USING clause of EXECUTE.

Best Answer

Related Solutions

MySQL – How to Use ‘select benchmark’ Command on a Procedure

ALTERNATIVE

PostgreSQL Benchmarking – How to Benchmark a PostgreSQL Query

Simple UDF

More complex functions / queries

Related Question