The BENCKMARK function only works on expressions.
What qualifies as an expression?
- Stored Function (has a return value)
- ENCODE function on a String
- 1 + 2 (that expression gives an integer)
ALTERNATIVE
What you need is to simulate the BENCHMARK function yourself.
Here is some sample code for you to try
drop database if exists mybmark;
create database mybmark;
use mybmark
DELIMITER $$
DROP PROCEDURE IF EXISTS `mybmark`.`testproc` $$
CREATE PROCEDURE `mybmark`.`testproc` ()
BEGIN
DECLARE answer INT;
SELECT 1+2 INTO answer;
END $$
DELIMITER ;
DELIMITER $$
DROP PROCEDURE IF EXISTS `mybmark`.`mybenchmark` $$
CREATE PROCEDURE `mybmark`.`mybenchmark` (loop_count INT,expr varchar(128))
BEGIN
DECLARE dt1,dt2,dtdiff,ndx INT;
SET dt1 = UNIX_TIMESTAMP();
SET ndx = loop_count;
SET @sql = expr;
PREPARE stmt FROM @sql;
WHILE ndx > 0 DO
EXECUTE stmt;
SET ndx = ndx - 1;
END WHILE;
DEALLOCATE PREPARE stmt;
SET dt2 = UNIX_TIMESTAMP();
SET dtdiff = dt2 - dt1;
SELECT dt1,dt2,dtdiff;
END $$
DELIMITER ;
To benchmark the testproc procedure, just pass the call to the testproc procedure as a parameter. For example to call the testproc procedure 1,000,000 times, do this:
mysql> call mybmark.mybenchmark(1000000,'CALL mybmark.testproc()');
+------------+------------+--------+
| dt1 | dt2 | dtdiff |
+------------+------------+--------+
| 1333474085 | 1333474102 | 17 |
+------------+------------+--------+
1 row in set (16.80 sec)
Query OK, 0 rows affected (16.80 sec)
mysql>
Give it a Try !!!
The widely used tool is the SQL command EXPLAIN ANALYZE
, possibly with more options for more details in the answer. That outputs the query plan with the planner estimates plus actual execution times.
Why would you want to clear the cache? The generally more likely use case is that the cache is populated. If you still want to go that route, here is a related answer on SO.
Not resetting the cache, here are two simple ways to test with many iterations:
Simple UDF
EXPLAIN ANALYZE
SELECT f_myfunc(g) FROM generate_series (1,1000) AS t(g);
Or with random input - random numbers between 0 and 5000 in the example:
EXPLAIN ANALYZE
SELECT f_myfunc((random()*5000)::int) FROM generate_series (1,1000) AS t(g);
Or with a real life table:
EXPLAIN ANALYZE
SELECT f_myfunc(my_column) FROM my_tbl; -- LIMIT n
More complex functions / queries
CREATE FUNCTION f_test(ct int, sql text) RETURNS void AS
$func$
DECLARE
i int;
BEGIN
FOR i IN 1 .. $1 LOOP
EXECUTE sql; -- not safe against SQLi!
END LOOP;
END
$func$ LANGUAGE plpgsql
Call:
EXPLAIN ANALYZE
SELECT f_test(100, $x$SELECT * from MADLIB.gp('mock3', '{x1, x2, x3}', '{y1}', 100,20, 3)$x$
Careful: The query is actually executed!
Careful: Not fit for public use. Possible SQL injection.
Again, you can use random parameters if needed. Possibly with the USING
clause of EXECUTE
.
Best Answer
(I realize that this does not "answer" the Question. But it is too long for a Comment.)
Good luck. re "Almost identical" -- Probably because they were "identical" except for random noise in the timing. The switchable optimizations are, in general, "binary" and "optional".
Consider an obvious situation: You have an index that could be used in a query. Still, the Optimizer will deliberately consider whether to ignore the index and simply scan the table. It may be more costly to bounce between the index and the table, hence using the index would slow down execution. This is a "binary" decision -- use vs ignore index. The Optimizer cannot know the exact cutoff between the choices. It is influenced by HDD vs SDD, buffer_pool_size, other activity, statistics, etc, etc.
That is, a benchmark would be complicated by having more than 2 choices (an index exists or not):
FORCE INDEX
vsIGNORE INDEX
vs 'allowed' the Optimizer to chooseAny of these could change whether the Optimizer would pick a particular optimization. Also, the timings would change by a relatively unpredictable amount in cases such as these:
ANALYZE TABLE
will sometimes lead to changing things significantly.OPTIMIZE TABLE
-- ditto.Over the years, I occasionally see "index merge intersect". (
INDEX(a), INDEX(b)
withWHERE a=1 AND b=2
) In every case, a composite index would be better (INDEX(a,b)
).Change that
AND
toOR
and no index is useful -- except when "index merge union" is used. This almost never happens. It takes two index range scans, a merge, and then something like a "join".EXPLAIN
does not show much info.EXPLAIN FORMAT=JSON
is better. The Optimizer Trace shows a different set of info.I don't know of a test set like what you seek. I see the writing of even one test to be quite a challenge (see above). Furthermore, interpreting the results would be challenging.
If you do succeed with your goal, please blog about it.