I'm avoiding using RAND() because it has been said it doesn't work for large tables. I wonder how large is large?
As for "doesn't work", I say that is false. RAND() does not repeat after 64K values, the way some random number generators do.
As far as "how large", time the first query below -- if it is "too slow" then the table is "too large". Then skip to the link at the bottom.
Case 1: Done all at once, this gives you $N random, non-repeating rows:
$array = SELECT id FROM ... WHERE ... ORDER BY RAND() LIMIT $N;
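Concretely, that might look like the following. (The table and column names here are hypothetical; substitute your own.)

```sql
-- Hypothetical schema: a `pictures` table with an `id` primary key.
-- Pick 10 distinct random ids in one shot ($N = 10):
SELECT id
FROM pictures
WHERE is_public = 1        -- whatever filtering you need
ORDER BY RAND()
LIMIT 10;
```

Note that this still reads and sorts every qualifying row, which is exactly why the timing test above matters.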
Case 2: If you need to fetch the rows individually, such as with separate requests and minimal "memory" between requests:
$rnd = UNIX_TIMESTAMP();    // chosen once, on the first request, then remembered
$offs = $offs + $N;         // $offs starts at 0; advance past the $N rows already fetched
$array = SELECT id FROM ... WHERE ... ORDER BY RAND($rnd) LIMIT $N OFFSET $offs;
By remembering $rnd and $offs (eg, passing them via a URL or the SESSION), you can get the "next" rows in a predictable sequence.
Caveat 1: If rows are inserted/deleted, this sequence can mess up.
Caveat 2: If the table is 'large', the SELECT could be costly.
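A sketch of what two consecutive Case-2 requests might run, again against a hypothetical pictures table. It works because MySQL's RAND(N), given the same constant seed N, produces the same row ordering each time:

```sql
-- Request 1: choose the seed and remember it (here in a session variable;
-- across real HTTP requests you would pass it in the URL or SESSION).
SET @rnd := UNIX_TIMESTAMP();
SELECT id FROM pictures ORDER BY RAND(@rnd) LIMIT 5 OFFSET 0;

-- Request 2: same seed, offset advanced past the 5 rows already consumed,
-- yielding the next 5 rows of the same shuffled sequence.
SELECT id FROM pictures ORDER BY RAND(@rnd) LIMIT 5 OFFSET 5;
```

Both caveats above still apply: each request re-sorts the whole result set, and inserts or deletes between requests can shift the sequence.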
Faster ways to get random rows exist. None is 'perfect', and each depends on something not in evidence, such as whether there is an auto_increment id and whether it has gaps.
Such a tool would be almost completely worthless in practical use.
max_connections
has nothing to do with performance. It's a safety mechanism to prevent a runaway application from making so many connections that the server becomes overwhelmed. The value makes absolutely no difference unless you are using that many connections. It is also a value that you should never reach under normal operating conditions.
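One way to sanity-check that on a running server is to compare the connection high-water mark against the configured ceiling:

```sql
-- High-water mark of simultaneous connections since the server started...
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- ...versus the configured limit. If the former never comes close to the
-- latter, changing max_connections will change nothing.
SHOW GLOBAL VARIABLES LIKE 'max_connections';
```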
table_open_cache
I had hoped by now that everybody knew just to leave this one alone. It tends to scale negatively -- the more "sensible" the value seems to be, the more wrong you are. The worst performing servers in my entire network were the ones where someone, at some time in the past, had felt a need to mess with this.
query_cache_type
just set it to 1, with a reasonable size, something in the 32 to 128 MiB range, and walk away. You will be hard-pressed to develop a meaningful benchmark here, and a small query cache definitely has its value for many workloads. But keep it reasonably small, or turn it off. Above a certain size, it will scale negatively and degrade performance. The more processor cores you have, the more potential exists for the query cache mutex to become a bottleneck. For workloads where there is virtually no chance of a cache hit, it's a waste of cpu cycles.
innodb_buffer_pool_size
has only one correct answer: How much memory can you spare? Set that here. Lower will always be slower, and higher will also be slower if the server goes into swap, and of course infinitely slower if it crashes because the system ran out of memory entirely. Correlated to this, if you're running anything else on your database server, you are doing it wrong.
innodb_buffer_pool_instances
As long as each pool is at least 1GB, there's not a lot of variation available, or needed, here on a decent-sized server.
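Pulling those last few together, a minimal my.cnf sketch for a hypothetical dedicated 32 GB server might look like the following. The sizes are illustrative, not recommendations for your workload, and this applies to the MySQL 5.x era discussed here (the query cache was removed entirely in MySQL 8.0):

```ini
[mysqld]
# Small, capped query cache -- or query_cache_type = 0 to turn it off.
query_cache_type             = 1
query_cache_size             = 64M

# Most of the machine's memory, minus what the OS and per-connection
# overhead need, on a server that runs nothing but MySQL.
innodb_buffer_pool_size      = 24G
# 8 instances => 3GB per pool, comfortably above the 1GB floor.
innodb_buffer_pool_instances = 8
```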
innodb_adaptive_hash_index_partitions
is probably a poster child for an argument that applies in some sense to all of the other values above, and some not mentioned: even if you tweaked this to find a sweet spot with sysbench... who cares? How does that help you? It doesn't.
Your specific workload is the only thing really relevant to the ideal values for the adaptive hash index, the query cache, and so many of the other tweakable parameters on MySQL Server.
Given two servers with "identical everything" -- storage, memory, CPUs, background cosmic radiation levels -- identical everything, that is, except the workload (meaning the schemata, datafill, and the actual queries being run) the "optimum" configuration parameters may still be wildly divergent.
Benchmarking with anything other than a real live workload will not give you anything of significant value, and tweaking a server is not usually the ticket to performance.
Well-written queries against well-structured schemata and appropriately indexed tables, and use of read replicas, will take you much further than tweaking.
Best Answer
You need to create a synthetic table containing all the intervals in your timeline range, or just a numbers table when the interval limits can easily be calculated from the interval number.
Example:
The numbers count must be equal to the ranges count (or more, never less).
If you want the number of ranges to vary dynamically with the span of the data values (up to some reasonable limit), you can use the following method to create the synthetic table
nums
(for example, when the ranges count is guaranteed to be less than 1000). To cut off the upper empty ranges, use an additional condition like
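A hedged sketch of that method (all table, column, and variable names here are hypothetical): build nums as 0..999 by cross-joining a ten-row digits table three times, then LEFT JOIN it against the data to count the rows falling into each interval.

```sql
-- Ten digits, cross-joined three times => 1000 numbers (0..999).
CREATE TABLE digits (d INT NOT NULL);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

CREATE TABLE nums AS
SELECT t1.d + 10 * t2.d + 100 * t3.d AS n
FROM digits t1 CROSS JOIN digits t2 CROSS JOIN digits t3;

-- Split a hypothetical events.ts timeline into @cnt equal ranges and
-- count the rows in each; empty ranges still appear, with a zero count.
SET @min := (SELECT MIN(ts) FROM events),
    @max := (SELECT MAX(ts) FROM events),
    @cnt := 20;

SELECT nums.n AS range_no,
       COUNT(e.ts) AS rows_in_range
FROM nums
LEFT JOIN events e
       ON e.ts >= @min + (@max - @min) *  nums.n      / @cnt
      AND e.ts <  @min + (@max - @min) * (nums.n + 1) / @cnt
WHERE nums.n < @cnt   -- the "additional condition": drop the unused upper ranges
GROUP BY nums.n;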