I'm avoiding using RAND() because it has been said it doesn't work for large tables. I wonder how large is large?
As for "doesn't work", I say that is false. RAND() does not repeat after 64K values, the way some random number generators do.
As far as "how large", time the first query below -- if it is "too slow" then the table is "too large". Then skip to the link at the bottom.
Case 1: Done all at once, this gives you $N random, non-repeating rows:
$array = SELECT id FROM ... WHERE ... ORDER BY RAND() LIMIT $N;
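Concretely, that might look like the following. (The table and column names here are hypothetical; substitute your own.)

```sql
-- Hypothetical schema: a `pictures` table with an `id` primary key.
-- Pick 10 distinct random ids in one shot ($N = 10):
SELECT id
FROM pictures
WHERE is_public = 1        -- whatever filtering you need
ORDER BY RAND()
LIMIT 10;
```

Note that this still reads and sorts every qualifying row, which is exactly why the timing test above matters.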
Case 2: If you need to fetch the rows individually, such as with separate requests and minimal "memory" between requests:
$rnd = UNIX_TIMESTAMP();    // chosen once, on the first request, then remembered
$offs = $offs + $N;         // $offs starts at 0; advance past the $N rows already fetched
$array = SELECT id FROM ... WHERE ... ORDER BY RAND($rnd) LIMIT $N OFFSET $offs;
By remembering $rnd and $offs (eg, passing them via a URL or the SESSION), you can get the "next" rows in a predictable sequence.
Caveat 1: If rows are inserted/deleted, this sequence can mess up.
Caveat 2: If the table is 'large', the SELECT could be costly.
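A sketch of what two consecutive Case-2 requests might run, again against a hypothetical pictures table. It works because MySQL's RAND(N), given the same constant seed N, produces the same row ordering each time:

```sql
-- Request 1: choose the seed and remember it (here in a session variable;
-- across real HTTP requests you would pass it in the URL or SESSION).
SET @rnd := UNIX_TIMESTAMP();
SELECT id FROM pictures ORDER BY RAND(@rnd) LIMIT 5 OFFSET 0;

-- Request 2: same seed, offset advanced past the 5 rows already consumed,
-- yielding the next 5 rows of the same shuffled sequence.
SELECT id FROM pictures ORDER BY RAND(@rnd) LIMIT 5 OFFSET 5;
```

Both caveats above still apply: each request re-sorts the whole result set, and inserts or deletes between requests can shift the sequence.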
Faster ways to get random rows exist. None is 'perfect', and each depends on something not in evidence, such as whether there is an auto_increment id and whether it has gaps.
Such a tool would be almost completely worthless in practical use.
max_connections
has nothing to do with performance. It's a safety mechanism to prevent a runaway application from making so many connections that the server becomes overwhelmed. The value makes absolutely no difference unless you are using that many connections. It is also a value that you should never reach under normal operating conditions.
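One way to sanity-check that on a running server is to compare the connection high-water mark against the configured ceiling:

```sql
-- High-water mark of simultaneous connections since the server started...
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- ...versus the configured limit. If the former never comes close to the
-- latter, changing max_connections will change nothing.
SHOW GLOBAL VARIABLES LIKE 'max_connections';
```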
table_open_cache
I had hoped by now that everybody knew just to leave this one alone. It tends to scale negatively -- the more "sensible" the value seems to be, the more wrong you are. The worst performing servers in my entire network were the ones where someone, at some time in the past, had felt a need to mess with this.
query_cache_type
just set it to 1, with a reasonable size, something in the 32 to 128 MiB range, and walk away. You will be hard-pressed to develop a meaningful benchmark here, and a small query cache definitely has its value for many workloads. But keep it reasonably small, or turn it off. Above a certain size, it will scale negatively and degrade performance. The more processor cores you have, the more potential exists for the query cache mutex to become a bottleneck. For workloads where there is virtually no chance of a cache hit, it's a waste of cpu cycles.
innodb_buffer_pool_size
has only one correct answer: How much memory can you spare? Set that here. Lower will always be slower, and higher will also be slower if the server goes into swap, and of course infinitely slower if it crashes because the system ran out of memory entirely. Correlated to this, if you're running anything else on your database server, you are doing it wrong.
innodb_buffer_pool_instances
As long as each pool is at least 1GB, there's not a lot of variation available, or needed, here on a decent-sized server.
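Pulling those last few together, a minimal my.cnf sketch for a hypothetical dedicated 32 GB server might look like the following. The sizes are illustrative, not recommendations for your workload, and this applies to the MySQL 5.x era discussed here (the query cache was removed entirely in MySQL 8.0):

```ini
[mysqld]
# Small, capped query cache -- or query_cache_type = 0 to turn it off.
query_cache_type             = 1
query_cache_size             = 64M

# Most of the machine's memory, minus what the OS and per-connection
# overhead need, on a server that runs nothing but MySQL.
innodb_buffer_pool_size      = 24G
# 8 instances => 3GB per pool, comfortably above the 1GB floor.
innodb_buffer_pool_instances = 8
```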
innodb_adaptive_hash_index_partitions
is probably a poster child for an argument that applies in some sense to all of the other values above, and some not mentioned: even if you tweaked this to find a sweet spot with sysbench... who cares? How does that help you? It doesn't.
Your specific workload is the only thing really relevant to the ideal values for the adaptive hash index, the query cache, and so many of the other tweakable parameters on MySQL Server.
Given two servers with "identical everything" -- storage, memory, CPUs, background cosmic radiation levels -- identical everything, that is, except the workload (meaning the schemata, datafill, and the actual queries being run) the "optimum" configuration parameters may still be wildly divergent.
Benchmarking with anything other than a real live workload will not give you anything of significant value, and tweaking a server is not usually the ticket to performance.
Well-written queries against well-structured schemata and appropriately indexed tables, and use of read replicas, will take you much further than tweaking.
Best Answer
You need to create a synthetic table containing all the intervals in your timeline range, or just a numbers table when the interval limits can easily be calculated from the interval number.
Example:
The numbers count must be equal to the ranges count (or more, never less).
If you want the number of ranges to vary dynamically with the span of the data values (up to some reasonable limit), you can use the following method to create the synthetic table
nums
(for example, when the ranges count is guaranteed to be less than 1000). To cut off the upper empty ranges, use an additional condition like
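A hedged sketch of that method (all table, column, and variable names here are hypothetical): build nums as 0..999 by cross-joining a ten-row digits table three times, then LEFT JOIN it against the data to count the rows falling into each interval.

```sql
-- Ten digits, cross-joined three times => 1000 numbers (0..999).
CREATE TABLE digits (d INT NOT NULL);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

CREATE TABLE nums AS
SELECT t1.d + 10 * t2.d + 100 * t3.d AS n
FROM digits t1 CROSS JOIN digits t2 CROSS JOIN digits t3;

-- Split a hypothetical events.ts timeline into @cnt equal ranges and
-- count the rows in each; empty ranges still appear, with a zero count.
SET @min := (SELECT MIN(ts) FROM events),
    @max := (SELECT MAX(ts) FROM events),
    @cnt := 20;

SELECT nums.n AS range_no,
       COUNT(e.ts) AS rows_in_range
FROM nums
LEFT JOIN events e
       ON e.ts >= @min + (@max - @min) *  nums.n      / @cnt
      AND e.ts <  @min + (@max - @min) * (nums.n + 1) / @cnt
WHERE nums.n < @cnt   -- the "additional condition": drop the unused upper ranges
GROUP BY nums.n;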