The next-to-last major step was discovering that it is, unfortunately for MySQL users, only MySQL that joins slowly on these keys, so all of the write-intensive tables were given unique integer identifiers to join upon. That accounted for about one third of the remaining reduction in average time.
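As a minimal sketch of what that change looks like (the table and column names here are hypothetical, not from my schema): add a compact integer key, index it, and join on it instead of a wide natural key:

    -- Before: joining on a wide VARCHAR natural key.
    --   SELECT o.total FROM orders o JOIN customers c ON c.email = o.customer_email;

    -- After: each table carries a unique integer identifier to join upon.
    ALTER TABLE customers ADD COLUMN id INT;           -- backfill, then make NOT NULL
    ALTER TABLE orders    ADD COLUMN customer_id INT;  -- backfill from the old key
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    SELECT o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;          -- cheap integer comparison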
The last step was taking CTE optimization fencing into account, which accounted for the remaining two thirds.
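For context, in PostgreSQL before version 12 every CTE was an optimization fence: it was materialized in full, and outer predicates were not pushed into it. A minimal sketch with a hypothetical table:

    -- Pre-12: the whole table is materialized before the filter applies.
    WITH recent AS (
        SELECT * FROM events
    )
    SELECT * FROM recent WHERE user_id = 42;

    -- PostgreSQL 12+ lets you state the intent explicitly; NOT MATERIALIZED
    -- allows the planner to inline the CTE and push user_id = 42 into it,
    -- so an index on user_id can be used.
    WITH recent AS NOT MATERIALIZED (
        SELECT * FROM events
    )
    SELECT * FROM recent WHERE user_id = 42;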
In the case of the above set of queries, the average time consumed is now between 1 and 2 ms on an i7 laptop tuned for SSDs, even though I'm running on its internal hard drive. On a server with a single SSD, the average is under 1 ms.
To get the last performance boost, almost all queries were condensed: for example, if a table needed several updates, it was best to make them in a single query, no matter how strange or performance-reducing that query might appear.
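For instance (a sketch with hypothetical names), two conditional updates collapsed into one pass over the table:

    -- Instead of two statements / two scans:
    --   UPDATE items SET status = 'expired' WHERE expires_at <  now();
    --   UPDATE items SET status = 'active'  WHERE expires_at >= now();

    -- ...one query, even though the CASE looks clumsier:
    UPDATE items
    SET status = CASE WHEN expires_at < now() THEN 'expired' ELSE 'active' END;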
Reading to assemble the relevant primary-key values required more testing. If the values came from a combination of RETURNING clauses, it was best to let each sub-statement recalculate them rather than depend on another sub-statement's result. If the read had to go to disk, even a small one, it was best to do it once and reference it across the multiple writing sub-statements.
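A sketch of the second case (hypothetical tables): a single disk-bound read feeds several writing sub-statements, instead of the sub-statements chaining on each other's RETURNING output:

    WITH target_ids AS (   -- the one read that may go to disk: run it once
        SELECT id FROM orders
        WHERE created_at < now() - interval '90 days'
    ),
    archived AS (
        INSERT INTO orders_archive
        SELECT o.* FROM orders o JOIN target_ids t ON t.id = o.id
        RETURNING id
    )
    DELETE FROM orders
    WHERE id IN (SELECT id FROM target_ids);  -- reference the shared read,
                                              -- not the 'archived' chain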
Copying was another grey area. If a table required inserts, updates, and deletes based on data from another table, it was slower to pre-select that data; it was faster to simply reference the primary-key data in place.
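A sketch of the faster shape (hypothetical staging/live tables): each statement references the source rows directly rather than copying them into a preliminary CTE first:

    -- Slower: WITH src AS (SELECT * FROM staging) ... copy from src.
    -- Faster: reference the source's primary keys in place.
    INSERT INTO live
    SELECT s.* FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM live l WHERE l.id = s.id);

    UPDATE live l
    SET    payload = s.payload
    FROM   staging s
    WHERE  s.id = l.id;

    DELETE FROM live l
    WHERE NOT EXISTS (SELECT 1 FROM staging s WHERE s.id = l.id);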
In general, aside from disk-bound reads that assemble primary-key values for reuse, it is best to compress a CTE as much as possible, keeping its chains short and narrow, even though the result may look strange.
Transactions made of multiple prepared statements that previously consumed tens of milliseconds in my application now take at most 5 ms.
This approach should probably be limited to small amounts of redundant data.
You probably have an ordinary "cold cache" case.
When you INSERT 10,000 rows into a table, the rows need to be added to the appropriate place(s) in the "data". Also, 10,000 entries need to be added to each index's BTree. (How many indexes do you have? Please provide SHOW CREATE TABLE.)
If you have AUTO_INCREMENT, then the rows will be "appended" to the table; this is not a caching issue. If, on the other hand, you have a UUID index, or some other 'random' index, then the "inserts" into the index will be random -- involving read-modify-write. The read and the write are cached. But, if the system were "cold" when you did the first 10,000, a lot of reads were probably necessary. Multiply that by how many indexes you have.
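To make the contrast concrete, here is a hypothetical table (since you haven't posted yours) with one index of each kind:

    CREATE TABLE t (
      id   INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- appends to the right edge of its BTree
      uuid CHAR(36) NOT NULL,                     -- lands at a random spot in its BTree
      val  VARCHAR(255),
      PRIMARY KEY (id),
      UNIQUE KEY uk_uuid (uuid)   -- each insert: read-modify-write a random block
    ) ENGINE=InnoDB;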
How big is the table? If it is small enough to fit into innodb_buffer_pool_size (if InnoDB), or small enough for the indexes to fit into key_buffer_size (if MyISAM), then soon all the index blocks will be cached, and the inserts will speed up.
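To check, compare the table and index sizes against the cache (sizes in MB; 'your_db' and 'your_table' are placeholders):

    SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

    SELECT table_name,
           data_length  / 1024 / 1024 AS data_mb,
           index_length / 1024 / 1024 AS index_mb
    FROM   information_schema.tables
    WHERE  table_schema = 'your_db'
      AND  table_name   = 'your_table';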
If the table is too big for the cache, then the 'random' indexes will continue to hit the disk (and be slow). AUTO_INCREMENT will continue to be fast.
5 seconds is about 500 reads from a commodity spinning drive (roughly 10 ms per random seek). So, I would guess that you have a modest-sized table. Since the next 10K rows go in under 1 second (fewer than 100 disk hits), I will guess that it fits in cache so far.
I often recommend 100-1000 rows for the chunk size, partially because a 5-second statement can be a problem. You may as well COMMIT after each chunk, since the benefit of delaying it is minor. Furthermore, if you wait "too long" to commit, the transaction could overflow the log file, leading to an inefficiency.
A chunk size of 100 is about 10% slower than the theoretical maximum. 1000 is about 1% slower than max.
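A sketch of one chunk (table and values are illustrative): a single multi-row INSERT, then an immediate COMMIT, repeated until the source is exhausted:

    START TRANSACTION;
    -- one multi-row INSERT of ~1000 rows (two shown here):
    INSERT INTO t (uuid, val) VALUES
      (UUID(), 'row 1'),
      (UUID(), 'row 2');
    COMMIT;
    -- repeat per chunk until the source data is exhausted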
If you have replication, keep in mind that the 5 seconds will interfere with anything else being replicated. (Another argument against 10K.)
Best Answer
The post you found is from 2007. Rather than rely on it, start with the current manual and read the details there.
I don't know of any projects that try to convert prepared statements into executables.