Phrases like bad for performance are essentially meaningless. Measurements are meaningful; opinions are, umm, not measurements.
But int(7)
and int(11)
are both 32-bit integers. The number in parens refers to display width; it has nothing to do with storage size or index size.
For varchar()
columns, you can create an index that's based on the first 'n' characters. That decision has some performance implications, for example when you're searching for 'wibble' in a column indexed on the first three characters. And there are several subtle points with columns of type "text". Read CREATE TABLE carefully. (Search for "length".)
But I'd be a little surprised if you could measure any difference in joining columns of varchar(200)
/varchar(200
) and in joining columns of varchar(220)
/varchar(200)
. That is, other than the obvious--storing more than 200 characters in the varchar(220)
column.
There's an easy way to tell: write some code in your favorite scripting language, generate a few million rows of sample data, load it, and test it. And post your results here, too.
Your questions
Q1: If PK is the Partition key, a "point query" (WHERE PK = constant
) on the Partition key will do "pruning", then (hopefully) use an index in the single partition it found. No improvement over having a simple table with a suitable index.
Q1: If PK is the PRIMARY KEY
, but not the Partition key, then the query must open and look in every partition to find the row, hence slower.
Q2: Generally IN ( SELECT ... )
performs poorly, very poorly in some cases, and should be avoided. Use a JOIN
instead. For partitioning, "pruning" will probably not be used. Hence slow.
Q3: INSERT
may be about the same speed, regardless. A plain table may be faster due to less complexity.
Q3: REPLACE
is DELETE
+ INSERT
. INSERT ... ON DUPLICATE KEY UPDATE ...
is usually a better construct; see if it applies. But, again, no performance improvement.
IN
WHERE PK IN (1, 2, 3, 4...)
-- I think I heard that if the list is 'short', it will use partition pruning. Or skip the pruning if the list is long. In either case, a simple table with an index on PK will be at least as fast.
Pruning
Think of pruning this way: First it must find the partition (which acts very much like a "sub table"), then it must use an index (if available) to drill down to the desired row(s). Without partitioning, it skips the pruning step; but the BTree index is slightly deeper. So it is a trade-off.
Note: This often means that the optimal set of indexes for a partitioned table and the equivalent non-partitioned table are different.
Case 3
The caveat "A table's index is too big to be cached, but the index for one partition is cacheable" does not seem to apply to your situation. Case 3 came from a time-series wherein most of the activity was in the latest partition (and PARTITION BY RANGE(TO_DAYS(...))
) and that partition fit in the buffer_pool but the entire table did not.
(That is, I agree with Natan, not Jarwad, about the relevance of Case 3.)
BY HASH
BY HASH
-- not useful. (From the link: "PARTITION BY RANGE is the only useful method.") (Or at least, I have yet to see a use case of Hash where performance benefits.)
Ingestion
"growing a few million per month" That's not very fast. "growing a few million per day" would begin to be challenging, at which point I would point you at my high speed ingestion blog . Even so, you might take one tip from there -- Load the changes into a transient table, then do the insert/replace/iodku/update/etc from it to the 'real' table.
Best Answer
MariaDB-10.1.1 has
max_statement_time
to abort long-running statements. I don't think MySQL has anything equivalent.(Of course, you need at least 10.2 to get Recursive CTEs.)