MySQL – Performance Impact of Maximum TEXT/BLOB Columns per Table

database-design, MySQL, mysql-5.6, performance, schema

I've created a system that generically imports CSVs into a database table, with metadata columns that track when a CSV record has changed, and that allows filtering on CSV columns instead of re-reading the entire file each time. Right now, the structure of the table is essentially:

id, csv_id, hash, date_modified, column_0, column_1, ..., column_63, column_64

The columns are all VARCHAR(255) because I wanted the maximum number of columns at the maximum size. My database is UTF-8, so each column's character length has to be multiplied by three bytes, and the total has to stay below MySQL's 65,535-byte row-size limit. The size of my current schema:

255 * 65 * 3 = 49,725
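To make the limit concrete, here is a sketch of the current layout (the table name, key types, and metadata column types are illustrative, not the actual schema):

```sql
-- Sketch only: 65 utf8 VARCHAR(255) columns fit just under the row-size limit.
CREATE TABLE csv_import (
  id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  csv_id        INT UNSIGNED NOT NULL,
  hash          CHAR(40) NOT NULL,
  date_modified DATETIME NOT NULL,
  column_0      VARCHAR(255),
  -- ... column_1 through column_63 ...
  column_64     VARCHAR(255)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- 65 * 255 chars * 3 bytes/char = 49,725 bytes, under 65,535.
-- The same table with VARCHAR(512) would need 65 * 512 * 3 = 99,840 bytes
-- and be rejected with ERROR 1118 "Row size too large".
```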

255 characters (×3 bytes) is too small because some columns hold URLs, which can be 512 bytes or more. So suppose I changed the current schema to:

column_0 (TEXT), column_1 (TEXT), ..., column_63 (TEXT), column_64 (TEXT)

Is there a maximum number of TEXT columns allowed on a single table? I'm using MySQL 5.6 right now, but I'll be migrating the database to Amazon Aurora. My gut says the performance of this would be horrendous, and I do need to filter against the contents of the columns.

If the performance hit isn't too large for a smaller subset of columns, I'm considering a hybrid solution:

id, csv_id, hash, date_modified, data LONGTEXT, column_0 VARCHAR(512), ..., column_19 VARCHAR(512)

where data is a JSON array and the 20 columns are indices mapped elsewhere in the system. But since the columns can theoretically be longer, if I changed them all to TEXT, how would a design of:

id, csv_id, hash, date_modified, data LONGTEXT, column_0 TEXT, ..., column_19 TEXT

differ?
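For concreteness, a sketch of the hybrid layout (names are illustrative; 20 utf8 VARCHAR(512) columns cost 20 × 512 × 3 = 30,720 bytes, comfortably under the row limit):

```sql
-- Hybrid sketch: full row kept as JSON, plus a bounded set of filterable columns.
CREATE TABLE csv_import_hybrid (
  id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  csv_id        INT UNSIGNED NOT NULL,
  hash          CHAR(40) NOT NULL,
  date_modified DATETIME NOT NULL,
  data          LONGTEXT NOT NULL,  -- entire CSV row as a JSON array
  column_0      VARCHAR(512),
  -- ... column_1 through column_18 ...
  column_19     VARCHAR(512)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```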

Best Answer

If you lump all the column_## values into a single JSON document in a LONGTEXT field, you get a limit of 4GB (plain TEXT tops out at 64KB and MEDIUMTEXT at 16MB).

The question of how many columns gets complicated. InnoDB has 4 ROW_FORMATs: REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED. I think you could have 400 columns of VARCHAR(255) or TEXT (or ...) in ROW_FORMAT=DYNAMIC. That format stores only a 20-byte pointer in the record to where the actual text lives, so roughly 8,126 / 20 ≈ 400 such columns fit within a 16KB page's row limit.
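As a sketch (assuming a 5.6 server where the Barracuda prerequisites near the end of this answer are in place), requesting DYNAMIC is just a table option:

```sql
-- Sketch: a wide table of TEXT columns using the DYNAMIC row format.
-- Each fully off-page value costs roughly a 20-byte pointer in the row itself.
CREATE TABLE wide_text (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  column_0 TEXT,
  column_1 TEXT
  -- ... more TEXT columns, up to the column-count limits discussed below ...
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=utf8;
```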

If you changed the page size from the default 16KB to 32KB, you could get about 800 columns. Going to 64KB would hit InnoDB's hard limit of 1017 columns (as of 5.6.9). But 32KB and 64KB pages require MySQL 5.7.6 or later, and changing the page size means re-initializing the whole instance, so I don't recommend it.
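The page size is fixed when the data directory is initialized; you can check what you're running with:

```sql
SHOW VARIABLES LIKE 'innodb_page_size';  -- 16384 (16KB) by default
```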

More Limits. (Hmmm... maybe I should add your question there.)

These two statements do go together: "performance of this would be horrendous" and "I do need to filter against the content of the columns". That's because with DYNAMIC there is an extra disk fetch for each off-page column you need to 'filter' on. On the other hand, in some situations (short text, not too many columns, etc.), the entire text of some columns may be stored inline.
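If 'filtering' means equality or prefix WHERE clauses on specific columns, a prefix index is one way to keep those extra fetches off the critical path (a sketch against the hypothetical wide_text table above; the default index key prefix limit is 767 bytes, i.e. 255 utf8 characters):

```sql
-- Sketch: index the first 255 characters so equality filters can use the index
-- instead of fetching every row's off-page text.
ALTER TABLE wide_text ADD INDEX idx_col0 (column_0(255));

SELECT id FROM wide_text WHERE column_0 = 'some value';
```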

COMPACT might actually be better. That format tries to keep the text in the record; if the record would become bigger than about 8KB (a row must fit in roughly half of a 16KB page), it peels the longest cells off to overflow pages, but it still keeps the first 768 bytes of each such column inline. Those mandatory 768-byte prefixes are what cap you at about 10 TEXT columns: 10 × 768 ≈ 7.7KB, right at the ceiling. (Try 12 columns and it will spit at you.)
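A quick way to see that wall, assuming a 16KB page (with innodb_strict_mode=ON the CREATE is rejected outright; otherwise the error surfaces only when a row actually outgrows the limit):

```sql
-- Sketch: 12 TEXT columns * 768-byte inline prefixes > the ~8,126-byte row limit.
SET SESSION innodb_strict_mode = ON;
CREATE TABLE too_wide (
  c0 TEXT, c1 TEXT, c2 TEXT, c3 TEXT,  c4 TEXT,  c5 TEXT,
  c6 TEXT, c7 TEXT, c8 TEXT, c9 TEXT, c10 TEXT, c11 TEXT
) ENGINE=InnoDB ROW_FORMAT=COMPACT;
-- ERROR 1118 (42000): Row size too large (> 8126). ...
-- In current row format, BLOB prefix of 768 bytes is stored inline.
```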

I believe you will need innodb_file_per_table=1 (and, on 5.6, innodb_file_format=Barracuda) for DYNAMIC.
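Roughly, the 5.6 settings would be (Barracuda is already the default from 5.7 on):

```sql
-- MySQL 5.6 prerequisites for ROW_FORMAT=DYNAMIC:
SET GLOBAL innodb_file_format    = 'Barracuda';
SET GLOBAL innodb_file_per_table = 1;
```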

How will you be 'filtering'?