MySQL Performance – Are TEXT Fields Stored Separately from the Row in MyISAM?

innodb, myisam, mysql, performance

We have a "notes" table and some of the queries are getting very slow even though we're not searching on the note contents. We're wondering if moving the note content into a separate table would speed things up?

I wouldn't think separating the content would matter — aren't TEXT fields essentially just pointers to somewhere else on disk? Or does table size matter?

Does InnoDB handle this better?

Best Answer

MyISAM only -- Normally a record is a contiguous stream of bytes in the .MYD file. This includes TEXT and BLOB columns. An index has a byte offset (or record number) to point into the .MYD file.

After row(s) have been DELETEd, there can be holes in the .MYD. MyISAM prefers to fill in the holes before appending to the .MYD. However the hole(s) may not be big enough. In this case, part of a record is stored in the first hole and a link (byte offset) is stored to the next part of the record. This linking goes on as long as necessary.

So, if there is a lot of "churn" in the table (deletes+inserts, etc), then new INSERTs can be stored very inefficiently. That is, they may be fragmented and scattered. This is one of the few cases where OPTIMIZE TABLE is useful for a MyISAM table. The OPTIMIZE rebuilds the table by effectively reading and rewriting each row into a new temporary .TMD file, which is later renamed to .MYD. This defragments rows and gets rid of any remaining gaps.
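Assuming the table is named `notes` (as in the question), a sketch of how you might check for and then remove that fragmentation; `Data_free` in the status output reports the bytes sitting in deleted-row holes:

```sql
-- Data_free shows how many bytes are tied up in holes in the .MYD
SHOW TABLE STATUS LIKE 'notes';

-- Rebuild the table, defragmenting rows and reclaiming the holes
OPTIMIZE TABLE notes;
```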

There is no "block" structure to the .MYD. In contrast, the .MYI file holds all indexes, including the PRIMARY KEY, in 1KB B-Tree blocks.

The only caching for data rows is what the OS file cache provides. That works in 4KB pages, which have zero correlation to the record boundaries of the rows in the table.

When to have a parallel table -- The question is really about this. (But I needed to say the above first.)

It is hard to give a clear indication of when it is useful to manually split the table by columns.

Case 1: You do a lot of searching through the table without touching the TEXT field. Having the big TEXT field(s) elsewhere avoids stepping over those big cow paddies. The JOIN to get the TEXT for one row is not that expensive.

Case 2: You are always doing LIKE on the TEXT field. Then vertical partitioning slows things down.

I have done Case 1 in a few projects.
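A minimal sketch of the Case 1 split, with hypothetical table and column names (your id type and the columns you actually search on would follow your existing schema):

```sql
-- Narrow table: everything you filter/sort on, no TEXT column
CREATE TABLE note_meta (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    INDEX (author_id, created_at)
) ENGINE=MyISAM;

-- Parallel table: the bulky content, JOINed in only when needed
CREATE TABLE note_text (
    id INT UNSIGNED NOT NULL,   -- same id as note_meta
    body TEXT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=MyISAM;
```

Scans and sorts now read only the short `note_meta` rows; the wide `note_text` rows are touched one at a time, by primary key.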

Case 1 can be further enhanced by this trick:

SELECT a.stuff, b.text
    FROM ( SELECT ... FROM main_table ... LIMIT 1 ) a
    JOIN text_table b USING (id);

Notes:

  • The subquery hits a lot of rows, but then decides on a few (via LIMIT or GROUP BY, or...);
  • Any tmp tables are not big (since the TEXT is not in it);
  • Only 1 probe into the parallel table is needed.

Another trick is to Compress the text field (and put it into a BLOB).

  • Do the compression in the client, not the server
  • 3:1 shrinkage is typical for text
  • The space savings (hence I/O savings) are likely to more than compensate for the CPU time spent compressing if you have a big table.
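A sketch of the client-side compression using Python's standard zlib module; the function names are illustrative, and the resulting bytes would be stored in a BLOB column via whatever insert path your application already uses:

```python
import zlib

def compress_note(text: str) -> bytes:
    # Compress in the client; store the returned bytes in a BLOB column.
    return zlib.compress(text.encode("utf-8"), level=6)

def decompress_note(blob: bytes) -> str:
    # Reverse the transformation after reading the BLOB back.
    return zlib.decompress(blob).decode("utf-8")

note = "Customer called about the invoice discrepancy on order 1234. " * 40
blob = compress_note(note)
print(len(note.encode("utf-8")), "->", len(blob))  # typical text shrinks well
assert decompress_note(blob) == note
```

The key point is that the server only ever sees opaque bytes, so the compression CPU cost lands on the (usually less loaded) client machines.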

(The two "tricks" can be applied to InnoDB, too.)