After version 5.0.3 (which allowed VARCHAR to be 65,535 bytes and stopped truncating trailing spaces), is there any major difference between these two data types?
I was reading the list of differences and the only two of note are:
For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 7.5.1, “Column Indexes”.
and
BLOB and TEXT columns cannot have DEFAULT values.
So, because of these two limitations on the TEXT datatype, why would you use it over varchar(65535) ? Are there performance ramifications of one over the other?
Best Answer
divided linked to some info that explains the basic issue (there's performance differences), but it's not simple enough to say that one's always better than the other. (otherwise, there'd be no reason to have both.) Also, in MyISM, the 64k max size for VARCHAR isn't per field -- it's per record.
Basically, there's 4 ways to store strings in database records:
MyISM uses something similar to #3 for VARCHAR, and a hybrid approach for TEXT where it stores the beginning of the string in the record, then rest of the string somewhere else. InnoDB is similar for VARCHAR, but stores the complete TEXT field outside of the record.
With 1&4, the stuff in the record is always the same length, so it's easier to skip over if you don't need the string, but need stuff after it. Both #2 and #3 aren't too bad for short strings ... #2 has to keep looking for the marker, while #3 can skip ahead ... as the strings get longer, #2 gets worse for this particular use case.
If you actually need to read the string, #4 is slower, as you have to read the record, then read the string which might be stored elsewhere on the disk, depending on just how that database handles it. #1 is always pretty straightforward, and again you run into similar issues where for #2 gets worse the longer the string is, while #3 is a little worse than #2 for very small strings, but better as it gets longer.
Then there's storage requirements ... #1 is always a fixed length, so it might have bloat if most strings aren't the max length. #2 has 1 extra byte; #3 typically has 2 extra bytes if max length = 255, 4 extra bytes if a 64k max. #4 has the pointer length, plus the rules for #3 typically.
For the specific implementations within MySQL 5.1, the docs for MyISM state:
While for InnoDB :
...
as with so many other things when dealing with databases, if you're not sure what's best for your needs, try benchmarking it with similar data & usage, and see how they behave.