Mysql – Does size of one particular column in a database affect query performance

MySQLperformance

I have a table user_list

CREATE TABLE IF NOT EXISTS `user_list` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `username` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `email` varchar(40) NOT NULL,
  `userdata` text NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `username` (`username`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1000265 ;

The queries I perform on this table are

  • SELECT email WHERE id = XXXX
  • SELECT userdata WHERE id = XXXX
  • SELECT id WHERE username = XXXX
  • UPDATE user_list SET userdata WHERE id = XXXX (used pretty
    frequently)
  • UPDATE user_list SET email WHERE id = XXXX
    (These are the only I queries I am going to perform on this table)

Now the problem is userdata field stores all the information about the user and is relatively large.Its text generally contains about 2000 characters. Say I am going to insert some 10 million+ entries in the database.

Does the size of this userdata field affect the performance of the queries I mentioned above?

If I store the userdata text as a separate file and just store the path in the database, will there be a significant performance difference? Since queries 1,3 and 5 do not involve the userdata field, will their performances be increased significantly since I have removed the userdata field from the table?

Is having a separate table user_data_list with the fields for the userdata

----------id----------userdata-----------------

provide any advantages over how it is now? Does having such a separate table have any disadvantages?

Best Answer

A short answer would be: Yes, it affects performance.

A database stores it's data inside formatted database blocks. These blocks are the basis of all caching mechanism, so caches will store database blocks. Querying the database will move some read blocks into the query cache.

You can imagine, the smaller a block is, the more blocks can stay safely inside the caches without having to drop them for new blocks from new queries.

However, I'd say that in many scenarios that store data this way, performance differences of the kind I described might not be the most prominent factor for the underlying system. A benchmark might find differences, but for a system to react noticeably faster you need at least 10% faster responses (the number differs from study to study), and I'm not certain excluding the text column to it's own table will give you that.

On the other hand, if the column would be indexed, that might make matters different. But as I see no index other than the primary key, I see no reason to refactor here - other than that such a column is considered bad practice.