MySQL – Select vs Select SUM Performance

MySQLperformanceperformance-testingselectsum

I need to check whether the surfer has alrady voted this can be done using

Method 1

Select record_num FROM table where etc etc.

i.e.

SELECT record_num
FROM content_votes_tmp 
WHERE up = 1
AND ip = INET_ATON('$_SERVER[REMOTE_ADDR]') 
AND content = $_POST[id]
AND UNIX_TIMESTAMP(datetime) > '$old_time'

Method 2

Select Sum(votes) FROM table where etc etc.

i.e.

SELECT SUM(up) as up_count
FROM content_votes_tmp 
WHERE ip = INET_ATON('$_SERVER[REMOTE_ADDR]') 
AND content = $_POST[id]
AND UNIX_TIMESTAMP(datetime) > '$old_time'

Using storage engine as MyISAM ,
Table has around 1 million rows,
ROW Format is static.

I am looking for query which is faster in terms of performance.

So which query will be faster ? this query will be fired every time someone clicks the vote up or vote down button.,


here is a table structure

CREATE TABLE IF NOT EXISTS `content_votes_tmp` (
  `up` int(11) NOT NULL DEFAULT '0',
  `down` int(11) NOT NULL DEFAULT '0',
  `ip` int(10) unsigned NOT NULL,
  `content` int(11) NOT NULL,
  `datetime` datetime NOT NULL,
  `is_updated` tinyint(2) NOT NULL DEFAULT '0',
  `record_num` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`record_num`)
  KEY `content` (`content`),
  KEY `datetime` (`datetime`),
  KEY `is_updated` (`is_updated`),
  KEY `ip` (`ip`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=25 ;

Best Answer

The 2 queries that you show are different in meaning.

The first one will select all upvotes for that ip and content in that timeframe. If you create the index:

mysql> ALTER TABLE content_votes_tmp ADD INDEX(up, ip, content, datetime, record_num);
Query OK, 753676 rows affected (3.85 sec)
Records: 753676  Duplicates: 0  Warnings: 0

You can get good performance results- the first 2 columns will be used and you will get the Using Index optimization:

mysql> EXPLAIN SELECT record_num 
               FROM content_votes_tmp  
               WHERE up = 1 AND 
                     ip = 1  AND 
                     content = 1 AND 
                     UNIX_TIMESTAMP(datetime) > @number\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: content_votes_tmp
         type: ref
possible_keys: up
          key: up
      key_len: 12
          ref: const,const,const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

We can do even better than that. Remove the UNIX_TIMESTAMP() function -compare dates instead of ints by transforming $old_time- and will be able to apply all conditions using the index (3 consts and 1 range):

mysql> EXPLAIN SELECT record_num 
               FROM content_votes_tmp  
               WHERE up = 1 AND 
                     ip = 1  AND 
                     content = 1 AND 
                     datetime > now()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: content_votes_tmp
         type: range
possible_keys: up
          key: up
      key_len: 17
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

Be careful if the logic of the application requires to know if a downvote was done, this query will not help you with that.

The second one will require a slightly different index:

mysql> ALTER TABLE content_votes_tmp ADD INDEX(ip, content, datetime, up);
Query OK, 753676 rows affected (5.95 sec)
Records: 753676  Duplicates: 0  Warnings: 0

In order to get a good query plan, also needing the "move the function to the other side of the operand":

mysql> EXPLAIN SELECT SUM(up) as up_count 
       FROM content_votes_tmp  
       WHERE ip = 1 AND 
             content = 1 AND 
             datetime > now()\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: content_votes_tmp
         type: range
possible_keys: ip
          key: ip
      key_len: 13
          ref: NULL
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

I do not like the range + SUM(), so I would prefer the first one, which is simpler (unless you intend to return lots of rows). But there is nothing on the query plan to support my fears (it has the covering index optimization too, and no filesort, so both can be very fast with the appropriate indexes- they take 0.00s with my fake 1M data).

Remember that the logic is slightly different, so be aware of that- if you allowed upvotes and downvotes, that could be a problem.