Mysql – counter table in thesql

database-designMySQL

i was going through the book: High performance MySql 2nd edition. In chapter 4 it says:

An application that keeps counts in a table can run into concurrency
problems when updating the counters. Such tables are very common in
web applications.

To keep things as simple as possible, suppose you have a counter table
with a single row that just counts hits on your website:
mysql> CREATE TABLE hit_counter ( cnt int unsigned not null ) ENGINE=InnoDB;
Each hit on the website updates the counter:
mysql> UPDATE hit_counter SET cnt = cnt + 1;
The problem is that this single row is effectively a global “mutex”
for any transaction that updates the counter. It will serialize those
transactions. You can get higher concurrency by keeping more than one
row and updating a random row.

Below is the change:
mysql> CREATE TABLE (slot tinyint unsigned not null primary key, cnt int unsigned not null ) ENGINE=InnoDB;
Prepopulate the table by adding 100 rows to it. Now the query can just
choose a random slot and update it:

mysql> UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;

To retrieve statistics, just use aggregate queries:
mysql> SELECT SUM(cnt) FROM hit_counter;

Here comes my doubt:

As there are two different queries one which increment the cnt and other which calculate the sum of cnt than how come it will give correct result to two different threads executing simultaneously? For example t1 fires the first query to increase the cnt , before it fetches the sum t2 also updates the cnt than the 2nd query for t1 will fetch wrong result. Please clarify.

if you dont have the book please refer this.

Best Answer

It's the transaction isolation that guarantees consistent results.

In your example, if you are concerned about possible changes from other threads between the first and the second query, you should enclose your queries in a single transaction: this is implicit if you have autocommit disabled, otherwise can be explicitly declared:

START TRANSACTION;
UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;
SELECT SUM(cnt) FROM hit_counter;
COMMIT;

But it must also be said that, in many real use cases (like a counter for visits on a web page), having a result which is consistent with the previous increment is not so important.

For a deep explanation about possible transaction isolation levels please refer to: http://dev.mysql.com/doc/refman/5.7/en/set-transaction.html

Related Solutions

Mysql – Finding rows for a specified date range

Another way to get the result is this. It finds first all groups that the teacher has surely taught (or is going to) by checking that she has started within the month and then in another subquery it finds - for every group - the last teacher that started at the first day of the month or earlier.

With the unique index you have on the table, the second subquery should be quite efficient. The first subquery would benefit from an index on (teacherid, startdate, groupid):

SELECT groupid
FROM pupilgroupteacher
WHERE teacherid = @teacher 
  AND startdate >= @month + INTERVAL 1 DAY
  AND startdate < @month + INTERVAL 1 MONTH

UNION DISTINCT

SELECT gg.groupid
FROM 
    ( SELECT DISTINCT groupid
      FROM pupilgroupteacher
    ) AS gd
  JOIN pupilgroupteacher AS gg
    ON  gg.groupid = gd.groupid
    AND gg.startdate =
        ( SELECT MAX(gi.startdate)
          FROM pupilgroupteacher AS gi 
          WHERE gi.groupid = gd.groupid
            AND gi.startdate < @month + INTERVAL 1 DAY
        )
WHERE gg.teacherid = @teacher ;

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

^{— http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows}

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note, sometimes you will see Using index in the Extra column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either. Using index describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.

Best Answer

Related Solutions

Mysql – Finding rows for a specified date range

MySQL looking up more rows than needed (indexing issue)

Related Question