I have a table where I store visits (id, datetime, country, browser). I want to count how many visits are from which country.
I use this query:
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country
But the dataset is huge. So if I add LIMIT 10
to only get the top 10 countries will it improve the performance?
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country LIMIT 10
Also I don't need exact results I only need some approximate results. So can't I make the query so it skips every 2nd result. Will it improve the performance? Are there any other methods?
Best Answer
Short answer: No.
Detailed answer: Unless you add/already have an index there that covers the
country
column the database engine has no way of telling the total number of instances to count. Even if you limit the output the entire table has to be scanned to retrieve every instance first. Of course, since this is a logging table, with potentially a large dataset, I'd avoid adding indexes to the table so:INSERT
timeSolution: create a second table that will keep a running total for all countries, that gets updated by a trigger on the
hits_table
or directly by the application that inserts the data intohits_table
. Something in the lines of:This would make the table very small and allow you to create a
PRIMARY KEY
on thecountry
anddate
columns, dedicating minimum resources. Once that key is set up, you could employ a trivial upsert statement (see this SQL Fiddle session for a full test case with a trigger-based usage):Query blocking note: In a heavy traffic scenario you ideally want to query
country_hits_totals
with non-blocking statements only. Otherwise the upsert on the trigger gets delayed, and by extensions entire transaction that kicked it off. In a perfect worldcountry_hits_totals
would get copied over to a data warehouse at the time of least daily traffic, and you'd run your BI queries on that warehouse. That said, this is large traffic/enterprise territory and possible solutions depend on how quickly you want to get the data. Say, if you need an update only once a day you're better off pre-aggregating data ontocountry_hits_totals
on a schedule instead of relying on a trigger updating the rows live.