Sql-server – Cache statistic data in SQL Server

cache, performance, sql server

My team is currently developing an additional web app that lets customers analyze user statistics for their portals.

We have a table with all logins. It contains about 100 million rows. A user can return to the portal multiple times during the day.

The table structure is something like:

| UserId | LoginDate |

We also have a hierarchy of customers, portals, and countries. It looks like:

Users            | Customers       | Portals         | Countries
FK -> CustomerId |                 |                 |
                 | FK -> PortalId  |                 |
                 |                 | FK -> CountryId |

Based on user activity, we build daily incremental snapshots so that we don't have to query the big login history table. In them we store total logins, unique logins, etc. This lets us show nice charts to end users.

But now we are facing an interesting issue. We need to implement total unique logins for a period: the user selects a period and sees this total.

And this is where the trouble starts. We can't just sum the snapshots' unique login counts, because a user who logs in on several days would be counted once per day, so the result would not be truly unique. And we can't count from the history table each time, because of performance. I don't have an idea of how to cache the totals.

Best Answer

How about an indexed view?

CREATE VIEW dbo.LoginsByDate
WITH SCHEMABINDING
AS
  SELECT 
    UserID, 
    LoginDate = CONVERT(DATE, LoginDate), 
    LoginCount = COUNT_BIG(*)
  FROM dbo.LoginTable
  GROUP BY UserID, CONVERT(DATE, LoginDate);
GO
CREATE UNIQUE CLUSTERED INDEX IX_LoginsByDate
  ON dbo.LoginsByDate(UserID, LoginDate);
GO

What this does is pre-aggregate the login data for you (you pay the cost at write time) and flatten it into one row per UserID per LoginDate. Depending on how many times a user logs in on a given day, and how often you run these reports, this can be substantially more efficient than calculating the aggregates at query time (and much less hassle than running your own nightly report that aggregates the data and stores it somewhere).

Now, to get the total login events by day:

SELECT LoginDate, SUM(LoginCount)
  FROM dbo.LoginsByDate
  GROUP BY LoginDate;

To get the distinct logins by day:

SELECT LoginDate, COUNT(UserID)
  FROM dbo.LoginsByDate
  GROUP BY LoginDate;
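
One caveat, offered as a hedged sketch rather than a requirement: depending on your SQL Server edition and version, the optimizer may expand the view and go back to the base table instead of reading the view's index. The NOEXPAND table hint asks it to treat the indexed view like a table. The column alias DistinctUsers is just an illustrative name.

```sql
-- Distinct users per day, read directly from the indexed view.
-- Each row in the view is one (UserID, LoginDate) pair, so counting
-- rows per LoginDate gives the number of distinct users for that day.
SELECT LoginDate, DistinctUsers = COUNT_BIG(*)
  FROM dbo.LoginsByDate WITH (NOEXPAND)
  GROUP BY LoginDate
  ORDER BY LoginDate;
```

If the plan already shows a scan or seek on IX_LoginsByDate without the hint, you probably don't need it.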

Of course if you're looking for individual user activity, you can do that too, and you can easily aggregate the data to get whatever "total unique" really means to you. If UserID = 1 logs in today 4 times and tomorrow twice, is that 1 total unique (unique per entire time frame) or 2 total uniques (unique per day)?

If the former:

SELECT COUNT(DISTINCT UserID)
  FROM dbo.LoginsByDate
  WHERE LoginDate >= 'yyyymmdd'
    AND LoginDate < 'yyyymmdd';
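
For a period picked by the user, the same query can be written with parameters. This is a minimal sketch: the variable names and the sample dates are assumptions, and in the web app the values would presumably arrive as procedure or query parameters. The end date is exclusive, so the example covers all of January.

```sql
-- Hypothetical period boundaries; the end date is NOT included.
DECLARE @PeriodStart date = '20240101';
DECLARE @PeriodEnd   date = '20240201';

-- Each user is counted once for the whole period,
-- no matter how many days they logged in on.
SELECT UniqueUsers = COUNT(DISTINCT UserID)
  FROM dbo.LoginsByDate
  WHERE LoginDate >= @PeriodStart
    AND LoginDate <  @PeriodEnd;
```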

If the latter:

SELECT SUM(c) FROM
(
  SELECT LoginDate, c = COUNT(UserID)
  FROM dbo.LoginsByDate
  WHERE LoginDate >= 'yyyymmdd' 
    AND LoginDate < 'yyyymmdd'
  GROUP BY LoginDate
) AS x;
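
Both period queries filter on LoginDate, while the clustered index above leads with UserID. If date-range queries turn out to scan the whole view, one option (an assumption on my part, worth testing against your own workload) is a nonclustered index on the view that leads with LoginDate. The index name is hypothetical.

```sql
-- Optional: supports range filters on LoginDate.
-- Must be created after the unique clustered index on the view exists.
CREATE NONCLUSTERED INDEX IX_LoginsByDate_LoginDate
  ON dbo.LoginsByDate(LoginDate, UserID);
GO
```

Like the clustered index, this one is maintained at write time, so it adds some cost to every insert into dbo.LoginTable.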