My team is currently developing an additional web app that lets customers analyze user statistics for their portals.
We have a table with all logins, containing about 100 million rows. A user can return to the portal multiple times during the day.
The table structure is something like:
| UserId | LoginDate |
We also have a hierarchy of customers, portals, and countries, which looks like:
| Table     | Foreign key   |
| Users     | -> CustomerId |
| Customers | -> PortalId   |
| Portals   | -> CountryId  |
| Countries |               |
Based on user activity, we build daily incremental snapshots to avoid querying the big login history table. There we store total logins, unique logins, etc. This lets us show nice charts to the end users.
But now we are facing an interesting issue. We need to implement total unique logins for a period: the user selects a period and sees this total.
And this is where the trouble starts. We can't just sum the unique login totals from the snapshots, because a user who logged in on several days would be counted more than once, so the result wouldn't be truly unique. And we can't count from the history table each time, because of performance. I have no idea how to cache these totals.
Best Answer
How about an indexed view?
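A minimal sketch of such a view, assuming the login table is called dbo.Logins and LoginDate is a datetime column (the table name, the view name dbo.LoginsByDay, the index name, and the LoginCount column are all made up for illustration):

```sql
-- Hypothetical names throughout: dbo.Logins, dbo.LoginsByDay, LoginCount.
-- Assumes LoginDate is a datetime (or datetime2) column.
CREATE VIEW dbo.LoginsByDay
WITH SCHEMABINDING              -- required before a view can be indexed
AS
  SELECT UserID,
         CONVERT(date, LoginDate) AS LoginDate,  -- strip the time portion
         COUNT_BIG(*)             AS LoginCount  -- COUNT_BIG(*) is required with GROUP BY
  FROM dbo.Logins                                -- two-part name is required
  GROUP BY UserID, CONVERT(date, LoginDate);
GO

-- The unique clustered index is what materializes the view on disk.
CREATE UNIQUE CLUSTERED INDEX CIX_LoginsByDay
  ON dbo.LoginsByDay (LoginDate, UserID);
GO
```

Leading the index on LoginDate is a guess that most reports filter by date range; if per-user lookups dominate, the reverse order may suit better.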
What an indexed view does is pre-aggregate the login data for you (you pay the cost at write time) and flatten it into one row per UserID per calendar day. Depending on how many times a user logs in on a given day, and how often you run these reports, this can be substantially more efficient than calculating the aggregates at query time (and much less hassle than running your own nightly job that aggregates the data and stores it somewhere).
Now, to get the total login events by day:
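Something along these lines should work, assuming an indexed view named dbo.LoginsByDay with columns UserID, LoginDate (date only) and LoginCount (hypothetical names):

```sql
-- Total login events per day: sum the pre-aggregated per-user counts.
-- NOEXPAND asks SQL Server to read the view's own index rather than
-- expanding the view back to the base table.
SELECT LoginDate,
       SUM(LoginCount) AS TotalLogins
FROM dbo.LoginsByDay WITH (NOEXPAND)
GROUP BY LoginDate
ORDER BY LoginDate;
```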
To get the distinct logins by day:
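A sketch, again assuming an indexed view named dbo.LoginsByDay that holds one row per UserID per LoginDate (hypothetical names):

```sql
-- Distinct users per day: the view already has exactly one row per
-- user per day, so counting rows gives the distinct count.
SELECT LoginDate,
       COUNT(*) AS UniqueLogins
FROM dbo.LoginsByDay WITH (NOEXPAND)
GROUP BY LoginDate
ORDER BY LoginDate;
```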
Of course if you're looking for individual user activity, you can do that too, and you can easily aggregate the data to get whatever "total unique" really means to you. If UserID = 1 logs in today 4 times and tomorrow twice, is that 1 total unique (unique per entire time frame) or 2 total uniques (unique per day)?
If the former:
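For example, assuming an indexed view named dbo.LoginsByDay with one row per UserID per LoginDate, and with @StartDate / @EndDate standing in for the period the user picks (all names and dates here are placeholders):

```sql
-- Placeholder period; in the app these would come from the user's selection.
DECLARE @StartDate date = '20240101',
        @EndDate   date = '20240201';   -- exclusive upper bound

-- Unique over the whole period: each user counts once, no matter
-- how many days they logged in.
SELECT COUNT(DISTINCT UserID) AS UniqueUsersInPeriod
FROM dbo.LoginsByDay WITH (NOEXPAND)
WHERE LoginDate >= @StartDate
  AND LoginDate <  @EndDate;
```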
If the latter:
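For example, under the same assumptions (a hypothetical indexed view dbo.LoginsByDay with one row per UserID per LoginDate, and placeholder period variables):

```sql
-- Placeholder period; in the app these would come from the user's selection.
DECLARE @StartDate date = '20240101',
        @EndDate   date = '20240201';   -- exclusive upper bound

-- Unique per day, summed over the period: a user who logs in on
-- two different days counts twice. One row = one user-day.
SELECT COUNT(*) AS UniqueUserDaysInPeriod
FROM dbo.LoginsByDay WITH (NOEXPAND)
WHERE LoginDate >= @StartDate
  AND LoginDate <  @EndDate;
```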