I have table like this in my database :
ID (pk,int,not null)
UserID (int,null)
HitDate (datetime,null)
Every time a user has any transacation in my application one hit inserted in this table.
Now my client want to find out how many new users I have per day from the beginning of application.
As I find out I have to group user by "UserID" and get the first HitDate
, then remove other hit dates for that user. The first one will be the first login into system. Then I count using this method of all user per day.
My problem is this table has a huge row number, near 1,800,000,000 rows !!!!
The only solution I find at the moment is to break it into few day and insert it into a temporary table, then remove duplicates from it.
This is my code : (ID is Autoincrement)
INSERT INTO temptable
(UserID , HitDate )
Select vs.UserID ,
vs.HitDate
From UserHitTbl vs where
LEN(vs.UserID ) > 1 and
vs.HitDate > '2013-06-12 14:32:59.783' and
vs.HitDate < '2013-07-12 14:32:59.783'
DELETE
FROM temptable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM t6
GROUP BY UserID )
The above gives the following error:
Could not allocate space for object '' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup
Best Answer
That error is simply telling you that the volume(s) you have tempdb on is/are full. Unless you have explicitly altered the file layout of tempdb it will all be on C:. You must be pretty low on space where tempdb is unless the amount of rows for that 24 hour period is massive.
Anyway, spooling large amounts of data into a temporary table onyl to delete most of them immediately is pretty much always the wrong way to go: with decent indexes on that table you should be able to extract the relevant data with a
SELECT
query like so:The above will list every user who touched the system (so has entries in that table) between the dates/times specified along with the first time they touch the system in that time period.
It would be useful to list in your question what keys and indexes you currently have on that table. An index with action_date is going to be pretty much essential. A compound index that covers action_date and other columns involved may be even better, though will take more space to store, depending on the balance of data. How any unique users do you have and over how many days has this data been recorded? (again: add such details to your question so everyone sees them easily, not just in a comment added to this answer)
If you just want to list users who have never logged an action before this time, then you need something like:
I'm using the original query as a derived table here to avoid the chance the the query planner will try to apply the
NOT EXISTS
check to every one of those 1,800,000,000 rows, instead this way is should definitely find the users who were active today and run the sub-query once for each of them once only (so the sub-query will run at most once per user). For this to work you are going to need an index coveringuser_id
(one coveringuser_id, action_date
would most likely be better still).Of course if you don't have sufficient indexes already adding them is going to be a very I/O intensive operation as for each SQL Server will need to read all those 1.8 billion rows and write all the index rows to go with them.