SQL Server – How to Get First Login-Hit on Huge Table

sql server

I have a table like this in my database:

ID (pk,int,not null)

UserID (int,null)

HitDate (datetime,null)

Every time a user performs any transaction in my application, one hit is inserted into this table.

Now my client wants to find out how many new users I have had per day since the application was launched.

As far as I can tell, I have to group the hits by "UserID" and take the first HitDate, discarding the other hit dates for that user. The first one is that user's first login into the system. Then I count the users per day by that first date.

My problem is that this table has a huge number of rows, nearly 1,800,000,000!

The only solution I have found so far is to break the data into chunks of a few days, insert each chunk into a temporary table, then remove the duplicates from it.

This is my code (ID is auto-increment):

INSERT INTO temptable (UserID, HitDate)
SELECT vs.UserID,
       vs.HitDate
FROM   UserHitTbl vs
WHERE  LEN(vs.UserID) > 1
  AND  vs.HitDate > '2013-06-12 14:32:59.783'
  AND  vs.HitDate < '2013-07-12 14:32:59.783'

DELETE
FROM   temptable
WHERE  ID NOT IN
(
    SELECT MAX(ID)
    FROM   t6
    GROUP BY UserID
)

The above gives the following error:

Could not allocate space for object '' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup

Best Answer

That error is simply telling you that the volume(s) holding tempdb is/are full. Unless you have explicitly altered the file layout of tempdb, its files will be in the instance's default data directory, which on many installations is on C:. You must be pretty low on space where tempdb is, unless the number of rows in that date range is massive.
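
To check where the tempdb files actually live and whether they are allowed to grow, a query along these lines may help. It is only a sketch: it reads the standard sys.database_files catalog view inside tempdb and assumes you have permission to view it.

```sql
-- Where are tempdb's files, how big are they, and can they grow?
SELECT name,
       physical_name,
       type_desc,                         -- ROWS (data) or LOG
       size / 128 AS size_mb,             -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(20)) + ' %'
            ELSE CAST(growth / 128 AS varchar(20)) + ' MB'
       END        AS growth_step,         -- growth = 0 means the file is fixed size
       max_size                           -- -1 = grows until the disk is full
FROM   tempdb.sys.database_files;
```

If physical_name points at a nearly full drive, or the growth step is 0, that would be consistent with the error in the question.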

Anyway, spooling large amounts of data into a temporary table only to delete most of it immediately is pretty much always the wrong way to go. With decent indexes on that table you should be able to extract the relevant data with a SELECT query like so:

SELECT user_id, MIN(action_date)
FROM   your_table
WHERE  action_date BETWEEN <start> AND <end>
GROUP BY user_id

The above will list every user who touched the system (so has entries in that table) between the dates/times specified, along with the first time they touched the system in that period.

It would be useful to list in your question what keys and indexes you currently have on that table. An index on action_date is going to be pretty much essential. A compound index that covers action_date and the other columns involved may be even better, though it will take more space to store, depending on the balance of data. How many unique users do you have, and over how many days has this data been recorded? (Again: add such details to your question so everyone sees them easily, not just in a comment on this answer.)
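
One way to collect the index details asked for above is to query the catalog views. This is a sketch that assumes the table lives in the dbo schema; dbo.your_table is the same placeholder used in the queries here, so substitute the real table name.

```sql
-- List every index on the table together with its key and included columns.
SELECT i.name                AS index_name,
       i.type_desc,                          -- CLUSTERED / NONCLUSTERED
       i.is_unique,
       ic.key_ordinal,                       -- position in the key (0 = included only)
       ic.is_included_column,
       c.name                AS column_name
FROM   sys.indexes AS i
JOIN   sys.index_columns AS ic
       ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN   sys.columns AS c
       ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE  i.object_id = OBJECT_ID(N'dbo.your_table')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

If this returns no rows at all, the table is a heap with no indexes, which would explain why any query over it is slow.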

If you just want to list users who have never logged an action before this time, then you need something like:

SELECT user_id, action_date
FROM   (
       SELECT user_id, action_date=MIN(action_date)
       FROM   your_table
       WHERE  action_date BETWEEN <start> AND <end>
       GROUP BY user_id
       ) derived_table
WHERE NOT EXISTS (SELECT user_id FROM your_table WHERE action_date < <start> 
                  AND derived_table.user_id = your_table.user_id)

I'm using the original query as a derived table here to reduce the chance that the query planner will try to apply the NOT EXISTS check to every one of those 1,800,000,000 rows. Written this way, it should first find the users who were active in the period and then run the sub-query at most once for each of them. For this to work well you are going to need an index covering user_id (one covering user_id, action_date would most likely be better still).
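
For the goal stated in the question (new users per day over the whole history), the same MIN-per-user idea can be taken one step further by grouping the first dates by day. The following is a hedged sketch, not a tested solution for this table: it uses the placeholder names from the queries above, has no date filter and so reads every row (or every row of a suitable index), and relies on the date type, which needs SQL Server 2008 or later.

```sql
-- New users per day: each user is counted once, on the day of their first hit.
-- your_table / user_id / action_date are the placeholder names used above
-- (UserHitTbl / UserID / HitDate in the question).
SELECT first_day,
       COUNT(*) AS new_users
FROM   (
       SELECT user_id,
              first_day = CAST(MIN(action_date) AS date)  -- strip the time part
       FROM   your_table
       WHERE  user_id IS NOT NULL      -- the column is nullable in the question
       GROUP BY user_id
       ) first_hits
GROUP BY first_day
ORDER BY first_day;
```

On versions older than 2008, DATEADD(DAY, DATEDIFF(DAY, 0, MIN(action_date)), 0) is a common substitute for the CAST.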

Of course, if you don't already have sufficient indexes, adding them is going to be a very I/O intensive operation: for each new index, SQL Server will need to read all 1.8 billion rows and write the index rows to go with them.
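
As an illustration of the indexes discussed in this answer, the statements might look like the sketch below. The index names are hypothetical and the table and column names are the placeholders used above; whether you need one or both depends on which queries you run.

```sql
-- Keyed on user_id first: keeps each user's hits together in date order,
-- which suits MIN(action_date) per user and the NOT EXISTS probe.
CREATE NONCLUSTERED INDEX IX_your_table_user_id_action_date
    ON your_table (user_id, action_date);

-- Keyed on action_date: suits the BETWEEN <start> AND <end> filter.
-- INCLUDE (user_id) lets the query be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_your_table_action_date
    ON your_table (action_date)
    INCLUDE (user_id);
```

Each statement is a separate full pass over the table, so it is worth creating only the ones your queries actually need.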

Related Solutions

Sql-server – Can’t create indexes on really large table!

One thing you can try is to temporarily set the recovery model to Simple.

Alternatively, you can tell the CREATE INDEX command to do its sorting in tempdb, since tempdb always uses the Simple recovery model.

CREATE INDEX ...
WITH (SORT_IN_TEMPDB = ON);
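
The first suggestion, switching the recovery model temporarily, could be sketched as follows. YourDatabase is a placeholder name, and the sketch assumes the database was in the Full recovery model to begin with.

```sql
-- Check the current recovery model first so it can be restored afterwards.
SELECT name, recovery_model_desc
FROM   sys.databases
WHERE  name = N'YourDatabase';

ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;

-- ... build the index here ...

-- Assumes the model was FULL before; use whatever the query above showed.
ALTER DATABASE YourDatabase SET RECOVERY FULL;
```

Switching to Simple breaks the log backup chain, so take a full or differential backup after switching back before relying on log backups again.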

SQL Server 2008 – Primary Filegroup is Full Solution

Follow these steps:

Identify how much space you want to add to the database storage allocation:
1. Open windows explorer
2. Right click on the disk drive that your database files exist on
3. Select properties
4. Check how much disk space is available and decide how much of this you want to allocate for the database
  (Suggestion: Leave at least 20% disk space free if you house the database files on the same disk as your OS {Sub-Suggestion: Don't do this! Rebuild/migrate your data to its own disk; you're hurting yourself on I/O.} and leave at least 8% for a pure data disk; these numbers are my approximations of the commonly suggested percentages.)
Update the storage allocation for the database.
1. Open SSMS
2. Click the "View" tab
3. Select "Object Explorer"
4. Expand the "Databases" folder
5. Right click the database you're trying to bulk insert into
6. Select "Properties"
7. Click the "Files" list option from the "Select a page" area at the left of the properties window
8. Find the "Database files" row with the "Filegroup" as "PRIMARY"
9. Add whatever number of megabytes you want to add to the database allocation to the "Initial Size (MB)" number
10. Hit "OK"
  (You might also want to consider your "Autogrowth" values while you're here.)

You want to give your database as much storage allocation as you can afford to give it. If it runs out of space with auto-grow off, you'll receive this error; if auto-grow is on, you'll take a performance hit each time it has to grow. If you are simply out of disk space, then that is your answer and you need a bigger disk.
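
The SSMS steps above have a T-SQL equivalent, sketched here. YourDatabase and YourDatabase_Data are placeholder names and the sizes are examples only; look up the real logical file name with the first query before running the second.

```sql
USE YourDatabase;

-- Find the logical name and current size of each file.
SELECT name, physical_name, size / 128 AS size_mb   -- size is in 8 KB pages
FROM   sys.database_files;

-- Grow the data file to 20 GB and set a fixed growth step.
-- SIZE must be larger than the file's current size.
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = YourDatabase_Data, SIZE = 20480MB, FILEGROWTH = 512MB);
```

A fixed growth step in megabytes is often easier to reason about than a percentage, because each growth event then costs roughly the same.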
