I have a database in Microsoft SQL Server 2005.
I have table with 3 columns, namely HASHKEY (BIGINT)
, NOTE_ID (INT)
and TIME_OFFSET (INT)
.
Columns HASHKEY
has 19 digit BIGINT
values, which can DUPLICATE. But values will be always of 19 digits.
I have application mainly depends on data in this table. Application retrieves data from this table with query like:
select HASHKEY, NOTE_ID, TIME_OFFSET
from TABLE_NAME
where HASHKEY in (<list of around 30000 hashkeys>)
But this query takes around 2 minutes to retrieve data. This is my problem. The application is real time processing application, and need to retrieve data in about 5 seconds. How can I tweak things in server, so time to retrieve data can be decreased?
I have indexed the table by HASHKEY column when I created table, but still retrieving data is taking much time.
Is there any setting which I can do in database to so time can be decreased? I will welcome any type of solution. But I need to solve this. I am not very expert in this.
Also HASHKEY is just random values of 19 digits, no relation with other values.
Result of following query is,
select count(HASHKEY) from TABLE_NAME
go
select count(distinct(HASHKEY)) from TABLE_NAME
Result:
225899932
189200251
Time taken: 2 minutes, 1 second
EDIT
This is script to create table:
USE [fp]
GO
/****** Object: Table [dbo].[fp_core] Script Date: 09/13/2013 22:40:43 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[fp_core](
[hashkey] [bigint] NOT NULL,
[note_id] [int] NOT NULL,
[timeoffset] [int] NOT NULL
) ON [PRIMARY]
I will ask random set of around 30000 hashkeys for data, no any ordering.
When I was having around 20000000 rows in table, query was taking less than 2 seconds, but now retrieval time is increasing.
EDIT
And here is script to create index, only one index in this table.
USE [fp]
GO
/****** Object: Index [IX_fp_core] Script Date: 09/13/2013 23:04:24 ******/
CREATE NONCLUSTERED INDEX [IX_fp_core] ON [dbo].[fp_core]
(
[hashkey] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
Please help me.
Best Answer
Your table is a "heap," which means that it doesn't have a clustered index. The short version of what that means is that your data isn't laid out on the disk in a logical way. A heap really isn't an optimal structure for an actively updated table. Here's an excellent video on heaps.
So, my suggestions--please test them before deploying them in production--are:
This will take some time to complete AND will be a blocking query, so do it during downtime.
This will give you a clustered index and lay out your table on a completely random sequential number, unfortunately, but since you don't have any non-unique values that might be the best course of action.
I would then include the columns note_id and timeoffset in your nonclustered index on HASHKEY.
(Or you could do it through the GUI. If you right-click on the index and choose properties, there's an included columns pane.)
This will hopefully have the effect of replacing your heap scans with nonclustered index seeks, but since your query has a "where in [30,000 items], it might do a index scan instead. Either way, it won't be rooting around in the heap, which I expect will be a good thing.