Sql-server – Retrieving data speed tweaks SQL Server 2005

performancequery-performancesql serversql-server-2005

I have a database in Microsoft SQL Server 2005.

I have table with 3 columns, namely HASHKEY (BIGINT), NOTE_ID (INT) and TIME_OFFSET (INT).

Columns HASHKEY has 19 digit BIGINT values, which can DUPLICATE. But values will be always of 19 digits.

I have application mainly depends on data in this table. Application retrieves data from this table with query like:

select HASHKEY, NOTE_ID, TIME_OFFSET 
from TABLE_NAME 
where HASHKEY in (<list of around 30000 hashkeys>)

But this query takes around 2 minutes to retrieve data. This is my problem. The application is real time processing application, and need to retrieve data in about 5 seconds. How can I tweak things in server, so time to retrieve data can be decreased?

I have indexed the table by HASHKEY column when I created table, but still retrieving data is taking much time.

Is there any setting which I can do in database to so time can be decreased? I will welcome any type of solution. But I need to solve this. I am not very expert in this.

Also HASHKEY is just random values of 19 digits, no relation with other values.

Result of following query is,

select count(HASHKEY) from TABLE_NAME
go
select count(distinct(HASHKEY)) from TABLE_NAME

Result:

225899932
189200251

Time taken: 2 minutes, 1 second

EDIT

This is script to create table:

USE [fp]
GO
/****** Object:  Table [dbo].[fp_core]    Script Date: 09/13/2013 22:40:43 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[fp_core](
    [hashkey] [bigint] NOT NULL,
    [note_id] [int] NOT NULL,
    [timeoffset] [int] NOT NULL
) ON [PRIMARY]

I will ask random set of around 30000 hashkeys for data, no any ordering.

When I was having around 20000000 rows in table, query was taking less than 2 seconds, but now retrieval time is increasing.

EDIT

And here is script to create index, only one index in this table.

USE [fp]
GO
/****** Object:  Index [IX_fp_core]    Script Date: 09/13/2013 23:04:24 ******/
CREATE NONCLUSTERED INDEX [IX_fp_core] ON [dbo].[fp_core] 
(
    [hashkey] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Please help me.

Best Answer

Your table is a "heap," which means that it doesn't have a clustered index. The short version of what that means is that your data isn't laid out on the disk in a logical way. A heap really isn't an optimal structure for an actively updated table. Here's an excellent video on heaps.

So, my suggestions--please test them before deploying them in production--are:

BEGIN TRANSACTION

USE fp; ALTER TABLE fp_core ADD ID bigint NOT NULL IDENTITY (1, 1)
GO

USE fp; CREATE CLUSTERED INDEX IX_fp ON fp_core (ID) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

COMMIT

This will take some time to complete AND will be a blocking query, so do it during downtime.

This will give you a clustered index and lay out your table on a completely random sequential number, unfortunately, but since you don't have any non-unique values that might be the best course of action.

I would then include the columns note_id and timeoffset in your nonclustered index on HASHKEY.

BEGIN TRANSACTION

USE [fp]
GO

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[fp_core]') AND name = N'IX_Agency_TIMESTAMP')
DROP INDEX [IX_fp_core] ON [dbo].[fp_core] WITH ( ONLINE = OFF )
GO

USE [fp]
GO

CREATE NONCLUSTERED INDEX [IX_fp_core] ON [dbo].[fp_core] 
(
    [hashkey] ASC
)
INCLUDE ( [note_id],[timeoffset])
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 80) ON [PRIMARY]
GO

COMMIT

(Or you could do it through the GUI. If you right-click on the index and choose properties, there's an included columns pane.)

This will hopefully have the effect of replacing your heap scans with nonclustered index seeks, but since your query has a "where in [30,000 items], it might do a index scan instead. Either way, it won't be rooting around in the heap, which I expect will be a good thing.

Related Solutions

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

^{— http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows}

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note, sometimes you will see Using index in the Extra column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either. Using index describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.

Sql-server – SQL Server 2005 – Query optimization for fetching large number of rows from table with 750 million rows

If you are determined to retain your existing table structure, the best indexes will be something like:

create index ix1 on fp_core(hashkey, note_id, timeoffset);
create index ix1 on fp_core_keys(hashkey);

You mention that the hashkeys can be different for the same song, but it sounds like you have some mechanism for detecting similar songs - if possible, add an additional column called 'songid' to your fp_core table and then query on this single value using a WHERE clause.

Best Answer

Related Solutions

MySQL looking up more rows than needed (indexing issue)

Sql-server – SQL Server 2005 – Query optimization for fetching large number of rows from table with 750 million rows

Related Question