Sql-server – How to optimise this query

optimizationperformancequery-performancesql server

Script to create Tables Person and Company

USE optimisationTesting

CREATE TABLE Person(Id int not null, Name nvarchar(60), StreetAddress nvarchar(60)  
                  , Height int, CompanyId int)
CREATE TABLE Company(Id int not null, Name nvarchar(60))
GO

ALTER TABLE [dbo].[Person]  ADD CONSTRAINT [PK_Person]  PRIMARY KEY CLUSTERED ([Id] ASC)
ALTER TABLE [dbo].[Company] ADD CONSTRAINT [PK_Company] PRIMARY KEY CLUSTERED ([Id] ASC)
GO

DECLARE @i int = 0

WHILE @i < 100000 BEGIN

    INSERT INTO Person(Id, Name, StreetAddress, Height, CompanyId) 
    VALUES (
        @i, 
        CONVERT(nvarchar(60), newid()), 
        CONVERT(nvarchar(60), newid()), 
        140+rand()*60, 
        rand()*100) 

    SET @i = @i + 1
END

SET @i=0
WHILE @i < 100 BEGIN

    INSERT INTO Company(Id, Name) values (@i, convert(nvarchar(60), newid()))

    SET @i = @i + 1
END

-------------------------------------------------------------------------------

set statistics time on

select *
    from 
    Person p 
    where p.Name like 'a%'

set statistics time off

The average elapsed time after three execution was 363ms.
How can this query be further optimized to find out the names of people that starts with A?

Best Answer

Misread your post and wrote up a post telling you how to optimize the inserts, which seemed like an odd activity to do. The real query to optimize is this:

select *
from 
Person p 
where p.Name like 'a%';

That query gets back around 6100 rows out of 100000 rows for me. Even if you create a simple index on the name column it is unlikely that the query optimizer will use it.

create index Person__name ON Person (name);

select *
from 
Person p  WITH (INDEX (Person__name)) 
where p.Name like 'a%';

Disclaimer that I'm only using that hint for testing purposes. You don't want to use something like that in production unless you really know what you're doing. That hint forces the optimizer to use the index but I get worse results than not using it. If you truly need that query to be faster my only ideas are that you'll need to create a covering index, you'll need to change the structure of the table, or maybe there are some small gains to be had by writing the filter condition in a different way.

Here is what I mean by a covering index:

create index Person__name ON Person (name)
INCLUDE (StreetAddress, Height, CompanyId)

That index avoids the key lookups that you get with a simple index because every necessary column is included in it. However, the size of that index will be similar in size to the table. Any DML on the table will be slower because the index needs to be maintained.

You could change the clustering index from ID to name. That would have significant impacts on DML performance and on other queries, but it would make your query faster.

Perhaps instead of LIKE there is another string operator that you can use. I doubt that such changes would have a significant impact on your query's runtime.

Related Solutions

Postgresql – Postgres multiple joins slow query, how to store default child record

You write:

Each customer can have multiple sites, but only one should be displayed in this list.

Yet, your query retrieves all rows. That would be a point to optimize. But you also do not define which site is to be picked.

Either way, it does not matter much here. Your EXPLAIN shows only 5026 rows for the site scan (5018 for the customer scan). So hardly any customer actually has more than one site. Did you ANALYZE your tables before running EXPLAIN?

From the numbers I see in your EXPLAIN, indexes will give you nothing for this query. Sequential table scans will be the fastest possible way. Half a second is rather slow for 5000 rows, though. Maybe your database needs some general performance tuning?

Maybe the query itself is faster, but "half a second" includes network transfer? EXPLAIN ANALYZE would tell us more.

If this query is your bottleneck, I would suggest you implement a materialized view.

After you provided more information I find that my diagnosis pretty much holds.

The query itself needs 27 ms. Not much of a problem there. "Half a second" was the kind of misunderstanding I had suspected. The slow part is the network transfer (plus ssh encoding / decoding, possibly rendering). You should only retrieve 100 rows, that would solve most of it, even if it means to execute the whole query every time.

If you go the route with a materialized view like I proposed you could add a serial number without gaps to the table plus index on it - by adding a column row_number() OVER (<your sort citeria here>) AS mv_id.

Then you can query:

SELECT *
FROM   materialized_view
WHERE  mv_id >= 2700
AND    mv_id <  2800;

This will perform very fast. LIMIT / OFFSET cannot compete, that needs to compute the whole table before it can sort and pick 100 rows.

pgAdmin timing

When you execute a query from the query tool, the message pane shows something like:

Total query runtime: 62 ms.

And the status line shows the same time. I quote pgAdmin help about that:

The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page.

If you want to see the time on the server you need to use SQL EXPLAIN ANALYZE or the built in Shift + F7keyboard shortcut or Query -> Explain analyze. Then, at the bottom of the explain output you get something like this:

Total runtime: 0.269 ms

Sql-server – How to Optimise Query

The optimizer can consider many plan alternatives (including ones with multiple seeks) but for disjunctions (OR predicates) it does not consider plans involving index intersections by default. Given the indexes:

CREATE CLUSTERED INDEX cx 
ON dbo.DispatchLink (DispatchLink1, ContractLink1);

CREATE NONCLUSTERED INDEX nc1 
ON dbo.DispatchLink (DispatchLink2, ContractLink2);

We can force index seeks (assuming SQL Server 2008 or later):

SELECT * 
FROM dbo.Dispatch AS d
INNER JOIN dbo.DispatchLink AS dl WITH (FORCESEEK) ON 
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

FORCESEEK plan

Using your sample data, the seek plan costs at 0.0332551 units compared with 0.0068057 for the scan plan:

Scan plan

There are all sorts of possible query rewrites and hints we can try. One example of a rewrite to promote an option the optimizer does not consider for the original plan is:

SELECT * 
FROM dbo.Dispatch AS d
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.DispatchLink AS dl
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;

This execution plan does not seek the second index if it finds a match on the first:

APPLY TOP Plan

This may perform very slightly better than the default FORCESEEK plan.

Without adding any new indexes, we can also force a seek into the Dispatch table:

SELECT * 
FROM dbo.DispatchLink AS dl
JOIN dbo.Dispatch AS d WITH (FORCESEEK) ON
    (d.DispatchId = dl.DispatchLink1 AND d.ContractId = dl.ContractLink1)
    OR (d.DispatchId = dl.DispatchLink2 AND d.ContractId = dl.ContractLink2);

Seek 2

This may be better or worse than the first example depending on things like how many rows are in each of the tables. The APPLY + TOP improvement is still possible:

SELECT * 
FROM dbo.DispatchLink AS dl
CROSS APPLY
(
    SELECT TOP (1) * FROM
    (
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink1 = d.DispatchId
        AND dl.ContractLink1 = d.ContractId
        UNION ALL
        SELECT * FROM dbo.Dispatch AS d
        WHERE dl.DispatchLink2 = d.DispatchId
        AND dl.ContractLink2 = d.ContractId
    ) SQ1
) AS F1;

Best Answer

Related Solutions

Postgresql – Postgres multiple joins slow query, how to store default child record

pgAdmin timing

Sql-server – How to Optimise Query

Related Question