Sql-server – Does having an index on a VARCHAR column with a lot of very similar starting values have bad performance

index, sql server, sql-server-2005, sql-server-2008, sql-server-2008-r2

We seem to be getting unusually bad performance on queries that use an index. For example, the table looks like this:

PK BIGINT
ID VARCHAR(50)
Col1
Col2
etc

We need to insert a row into the database and later look it up by the ID. The third party has the ID and we have the PK, so we need to get the PK back from the ID. However, a very large range of those IDs have very similar starting values, for example:

"//45-423484834893457"
"//45-573459834589345"
"//45-345345345345345"

I am not sure how SQL Server traverses the B-tree: whether it hashes the values or does string comparisons starting at the leftmost position.

Could having a very large range of very similar values (at least first 4 characters are identical) lead to a poorly performing index when querying for those values?

UPDATE:

Sorry, the lookup query is:

SELECT PK_Column FROM table WHERE ID = @ID

At Mark's request:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 22 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 30 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.0032831" StatementText="SELECT&#xD;&#xA;      LocalMsgId&#xD;&#xA;    FROM&#xD;&#xA;      Pdu (nolock)&#xD;&#xA;  WHERE&#xD;&#xA;     RemoteMsgId = '41/00/2789aeb8/1127796335811'&#xD;&#xA;      &#xD;" StatementType="SELECT" ParameterizedText="(@1 varchar(8000))SELECT [LocalMsgId] FROM [Pdu](nolock) WHERE [RemoteMsgId]=@1" QueryHash="0x677C78E75E33C4C7" QueryPlanHash="0xB358D862A43E4853">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan CachedPlanSize="16" CompileTime="7406" CompileCPU="1970" CompileMemory="120">
            <RelOp AvgRowSize="23" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Index Seek" NodeId="0" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="5074270">
              <OutputList>
                <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
              </OutputList>
              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
                  </DefinedValue>
                </DefinedValues>
                <Object Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Index="[IX_Pdu_RemoteMsgId]" IndexKind="NonClustered" />
                <SeekPredicates>
                  <SeekPredicateNew>
                    <SeekKeys>
                      <Prefix ScanType="EQ">
                        <RangeColumns>
                          <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="RemoteMsgId" />
                        </RangeColumns>
                        <RangeExpressions>
                          <ScalarOperator ScalarString="[@1]">
                            <Identifier>
                              <ColumnReference Column="@1" />
                            </Identifier>
                          </ScalarOperator>
                        </RangeExpressions>
                      </Prefix>
                    </SeekKeys>
                  </SeekPredicateNew>
                </SeekPredicates>
              </IndexScan>
            </RelOp>
            <ParameterList>
              <ColumnReference Column="@1" ParameterCompiledValue="'41/00/2789aeb8/1127796335811'" />
            </ParameterList>
          </QueryPlan>
        </StmtSimple>
      </Statements>
      <Statements>
        <StmtSimple StatementCompId="2" StatementId="2" StatementText="&#xA;SET STATISTICS IO OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementId="1" StatementText="SET STATISTICS TIME OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>

Best Answer

It depends on the queries you are using. MS SQL Server uses B-tree indexes, which are always balanced, but if you use a query like this:

select * from table where field like 'some%'

and most of your records match this condition, MS SQL Server may decide that a table scan is cheaper than an index scan or index lookup.
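To see which way the optimizer goes on your own data, a minimal sketch like the following compares the two predicate shapes. The table, column names and sample value are taken from the execution plan in the question; the '41/00/' prefix is only an assumed example of a prefix shared by many rows.

```sql
-- Assumption: dbo.Pdu, RemoteMsgId and LocalMsgId come from the plan in the question.
SET STATISTICS IO ON;

DECLARE @RemoteMsgId VARCHAR(50);
SET @RemoteMsgId = '41/00/2789aeb8/1127796335811';

-- Equality predicate: the plan in the question shows an Index Seek for this shape.
SELECT LocalMsgId
FROM dbo.Pdu
WHERE RemoteMsgId = @RemoteMsgId;

-- Prefix predicate that (by assumption) matches a large share of the rows.
-- Check the actual plan to see whether a scan is chosen instead of a seek.
SELECT *
FROM dbo.Pdu
WHERE RemoteMsgId LIKE '41/00/%';

SET STATISTICS IO OFF;
```

Comparing the logical reads reported for each statement, together with the actual execution plans, shows whether the shared prefix matters for the query you really run.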

Addition: you can also use a computed column that reverses the field value and create an index on it.
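A rough sketch of the reversed computed column idea, assuming the table and columns shown in the question's execution plan (dbo.Pdu, RemoteMsgId, LocalMsgId). The new column and index names are hypothetical, and whether this actually helps an equality lookup is something to measure rather than assume.

```sql
-- Hypothetical names: RemoteMsgIdReversed, IX_Pdu_RemoteMsgIdReversed.
-- REVERSE is deterministic, so the computed column can be persisted and indexed.
ALTER TABLE dbo.Pdu
  ADD RemoteMsgIdReversed AS REVERSE(RemoteMsgId) PERSISTED;
GO  -- batch separator understood by SSMS and sqlcmd

CREATE NONCLUSTERED INDEX IX_Pdu_RemoteMsgIdReversed
  ON dbo.Pdu (RemoteMsgIdReversed)
  INCLUDE (LocalMsgId);  -- lets the lookup be answered from the index alone
GO

DECLARE @RemoteMsgId VARCHAR(50);
SET @RemoteMsgId = '41/00/2789aeb8/1127796335811';

-- Reverse the search value so it is compared against the reversed key.
SELECT LocalMsgId
FROM dbo.Pdu
WHERE RemoteMsgIdReversed = REVERSE(@RemoteMsgId);
```

With the value reversed, the characters that differ between rows come first in the index key instead of the shared prefix.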

Related Solutions

Sql-server – SQL Server – Logical Reads lowered, Execution time remained the same

some 7 seconds for 500k to return, and also a lot of time to render the grid you're likely displaying the results in.

You are waiting 7 seconds because that is how long it takes SQL Server to push 500k rows to your client. Look at the client statistics in SSMS; see Database Engine Query Editor:

Include Client Statistics: Includes a Client Statistics window that contains statistics about the query and about the network packets, and the elapsed time of the query.

SQL Server Execution Times: CPU time = 47 ms, elapsed time = 769 ms.

Your actual query executes in 47 ms. Elapsed time is much longer (still under 1 second) because of network waits. You can confirm this using wait stats analysis; read How to analyse SQL Server performance for details, including how to capture the query wait stats.
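As a quick first check before a full wait stats analysis, the instance-level wait DMV can be sampled before and after running the query. This is only a sketch: the counters are cumulative for the whole instance since the last restart, so other sessions contribute too, and reading the DMV requires VIEW SERVER STATE permission. ASYNC_NETWORK_IO is the wait type SQL Server records while waiting for the client to consume results.

```sql
-- Run once before and once after the query, then compare the two samples.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'ASYNC_NETWORK_IO';
```

A large jump in wait_time_ms across the query would be consistent with the time being spent delivering rows to the client rather than executing the query.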

Ultimately, the problem is returning 500k rows to the client. There cannot be any reason for such an operation: no human user can comprehend half a million rows. Process the data on the back end.

Half a million rows is used since it is easier to notice speed changes with more than less data

Well, in this case you have sent yourself on a snipe hunt. There is no problem, other than one of your own making in marshaling and rendering 500k rows. It is a completely bogus scenario; no app should retrieve 500k rows. And a test of processing 500k rows (e.g. aggregates) should test... the processing, including the aggregates.
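To illustrate the point about testing the processing rather than the transfer, here is a minimal sketch that aggregates on the server and returns only the summary rows. The table dbo.Orders and its columns are hypothetical, since the original question's schema is not shown here.

```sql
-- Hypothetical table and columns: dbo.Orders (CustomerId, Amount).
SET STATISTICS TIME ON;

-- The 500k rows are read and aggregated on the server;
-- only one row per customer travels to the client.
SELECT CustomerId,
       COUNT(*)    AS OrderCount,
       SUM(Amount) AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerId;

SET STATISTICS TIME OFF;
```

Timing this shape of query measures the work SQL Server does, without the cost of shipping and rendering every row.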

Sql-server – disable execution plan caching for debugging purposes

According to the docs there are three ways to recompile a stored procedure and/or the queries inside of a stored procedure.

WITH RECOMPILE

This is probably the most direct way to force a recompile. If you are willing to edit the definition of the stored procedure, you can automatically force a recompile every time. There are some significant disadvantages to watch out for, though.

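CREATE PROCEDURE dbo.FooForProfit @Blood INT, @Sweat BIGINT, @Tears SMALLINT
  WITH RECOMPILE
AS
BEGIN
  -- Do that foo-do that you do so well (for profit)!
  -- Placeholder body so the definition compiles; replace with the real work.
  SELECT @Blood AS Blood, @Sweat AS Sweat, @Tears AS Tears;
END;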

Alternatively you can just add the hint after the execute thusly:

EXECUTE dbo.FooForFun @Joy = 42 WITH RECOMPILE;

The problems:

This will hurt performance, and most likely give you tainted performance values.
Adding WITH RECOMPILE after the EXECUTE statement might not be possible depending on how you're executing the stored procedures.
The compiled plan will not be cached, nor will the performance information be maintained¹.

The compiled plan for the stored procedure is not cached. As a result, no performance information is maintained in DMVs such as sys.dm_exec_query_stats.
This will disable the Parameter Embedding Optimization¹.

Parameter Embedding Optimization takes this process a step further: query parameters are replaced with literal constant values during query parsing. The parser is capable of surprisingly complex simplifications, and subsequent query optimization may refine things even further.
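One way to observe the missing performance information is to look for the procedure's statements in sys.dm_exec_query_stats after executing it. This is a sketch that assumes the dbo.FooForProfit example procedure and that you are connected to the database containing it; for a procedure created WITH RECOMPILE the expectation from the documentation quoted above is that no rows come back.

```sql
-- Assumes the example procedure dbo.FooForProfit exists in the current database.
SELECT TOP (20)
       st.text AS batch_text,
       qs.creation_time,
       qs.last_execution_time,
       qs.execution_count,
       qs.total_worker_time,   -- CPU time, in microseconds
       qs.total_elapsed_time   -- elapsed time, in microseconds
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.objectid = OBJECT_ID(N'dbo.FooForProfit')
  AND st.dbid = DB_ID()
ORDER BY qs.last_execution_time DESC;
```

Running the same check against a procedure created without WITH RECOMPILE gives a baseline to compare against.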

I feel it is important to note that some very smart and experienced people like Aaron Bertrand make it a point not to go down this route. I initially thought this would be your best bet, but I'm now convinced that this probably isn't the ideal solution.

SP_RECOMPILE

If you can't or don't want to modify the stored procedure, and your data access mechanism doesn't let you append the query hint, you can manually force a recompile of the stored procedure using another provided stored procedure. One nice upside to this route is that you can force a recompile at will, so you can easily compare executions with or without the compilation penalty.

EXEC sp_recompile N'dbo.FooForFame';

The problems:

This is going to add a separate step to your process to reset your test bed.
~~You could probably add this to a trigger linked to executing your stored procedure.~~ <-- Don't do that because you don't want to encourage the maniacs or create more.

RECOMPILE Query Hint

As it says in the docs, this will give you more fine-grained control over what gets recompiled if a stored procedure has multiple queries.

RECOMPILE is a useful alternative to creating a stored procedure that uses the WITH RECOMPILE clause when only a subset of queries inside the stored procedure, instead of the whole stored procedure, must be recompiled. RECOMPILE is also useful when you create plan guides.

It will also let you recompile statements that aren't within a stored procedure. That is useful for obvious reasons. Here is a sample:

SELECT
  Peace 
FROM dbo.TheWorld
OPTION (RECOMPILE);

The problems:

This will be slightly more work if you really do have multiple statements in a stored procedure.
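For the multi-statement case, a minimal sketch might look like the following. The procedure name, the @Region parameter and the RegionId column are hypothetical; dbo.TheWorld and Peace are reused from the sample above.

```sql
-- Hypothetical procedure: only the second statement is recompiled on each execution.
CREATE PROCEDURE dbo.FooForPeace @Region INT
AS
BEGIN
  -- No hint: this statement's plan is cached and reused as usual.
  SELECT COUNT(*) AS WorldRows
  FROM dbo.TheWorld;

  -- Statement-level hint: a fresh plan is compiled each time this statement runs.
  SELECT Peace
  FROM dbo.TheWorld
  WHERE RegionId = @Region
  OPTION (RECOMPILE);
END;
```

The hint has to be repeated on every statement you want recompiled, which is the extra work mentioned above.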
