Sql-server – Does having an index on a VARCHAR column with a lot of very similar starting values have bad performance

indexsql serversql-server-2005sql-server-2008sql-server-2008-r2

We seem to be having quite unusual bad performance on queries that use an index. for example the table looks like

  • PK BIGINT
  • ID VARCHAR(50)
  • Col1
  • Col2
  • etc

So we need to insert a row in the database and later look it up on the ID. But the ID a third party has and we have the PK. We need to get the PK back. But a very large range of those IDs have very similar starting values. for example

  • "//45-423484834893457"
  • "//45-573459834589345"
  • "//45-345345345345345

I am not sure how SQL Server is traversing the BTree, if it hashing the values or doing string comparisons starting at the left most position.

Could having a very large range of very similar values (at least first 4 characters are identical) lead to a poorly performing index when querying for those values?

UPDATE:

Sorry the look up query is

SELECT PK_Column FROM table WHERE ID = @ID

Marks request:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 22 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 30 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.0032831" StatementText="SELECT&#xD;&#xA;      LocalMsgId&#xD;&#xA;    FROM&#xD;&#xA;      Pdu (nolock)&#xD;&#xA;  WHERE&#xD;&#xA;     RemoteMsgId = '41/00/2789aeb8/1127796335811'&#xD;&#xA;      &#xD;" StatementType="SELECT" ParameterizedText="(@1 varchar(8000))SELECT [LocalMsgId] FROM [Pdu](nolock) WHERE [RemoteMsgId]=@1" QueryHash="0x677C78E75E33C4C7" QueryPlanHash="0xB358D862A43E4853">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan CachedPlanSize="16" CompileTime="7406" CompileCPU="1970" CompileMemory="120">
            <RelOp AvgRowSize="23" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Index Seek" NodeId="0" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="5074270">
              <OutputList>
                <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
              </OutputList>
              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
                  </DefinedValue>
                </DefinedValues>
                <Object Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Index="[IX_Pdu_RemoteMsgId]" IndexKind="NonClustered" />
                <SeekPredicates>
                  <SeekPredicateNew>
                    <SeekKeys>
                      <Prefix ScanType="EQ">
                        <RangeColumns>
                          <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="RemoteMsgId" />
                        </RangeColumns>
                        <RangeExpressions>
                          <ScalarOperator ScalarString="[@1]">
                            <Identifier>
                              <ColumnReference Column="@1" />
                            </Identifier>
                          </ScalarOperator>
                        </RangeExpressions>
                      </Prefix>
                    </SeekKeys>
                  </SeekPredicateNew>
                </SeekPredicates>
              </IndexScan>
            </RelOp>
            <ParameterList>
              <ColumnReference Column="@1" ParameterCompiledValue="'41/00/2789aeb8/1127796335811'" />
            </ParameterList>
          </QueryPlan>
        </StmtSimple>
      </Statements>
      <Statements>
        <StmtSimple StatementCompId="2" StatementId="2" StatementText="&#xA;SET STATISTICS IO OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementId="1" StatementText="SET STATISTICS TIME OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>

Best Answer

It depends on the queries are you using. MS SQL Server use BTree indexes which are always balanced, but if you use query like this:

select * from table where field like 'some%'

and most of your records correspons to this condition, MS SQL Server can decide that it will be cheaper to use table scan instead of index scan or index lookup.

Addition: Anyway you can use computed columns to reverse your field value and create the index on it.