Sql-server – Query to search for a substring in xml

performancequery-performancesql-server-2008-r2xmlxquery

I use below query to search for a substring in whole xml(including node name and node value)

SELECT * 
FROM tablename
WHERE ( Charindex('abc',CAST([xmlcolumn] AS VARCHAR(MAX)))>0 ) 

I want an alternative query which has good performance than this. So please suggest some.
Below are details:
Table:

CREATE TABLE [dbo].[tablename](
    [Sl_no] [int] NOT NULL,
    [Date] [date] NULL,
    [Operation] [nvarchar](max) NULL,
    [Allot] [nvarchar](50) NULL,
    **[xmlcolumn]** [xml] NULL,
    [By] [nvarchar](255) NULL,
    [Dept] [nvarchar](255) NULL,
    [Db] [varchar](255) NULL,
    [tabl] [varchar](255) NULL,
    [Remark] [varchar](5000) NULL,
    [Work] [int] NULL,
    [F2] [nvarchar](max) NULL,
    [F6] [nvarchar](max) NULL,
    [F5] [nvarchar](max) NULL,
    [F8] [nvarchar](max) NULL,
    [ListC] [nvarchar](255) NULL,
    [pro] [nvarchar](max) NULL,
    [Completed] [varchar](50) NULL,
    [WorkTime] [xml] NULL,
    [RelatedData] [varchar](255) NULL,
    [User] [xml] NULL,
    [TeBy] [xml] NULL,
    [Date1] [nvarchar](50) NULL,
    [Num1] [nvarchar](50) NULL,
 CONSTRAINT [PK_DBChanges] PRIMARY KEY CLUSTERED 
(
    [Sl_no] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Number of rows: 3000
Sample xml in a column:(number of nodes may change little bit.

<Root>
  <Row>
    <User>abs</User>
    <Rowid>1</Rowid>
  </Row>
  <Row>
    <User>xra</User>
    <Rowid>2</Rowid>
  </Row>
  <Maxrowid>2</Maxrowid>
</Root>

Type of search expected:To search for a substring within xml column 'xmlcolumn' and then return the full row containing it.

So the query i used is

SELECT * 
FROM tablename
WHERE ( Charindex('abc',CAST([xmlcolumn] AS VARCHAR(MAX)))>0 ) 

The IO for the query by using set statistics time on

SQL Server Execution Times:
   CPU time = 62 ms,  elapsed time = 93 ms.

I tried with another query(But it will search only in the node value)

select *
from tablename
where xmlcolumn.exist('//*/text()[contains(., "abc")]') = 1

and statistics output was

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 109 ms.

Best Answer

To know what performance you will have you have to test on your data. I obviously can't do that so I made up my own xml data to test the two queries you have in this question.

Create a table with 5000 rows containing an XML document of 9475 characters in 415 nodes:

create table T
(
  ID int identity primary key,
  XMLCol xml not null
)

declare @X xml = 
(
  select top 100 *
  from master..spt_values
  for xml path('row'), root('root'), type
)

insert into T(XMLCol)
select top(5000) @X
from master..spt_values as m1, master..spt_values as m2

Execute the queries to search for a value that is present in the first node (rpc) and another value that is present in the last node (SERVER ROLE).

select count(*)
from T
where charindex('rpc',cast(xmlcol as varchar(max))) > 0

select count(*)
from T
where XMLCol.exist('//*/text()[contains(., "rpc")]') = 1

select count(*)
from T
where charindex('SERVER ROLE',cast(xmlcol as varchar(max))) > 0

select count(*)
from T
where XMLCol.exist('//*/text()[contains(., "SERVER ROLE")]') = 1

The IO for the different queries is the same so here is the output from using set statistics time on

Search for rpc with charindex:

 SQL Server Execution Times:
   CPU time = 1435 ms,  elapsed time = 1434 ms.

Search for rpc with xml exist

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 68 ms.

Search for SERVER ROLE with charindex

 SQL Server Execution Times:
   CPU time = 7316 ms,  elapsed time = 7321 ms.

Search for SERVER ROLE with xml exist

 SQL Server Execution Times:
   CPU time = 3245 ms,  elapsed time = 3244 ms.

Clear winner in both cases is the XML query. It does a better job of scanning the entire XML and it does a much better job of early termination when the search string is found.

This is true for the test data above using SQL Server 2012. It could be different for you with your data and your search strings. You have to test to know what is best for you.

NOTE: As stated in the answer to your other question, the two queries above will not return the same result bucause the XML query only search node values where the charindex query searches the entire XML document including nodenames and markup.