Sql-server – Item Index for all items in nested XML query

sql serversql-server-2008-r2xml

I have a requirement to build an XML package with a parent record, and 1..N related child records.

The child records are to be a nodes under the parent node, and there may be one child record or several for each parent. It's also possible to have multiple parents in the same query.

The above query I have with no issue.

I also have a requirement to give each record (parent and child) a unique 0-based index. I can do with with ROW_NUMBER for each subquery, but can't figure out a way to get a unique index for every record in the entire query.

Fiddle here with sample data and working query.

(Note, SQLFiddle does not display XML in a very easy to read format so you may want to run this in your local SSMS.)

Working SSMS code below:

BEGIN TRY
    DROP TABLE #Child
    DROP TABLE #Parent
END TRY
BEGIN CATCH
END CATCH

CREATE TABLE #Parent 
    (RecId int PRIMARY KEY NOT NULL, 
     PersonName varchar(100), Age int)
CREATE TABLE #Child 
    (ChildID int identity PRIMARY KEY NOT NULL,
     ParentRecId int FOREIGN KEY REFERENCES #Parent(RecId), 
     SalesAmt money)

INSERT INTO #Parent
  (RecID, PersonName, Age)
VALUES
  (1, 'Aaron Bertrand', 99),
  (2, 'Paul White', 20),
  (3, 'JNK', 33)

INSERT INTO #Child
  (ParentRecID, SalesAmt)
VALUES
  (1, 10.00),
  (1, 20.00),
  (2, 15.15),
  (2, 100.00),
  (3, 0.00)

SELECT
  RecId as 'RID',
  PersonName as 'PNAM',
  Age,
  (
    SELECT 
      C.SalesAmt as 'SAMT',
      (ROW_NUMBER() OVER (ORDER BY ParentRecId) - 1) as 'Index'
    FROM
      #Child C
    WHERE
      C.ParentRecId = P.RecId
    FOR XML PATH ('ChildRec'), ROOT ('ChildRecs'), TYPE
  )
FROM
  #Parent P
FOR XML PATH ('Parent'), ROOT ('Parents'), TYPE

How can I get a unique index across all parent and child records for this XML package?

Best Answer

I found the answer - I needed to create a CTE that uses a union of all my child and parent records and creates a ROW_NUMBER(), then JOIN to that CTE to get the ROW_NUMBER() value which will be unique across all records.

Solution fiddle here.

Full solution to paste into SSMS:

BEGIN TRY
    DROP TABLE #Child
    DROP TABLE #Parent
END TRY
BEGIN CATCH
END CATCH

CREATE TABLE #Parent 
    (RecId int PRIMARY KEY NOT NULL, 
     PersonName varchar(100), Age int)
CREATE TABLE #Child 
    (ChildID int identity PRIMARY KEY NOT NULL,
     ParentRecId int FOREIGN KEY REFERENCES #Parent(RecId), 
     SalesAmt money)

INSERT INTO #Parent
  (RecID, PersonName, Age)
VALUES
  (1, 'Aaron Bertrand', 99),
  (2, 'Paul White', 20),
  (3, 'JNK', 33)

INSERT INTO #Child
  (ParentRecID, SalesAmt)
VALUES
  (1, 10.00),
  (1, 20.00),
  (2, 15.15),
  (2, 100.00),
  (3, 0.00)

;WITH IDs AS
(
    SELECT
       RN = (ROW_NUMBER() OVER (ORDER BY  RecId,CASE WHEN ChildId IS NULL THEN 0 ELSE 1 END) -1),
       RecId,
       ChildId
    FROM
       (
       SELECT
          RecId, ChildId = NULL
       FROM
          #Parent
       UNION ALL
       SELECT
          RecId, ChildId
       FROM 
          #Parent P
       INNER JOIN
          #Child C
              ON C.ParentRecId = P.RecId) x
)

SELECT
  P.RecId as 'RID',
  P.PersonName as 'PNAM',
  P.Age,
  I.RN as 'Index',
  (
    SELECT 
      C.SalesAmt as 'SAMT',
      I.RN as 'Index'
    FROM
      #Child C
    INNER JOIN
       IDs I
          ON I.ChildId = C.ChildID
    WHERE
      C.ParentRecId = P.RecId
    FOR XML PATH ('ChildRec'), ROOT ('ChildRecs'), TYPE
  )
FROM
  #Parent P
INNER JOIN
    IDs I
       ON I.RecId = P.RecId
       AND I.ChildId IS NULL
FOR XML PATH ('Parent'), ROOT ('Parents'), TYPE

Related Solutions

Sql-server – Query to search for a substring in xml

To know what performance you will have you have to test on your data. I obviously can't do that so I made up my own xml data to test the two queries you have in this question.

Create a table with 5000 rows containing an XML document of 9475 characters in 415 nodes:

create table T
(
  ID int identity primary key,
  XMLCol xml not null
)

declare @X xml = 
(
  select top 100 *
  from master..spt_values
  for xml path('row'), root('root'), type
)

insert into T(XMLCol)
select top(5000) @X
from master..spt_values as m1, master..spt_values as m2

Execute the queries to search for a value that is present in the first node (rpc) and another value that is present in the last node (SERVER ROLE).

select count(*)
from T
where charindex('rpc',cast(xmlcol as varchar(max))) > 0

select count(*)
from T
where XMLCol.exist('//*/text()[contains(., "rpc")]') = 1

select count(*)
from T
where charindex('SERVER ROLE',cast(xmlcol as varchar(max))) > 0

select count(*)
from T
where XMLCol.exist('//*/text()[contains(., "SERVER ROLE")]') = 1

The IO for the different queries is the same so here is the output from using set statistics time on

Search for rpc with charindex:

 SQL Server Execution Times:
   CPU time = 1435 ms,  elapsed time = 1434 ms.

Search for rpc with xml exist

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 68 ms.

Search for SERVER ROLE with charindex

 SQL Server Execution Times:
   CPU time = 7316 ms,  elapsed time = 7321 ms.

Search for SERVER ROLE with xml exist

 SQL Server Execution Times:
   CPU time = 3245 ms,  elapsed time = 3244 ms.

Clear winner in both cases is the XML query. It does a better job of scanning the entire XML and it does a much better job of early termination when the search string is found.

This is true for the test data above using SQL Server 2012. It could be different for you with your data and your search strings. You have to test to know what is best for you.

NOTE: As stated in the answer to your other question, the two queries above will not return the same result bucause the XML query only search node values where the charindex query searches the entire XML document including nodenames and markup.

Sql-server – Indexing for query containing xml column

If you are updating the xml that often, then consider property promotion, ie converting the xml to relational tables, as also suggested by others.

Re the design, are you storing Balance and Transaction in more than one place? This sounds dangerous. If the xml is a kind of history table then it might be better as that. You could also make sure the history table is populated at the same time as the main table within a transaction so it's safe. Your current design doesn't feel safe to me. The fact your two UPDATEs above aren't done in a transaction is a bit worrying. Have you had mismatches between the main table and xml?

Having said all that, I did a simple test rig to try and reproduce this problem and the UPDATEs run in under a second for me on my laptop on a 1 million row table. Can you see anything different between this rig and your queries?

USE tempdb
GO

IF OBJECT_ID('[dbo].[account]') IS NOT NULL DROP TABLE [dbo].[account]
GO
CREATE TABLE [dbo].[account](
    [ID] [int] NULL,
    [Type] [char](10) NULL,
    [Date] [date] NULL,
    [Balance] [decimal](15, 2) NULL,
    [TRansaction] [decimal](15, 2) NULL,
    [mybal] [decimal](15, 2) NULL,
    [dailybalance] [xml] NULL,
    [AutoIndex] [int] IDENTITY(1,1) NOT NULL,
 CONSTRAINT [PK_BalanceTable] PRIMARY KEY CLUSTERED 
(
    [AutoIndex] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO


-- Generate some dummy rows
;WITH cte AS
(
SELECT TOP 1000000
    ROW_NUMBER() OVER( ORDER BY ( SELECT NULL ) ) AS rn
FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.account ( ID, [TYPE], [Date], Balance, [Transaction], mybal, dailybalance )
SELECT
      rn AS ID
    , CHAR( 65 + ( (rn -1) % 6 ) ) AS [TYPE]
    , DATEADD( day, (rn -1) % 90, '1 Jan 2013' ) AS [Date]
    , RAND() * 10 * ( (rn -1) % 10 ) AS [Balance]
    , RAND() * 10 * ( (rn -1) % 10 ) AS [Transaction]
    , RAND() * 10 * ( (rn -1) % 10 ) AS [mybal]

    , ( SELECT CONVERT( CHAR(10), DATEADD( month, x.y, '1 Jan 2013' ), 120 ) AS "date", x.y AS "Balance", 1 AS "Transaction" FROM ( SELECT DISTINCT TOP 12 column_id y FROM master.sys.columns ) x FOR XML PATH('Row'), ROOT('Root'), TYPE ) [dailybalance]

FROM cte
GO


-- Create data for our condition
UPDATE a
SET [Date] = '28-Feb-2013',
    dailybalance.modify('insert 
<Row>
  <date>2013-02-04</date>
  <Balance>-1</Balance>
  <Transaction>-1</Transaction>
</Row>

  as last into Root[1]
  ')
FROM dbo.account a
WHERE [ID] = 257
GO

-- Covering index for the query
-- CREATE INDEX _idx ON dbo.account ( ID, [Date] )


SELECT 'before' s, * FROM dbo.account
WHERE [ID] = 257


UPDATE account
SET [dailybalance].modify('replace value of (/Root/Row[date=''2013-02-04'']/Balance/text())[1] 
with (/Root/Row[date=''2013-02-04'']/Balance)[1] +280')   
WHERE [ID] = 257 
  AND [Date] = '28-Feb-2013'  
  AND [dailybalance].exist('/Root/Row[date=''2013-02-04'']') = 1;


UPDATE account
set [dailybalance].modify('replace value of (/Root/Row[date=''2013-02-04'']/Transaction/text())[1] 
with (/Root/Row[date=''2013-02-04'']/Transaction)[1] +280')   
where [ID] = 257 
  and [Date] = '28-Feb-2013'  
  and  [dailybalance].exist('/Root/Row[date=''2013-02-04'']') = 1;


SELECT 'after' s, * FROM dbo.account
WHERE [ID] = 257

How many rows in your account table? If your xml is actually very large, then that might explain things. Could there be other issues such as blocking, triggers, or is the server busy? Looking at the query, it's unlikely full-text indexing would help you much. XML Indexing might help, but at a significant storage penalty. The table can end up between 2-5 its original size and you don't have that much room to play with in SQL Express 2008 R2 (10GB), so again probably not recommended.

One other thing that might help is a covering index to support the UPDATE, eg

CREATE INDEX _idx ON dbo.account ( ID, [Date] )

Try that and see if that helps your UPDATE. If not, try and provide the additional information requested above and I'll have another look.

Best Answer

Related Solutions

Sql-server – Query to search for a substring in xml

Sql-server – Indexing for query containing xml column

Related Question