Sql-server – Microsoft Optimal method for columns into 1 word: XML Raw or Concat

database-designperformanceperformance-tuningsql serversql-server-2016xml

What is fastest, performance method to make columns into one word? Does Microsoft have recommendation, or is there performance Big-O studies on FOR XML RAW vs Concat?

Company has multiple tables with different column types including int, varchar, string, different number of columns, some not/not null, maybe anywhere from 5-25 columns in a table.

SELECT (Column1, Column2, Column3, Column4... FOR XML RAW)))

SELECT CONCAT(Column1,'|',Column2,'|',Column3,'|',Column4...))

We are seeing different timing results on our server, however both seem to be relatively close, curious if there is any general optimal to apply, rather than reviewing 1000 tables.

Background will be utilized to apply hashing of row later.

Best Answer

In general, if you truly need to know what will be the fastest for your workload then you need to do your own benchmarking. Take a look at the following query plans:

query plans

Which one do you think would be faster if you had to pick one without any additional context? I would pick the bottom one. It simply appears to do less work because there's no nested loop join and there's no UDX operator. To generate the above query plans, put 6.5 million rows into a temp table:

SELECT s1.*
INTO #source_data
FROM master..spt_values s1
CROSS JOIN master..spt_values t2;

This table has six columns with different data types. Some allow nulls and some don't. So it seems to meet your criteria. The top query takes about 3X longer on my machine than the bottom one:

DROP TABLE IF EXISTS #t;

SELECT HASHBYTES('SHA2_256', (SELECT name, number, type, low, high, status FOR XML RAW)) hash_value
INTO #t
FROM #source_data
OPTION (MAXDOP 1);

GO

DROP TABLE IF EXISTS #t;

SELECT HASHBYTES('SHA2_256', CONCAT(name, '|', number, '|', type, '|', low, '|', high, '|', status)) hash_value
INTO #t
FROM #source_data
OPTION (MAXDOP 1);

This seems like a perfectly reasonable result. It will add overhead to convert data to XML and to have a nested loop join. Why do that when you don't have to? In your question you suggest that sometimes the FOR XML RAW technique is faster for your data. If so, there's something about your data or testing methodology that you aren't telling us. My suggestion is to come up with a sharable reproduction that demonstrates that the FOR XML RAW method is significantly faster and to ask a new question. As is we have to guess what's going on, which isn't helpful for anyone.