SQL Server Query Performance – Impact of Non-Relevant Columns

performancequery-performanceselectsql server

I'm just curious.

Say you have a table of 1 million records/ rows.

select order_value from store.orders

Does it make a difference whether that table has 1 field, 2 fields, or 100 fields, in actual query time? I mean all fields other than "order_value."

Right now I'm pushing data to a data warehouse. Sometimes I dump fields into the table that "may be used in the future, someday" – but they aren't being queried right now, by anything. Would these 'extraneous' fields affect select statements that do no include them, directly or indirectly (no * I mean)?

Best Answer

This really depends on indexes and data types.

Using the Stack Overflow database as an example, this is what the Users table looks like:

NUTS

It has a PK/CX on the Id column. So it's the entirety of the table data sorted by Id.

With that as the only index, SQL has to read that whole thing (sans the LOB columns) into memory if it's not already there.

DBCC DROPCLEANBUFFERS-- Don't run this anywhere near prod.

SET STATISTICS TIME, IO ON 

SELECT u.Id
INTO  #crap1
FROM dbo.Users AS u

The stats time and io profile looks like this:

Table 'Users'. Scan count 7, logical reads 80846, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2406 ms,  elapsed time = 446 ms.

If I add an additional nonclustered index on just Id

CREATE INDEX ix_whatever ON dbo.Users (Id)

I now have a much smaller index that satisfies my query.

DBCC DROPCLEANBUFFERS-- Don't run this anywhere near prod.

SELECT u.Id
INTO  #crap2
FROM dbo.Users AS u

The profile here:

Table 'Users'. Scan count 7, logical reads 6587, physical reads 0, read-ahead reads 6549, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2344 ms,  elapsed time = 384 ms.

We're able to do far fewer reads and save a little CPU time.

Without more information about your table definition, I can't really try to reproduce what you're trying to measure any better.

But you're saying that unless there is a specific index on that lone column, the other columns/ fields will also be scanned? Is this just a drawback inherent to the design of rowstore tables? Why would irrelevant fields be scanned?

Yes, this is specific to rowstore tables. Data is stored by the row on data pages. Even if other data on the page is irrelevant to your query, that whole row > page > index needs to be read into memory. I wouldn't say that the other columns are "scanned" so much as the pages they exist on are scanned to retrieve the single value on them relevant to the query.

Using the ol' phonebook example: even if you're just reading phone numbers, when you turn the page, you're turning last name, first name, address, etc along with the phone number.