Sql-server – Correct way to group by month and year

performancequery-performancesql serversql-server-2012

I need to group by month (and year) and I was thinking about:

GROUP BY CAST(YEAR(tDate) AS NVARCHAR(4)) + '-' + CAST(MONTH(tDate) AS NVARCHAR(2))

But I found in the web something like:

GROUP BY YEAR(tDate), Month(tDate)

Both are equivalent? It is better to use the second?

Best Answer

You should go with the second approach. All of the string concatenation and type conversion just adds unnecessary CPU overheard to the query.

I can try this out on the Stack Overflow schema. I'm going to query the user's table, and create a supporting nonclustered index on the date field:

CREATE NONCLUSTERED INDEX IX_CreationDate ON dbo.Users (CreationDate);

Here's an example of the first approach:

SELECT 
    CAST(YEAR(u.CreationDate) AS NVARCHAR(4)) 
        + '-' 
        + CAST(MONTH(u.CreationDate) AS NVARCHAR(2)), 
    COUNT(*)
FROM dbo.Users u
GROUP BY 
    CAST(YEAR(u.CreationDate) AS NVARCHAR(4)) 
        + '-' 
        + CAST(MONTH(u.CreationDate) AS NVARCHAR(2));

And the plan and stats:

Table 'Worktable'. Scan count 0, logical reads 0
Table 'Workfile'. Scan count 0, logical reads 0
Table 'Users'. Scan count 1, logical reads 675

 SQL Server Execution Times:
   CPU time = 203 ms,  elapsed time = 258 ms.

Compare that to the second approach:

SELECT 
    YEAR(u.CreationDate), Month(u.CreationDate), 
    COUNT(*)
FROM dbo.Users u
GROUP BY YEAR(u.CreationDate), Month(u.CreationDate);

Here are the plan and stats for that one:

Table 'Worktable'. Scan count 0, logical reads 0
Table 'Workfile'. Scan count 0, logical reads 0
Table 'Users'. Scan count 1, logical reads 675

 SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 229 ms.

As you can see, the string concat / cast approach uses slightly more CPU. Everything else is the same.

By the way, the dbo.Users table only has about 300,000 rows in it. Going through this same exercise on the dbo.Posts table, which has about 3,700,000 rows in it, the query goes parallel, and the CPU savings for the second approach become much more significant.

So not only is the 2nd approach better in terms of CPU in the general case, it also appears to scale better as data size increases.

Related Solutions

Sql-server – T-SQL INNER JOIN Optimisation

The reason the performance is better is that the smaller data types have much less working set - see the execution plan (http://www.red-gate.com/our-company/about/book-store/assets/sql-server-execution-plans.pdf)

I expect you get the largest benefit from the nvarchar(4000) to nvarchar(50) - that's a reduction of 80x - and nvarchar(4000) can use up to 8K! of space. For a key, that is not a good idea.

In addition, the fact that there are no foreign keys probably mean you don't have a very good indexing strategy either. If you did have indexes (even for ridiculously large columns), you would probably find they could outperform the cast since it probably wouldn't spool as much.

In general, you don't want any operations on your keys in the join if at all possible, especially for large data sets.

Mysql – How to efficiently search for all records from a given month, or a given year

Why not the classic way to avoid functions

WHERE startDate BETWEEN '2014-01-01' AND '2014-12-31';

WHERE startDate LIKE '2014-01-%' ;

Best Answer

Related Solutions

Sql-server – T-SQL INNER JOIN Optimisation

Mysql – How to efficiently search for all records from a given month, or a given year

Related Question