SQL Server – Why is an aggregate query significantly faster with a GROUP BY clause than without one?

aggregate, parallelism, performance, query-performance, sql-server-2005

I'm just curious why an aggregate query runs so much faster with a GROUP BY clause than without one.

For example, this query takes almost 10 seconds to run:

SELECT MIN(CreatedDate)
FROM MyTable
WHERE SomeIndexedValue = 1

While this one takes less than a second:

SELECT MIN(CreatedDate)
FROM MyTable
WHERE SomeIndexedValue = 1
GROUP BY CreatedDate

There is only one CreatedDate in this case, so the grouped query returns the same results as the ungrouped one.

I noticed the execution plans for the two queries are different: the second query uses parallelism while the first does not.

Query1 Execution Plan
Query2 Execution Plan

Is it normal for SQL Server to evaluate an aggregate query differently if it doesn't have a GROUP BY clause? And is there something I can do to improve the performance of the first query without using a GROUP BY clause?

Edit

I just learned I can use OPTION(querytraceon 8649) to set the cost overhead of parallelism to 0, which makes the query use some parallelism and reduces the runtime to 2 seconds, although I don't know if there are any downsides to using this query hint.

SELECT MIN(CreatedDate)
FROM MyTable
WHERE SomeIndexedValue = 1
OPTION(querytraceon 8649)

I'd still prefer a shorter runtime since the query is meant to populate a value upon user selection, so should ideally be instantaneous like the grouped query is. Right now I'm just wrapping my query, but I know that's not really an ideal solution.

SELECT Min(CreatedDate)
FROM
(
    SELECT Min(CreatedDate) as CreatedDate
    FROM MyTable WITH (NOLOCK) 
    WHERE SomeIndexedValue = 1
    GROUP BY CreatedDate
) as T

Edit #2

In response to Martin's request for more info:

Both CreatedDate and SomeIndexedValue have a separate non-unique, non-clustered index on them. SomeIndexedValue is actually a varchar(7) field, even though it stores a numeric value that points to the PK (int) of another table. The relationship between the two tables is not defined in the database. I am not supposed to change the database at all, and can only write queries that query data.

MyTable contains over 3 million records, and each record is assigned a group it belongs to (SomeIndexedValue). A group can contain anywhere from 1 to 200,000 records.

Best Answer

It looks like it is probably following an index on CreatedDate in order from lowest to highest and doing lookups to evaluate the SomeIndexedValue = 1 predicate.

When it finds the first matching row it is done, but it may well be doing many more lookups than it expects before it finds such a row (it assumes the rows matching the predicate are randomly distributed according to date.)
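To make that access pattern concrete, the ungrouped MIN is being evaluated roughly as if it were the TOP (1) query below. This is only an illustration of the plan shape described above, not the optimizer's literal rewrite; the IS NOT NULL filter is added here because MIN ignores NULLs while an ascending sort in SQL Server puts them first.

```sql
-- Illustration only: walk CreatedDate in ascending order and stop at the
-- first row that also satisfies the SomeIndexedValue predicate.
-- If matching rows all have late dates, many rows are read (and looked up)
-- before the first match is found.
SELECT TOP (1) CreatedDate
FROM MyTable
WHERE SomeIndexedValue = 1
  AND CreatedDate IS NOT NULL
ORDER BY CreatedDate ASC;
```

Unlike MIN, this form returns no row at all when nothing matches, so it is not a drop-in replacement.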

See my answer here for a similar issue

The ideal index for this query would be one on (SomeIndexedValue, CreatedDate). Assuming that you can't add that, or at least make your existing index on SomeIndexedValue cover CreatedDate as an included column, you could try rewriting the query as follows:

SELECT MIN(DATEADD(DAY, 0, CreatedDate)) AS CreatedDate
FROM MyTable
WHERE SomeIndexedValue = 1

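to prevent it from using that particular plan. Because the aggregate is now over an expression rather than the bare column, the ordered scan of the CreatedDate index can no longer be used to satisfy the MIN directly.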
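For readers who are allowed to change the schema (the asker is not), the two index options mentioned above might look like the following sketch. The index names are hypothetical; only MyTable, SomeIndexedValue and CreatedDate come from the question.

```sql
-- Option 1 (hypothetical name): composite key. A seek on SomeIndexedValue
-- lands on rows already ordered by CreatedDate, so the first row in the
-- range is the minimum.
CREATE NONCLUSTERED INDEX IX_MyTable_SomeIndexedValue_CreatedDate
    ON MyTable (SomeIndexedValue, CreatedDate);

-- Option 2 (hypothetical name): keep the key on SomeIndexedValue only and
-- carry CreatedDate as an included column. This avoids the lookups, but the
-- matching rows still have to be scanned to find the minimum.
CREATE NONCLUSTERED INDEX IX_MyTable_SomeIndexedValue_Incl_CreatedDate
    ON MyTable (SomeIndexedValue)
    INCLUDE (CreatedDate);
```

Only one of the two would normally be created; the composite key is the closer match to the "ideal index" described in the answer.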

Related Solutions

Oracle aggregate function performance

Use Oracle analytics:

SELECT distinct
  COUNT(userId) over (partition by date_range_start, date_range_end)
, AVG(age)      over (partition by date_range_start, date_range_end)
, STDDEV(age)   over (partition by date_range_start, date_range_end)
, MIN(age)      over (partition by date_range_start, date_range_end)
, MAX(age)      over (partition by date_range_start, date_range_end)
, date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010', 'DD-Mon-YYYY')) AND (date_range_end <= TO_DATE('30-Nov-2011', 'DD-Mon-YYYY')) 
/

It does the same thing, but most of the time it is surprisingly faster.

MySQL – Slow complex query with group/order

I can see a couple of things that should improve your query performance.

1. As you already found out, there is absolutely no need to join mentioncache. Using EXISTS seems more natural (or IN as you did, but EXISTS may work better from a performance point of view).

2. DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" can be rewritten as m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59", so MySQL can use an index on m.indexed.

3. urlinfluranks doesn't seem to be used anywhere except in the LEFT JOIN. Why do you need it?

4. f.foreign_id can be either NULL or m.id, and this is the only reference to the favoureditems table, so I'd rather use a subquery in this case.

Finally, I think you can get the same results without GROUP BY m.id (as far as I understand, mentions.id is a primary key).

SELECT   
m.id, m.title, m.title_text, m.content_text, m.url,m.root_url,m.sub_type,m.indexed,  
CASE 
 WHEN EXISTS 
    (SELECT NULL FROM favoureditems f WHERE f.model = "Mention" 
    AND f.foreign_id = m.id AND f.owner_id = 803) THEN m.id 
END AS foreign_id,
v.foreign_id, v.created, mfs.score,  
Image.id,Image.model,Image.foreign_key, Image.dirname,Image.basename,  
(REPLACE(REPLACE(m.host_url, 'http://www.', ''), 'http://', '')) AS Mention__plain_url  
FROM mentions AS m  

LEFT JOIN 
(
  SELECT id,model,foreign_key,dirname,basename 
  FROM attachments Image  
  WHERE model = 'Mention'
  GROUP BY foreign_key
 )Image  ON (Image.foreign_key = m.id)      

LEFT JOIN 
(
   SELECT v.foreign_id, v.created 
   FROM visiteditems AS v  
   WHERE (v.model = "Mention"  AND v.owner_id = 803)  
    GROUP BY v.foreign_id
)v ON (v.foreign_id = m.id)
LEFT JOIN 
(
   SELECT mention_id,score
   FROM mentionfeedscores mfs  
   WHERE mfs.feed_id = '474737584865424564398208323289092'
   GROUP BY mention_id
)mfs ON (mfs.mention_id = m.id )

WHERE m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59"  
   AND EXISTS 
  (
     SELECT NULL FROM mentioncache mc  
      WHERE mc.mention_id = m.id AND mc.profile_id = 803  
   )    
ORDER BY m.indexed DESC  
LIMIT 10
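As a variation on point 2, a half-open range is a common way to write the same date filter without depending on the 23:59:59 upper bound, which would miss rows with fractional seconds in the last second of the day if the column stores them. The sketch below uses only the table and column from the query above and is trimmed to the date filter for clarity.

```sql
-- Sketch: everything from 2012-09-16 00:00:00 up to, but not including,
-- 2012-10-17 00:00:00. The column is left bare, so an index on
-- mentions.indexed can still be used for the range.
SELECT m.id, m.indexed
FROM mentions AS m
WHERE m.indexed >= '2012-09-16'
  AND m.indexed <  '2012-10-17'
ORDER BY m.indexed DESC
LIMIT 10;
```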
