Oracle aggregate function performance

oracle performance

Can Oracle be smart about aggregate functions such as MIN(), MAX(), and AVG()? My testing suggests that it is surprisingly stupid.

I have the following query:

SELECT COUNT(userId), AVG(age), 
 STDDEV(age), MIN(age), MAX(age), 
 date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010', 'DD-Mon-YYYY')) AND (date_range_end <= TO_DATE('30-Nov-2011', 'DD-Mon-YYYY')) 
GROUP BY date_range_start, date_range_end;

It takes 27 seconds.
When I remove the STDDEV, MIN and MAX aggregations, the same query takes only 12 seconds.
OK, I can see STDDEV slowing things down, as I assume it requires 2 passes.
So I try AVG + MIN and MAX, and I get 21 seconds.

How is this even possible? How can adding MIN and MAX to the AVG calculation slow things down by almost a factor of 2, considering that of the 12 seconds the AVG-only query takes, 10 are spent on the full table scan? Does adding the MIN/MAX calculation really change the GROUP BY step from 2 seconds to 10?

The explain plan:

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     12 |00:00:24.61 |     369K|    369K|       |       |          |
|   1 |  HASH GROUP BY     |      |      1 |     12 |     12 |00:00:24.61 |     369K|    369K|   762K|   762K|   11M (0)|
|*  2 |   TABLE ACCESS FULL| USER |      1 |     29M|     29M|00:00:09.34 |     369K|    369K|       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Best Answer

Use Oracle analytic functions:

SELECT distinct
  COUNT(userId) over (partition by date_range_start, date_range_end)
, AVG(age)      over (partition by date_range_start, date_range_end)
, STDDEV(age)   over (partition by date_range_start, date_range_end)
, MIN(age)      over (partition by date_range_start, date_range_end)
, MAX(age)      over (partition by date_range_start, date_range_end)
, date_range_start, date_range_end
FROM users
WHERE
(date_range_start >= TO_DATE('01-Dec-2010', 'DD-Mon-YYYY')) AND (date_range_end <= TO_DATE('30-Nov-2011', 'DD-Mon-YYYY')) 
/

It returns the same result, but most of the time it is surprisingly faster.
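
On the two-pass assumption in the question: a standard deviation does not inherently need a second pass over the data. The following is a minimal Python sketch (Welford's running-variance method, on made-up ages) that maintains COUNT, AVG, STDDEV, MIN and MAX in a single loop. It only illustrates the arithmetic and says nothing about how Oracle implements these aggregates internally; the function name `one_pass_stats` and the sample data are hypothetical.

```python
import math

def one_pass_stats(values):
    """Return (count, mean, sample stddev, min, max) using one pass over values."""
    count = 0
    mean = 0.0
    m2 = 0.0          # running sum of squared deviations from the current mean
    lo = hi = None
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)   # Welford update: uses the old and new mean
        lo = x if lo is None or x < lo else lo
        hi = x if hi is None or x > hi else hi
    if count == 0:
        return 0, None, None, None, None
    # Sample standard deviation (divide by n - 1); 0.0 for a single value.
    stddev = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, stddev, lo, hi

ages = [23, 35, 41, 29, 52]  # made-up sample data, not from the question
print(one_pass_stats(ages))
```

Each row costs a handful of arithmetic operations and two comparisons, so the aggregates by themselves would not be expected to need a second scan of the table.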

Related Solutions

Mysql – Slow SSD on Dell 710

Before tweaking the disk, you should tune the database's memory usage, especially with MySQL.

From what I have read, I suspect that your DB is doing heavy write I/O, which is faster on ext2 and on a "real" disk.

Update 2011-11-23 (after migration to dba):

Perhaps you should analyze your DB with the free TOAD version.

Mysql – Slow complex query with group/order

I can see a couple of things that should improve your query performance.

1. As you already found out, there is absolutely no need to join mentioncache. Using EXISTS seems more natural (or IN, as you did, but EXISTS may perform better).

2. DATE(m.indexed) BETWEEN "2012-09-16" AND "2012-10-16" can be rewritten as m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59", so that MySQL can use an index on m.indexed; wrapping the column in DATE() prevents that.

3. urlinfluranks doesn't seem to be used anywhere except in the LEFT JOIN. Why do you need it?

4. f.foreign_id can be either NULL or m.id, and this is the only reference to the favoureditems table, so I'd rather use a subquery in this case.
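
To illustrate point 2, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so that the example is self-contained; the optimizers differ in detail, but the general principle of keeping the indexed column bare is the same). The table is reduced to two columns, and the index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (id INTEGER PRIMARY KEY, indexed TEXT)")
conn.execute("CREATE INDEX idx_mentions_indexed ON mentions (indexed)")
conn.executemany(
    "INSERT INTO mentions (indexed) VALUES (?)",
    [("2012-09-15 23:59:59",), ("2012-09-16 00:00:00",),
     ("2012-10-16 18:30:00",), ("2012-10-17 00:00:00",)],
)

# Column wrapped in a function: the index on "indexed" cannot be used for a range lookup.
wrapped = ("SELECT id FROM mentions "
           "WHERE date(indexed) BETWEEN '2012-09-16' AND '2012-10-16'")
# Bare column compared against constants: a range lookup on the index is possible.
bare = ("SELECT id FROM mentions "
        "WHERE indexed BETWEEN '2012-09-16' AND '2012-10-16 23:59:59'")

def plan(sql):
    """Return the textual query plan (the last column of each EXPLAIN QUERY PLAN row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

def ids(sql):
    """Return the matching ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

print(plan(wrapped))
print(plan(bare))
print(ids(wrapped), ids(bare))
```

Both predicates select the same rows here, but only the second lets the engine seek into the index. If the column can hold fractional seconds, a half-open range (>= "2012-09-16" AND < "2012-10-17") avoids missing values after 23:59:59.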

Finally, I think you can get the same results without GROUP BY m.id (as far as I understand, mentions.id is a primary key).

SELECT   
m.id, m.title, m.title_text, m.content_text, m.url,m.root_url,m.sub_type,m.indexed,  
CASE 
 WHEN EXISTS 
    (SELECT NULL FROM favoureditems f WHERE f.model = "Mention" 
    AND f.foreign_id = m.id AND f.owner_id = 803) THEN m.id 
END AS foreign_id
, v.foreign_id, v.created, mfs.score,  
Image.id,Image.model,Image.foreign_key, Image.dirname,Image.basename,  
(REPLACE(REPLACE(m.host_url, 'http://www.', ''), 'http://', '')) AS Mention__plain_url  
FROM mentions AS m  

LEFT JOIN 
(
  SELECT id,model,foreign_key,dirname,basename 
  FROM attachments Image  
  WHERE model = 'Mention'
  GROUP BY foreign_key
 )Image  ON (Image.foreign_key = m.id)      

LEFT JOIN 
(
   SELECT v.foreign_id, v.created 
   FROM visiteditems AS v  
   WHERE (v.model = "Mention"  AND v.owner_id = 803)  
    GROUP BY v.foreign_id
)v ON (v.foreign_id = m.id)
LEFT JOIN 
(
   SELECT mention_id,score
   FROM mentionfeedscores mfs  
   WHERE mfs.feed_id = '474737584865424564398208323289092'
   GROUP BY mention_id
)mfs ON (mfs.mention_id = m.id )

WHERE m.indexed BETWEEN "2012-09-16" AND "2012-10-16 23:59:59"  
   AND EXISTS 
  (
     SELECT NULL FROM mentioncache mc  
      WHERE mc.mention_id = m.id AND mc.profile_id = 803  
   )    
ORDER BY m.indexed DESC  
LIMIT 10
