How to find state name which has the most vehicle accidents in the database

maxoracle-11g

I have a table "ACC_DATA" that has the following columns: ACCIDENT_ID, STATE_CODE, ACC/DATE and a second table "US_STATES" which has the following columns: STATE_CODE and STATE_NAME

Each accident that happened is given an ID and the State in which it occurred, in the "ACC_DATA" table. How can I use SQL to find the name of the State which had the most accidents?

This is all that I have so far:

SELECT AD.STATE_CODE, COUNT(*) AS FREQUENCY
FROM ACC_DATA AD, US_STATES US
GROUP BY AD.STATE_CODE
ORDER BY COUNT(*) DESC

But this only gives a table of State codes with number of accidents in each. I am not quite sure how to join the two tables to produce the state name with a single max value.

Best Answer

A join should do, and then select the state name instead.

SELECT US.STATE_NAME, COUNT(*) AS FREQUENCY
FROM ACC_DATA AD, US_STATES US
WHERE AD.STATE_CODE = US.STATE_CODE
GROUP BY US.STATE_NAME
ORDER BY COUNT(*) DESC

Related Solutions

Oracle 11g – How to Include Zeros in a Column Count

There's not much information you provide, so this is only a guess.

Your WHERE clause Tab2.Col1 = Tab1.Col1(+) may be backward, with regard to the (+) ... I would recommend switching it to a LEFT JOIN regardless, just to make it more readable..

select
  count(b.Col1) "Number of Records", --- CHANGED a.Col1 to b.Col1
  a.col2 "Type",
  to_char(round(ratio_to_report(count(b.Col1)) over()*100)) || '%' as "Percantage of Total"
from Tab1 a
left join Tab2 b on tab1.col1 = tab2.col1 -- or (from tab2 left join tab1) if that is what's intended
group by a.Col1;

If you are still not getting the 0 counts, then you should try a coalesce statement.

select
  COALESCE(count(b.Col1), 0) AS "Number of Records", --- COALESCE HERE .. CHANGED a.Col1 to b.Col1
  a.col2 "Type",
  to_char(round(ratio_to_report(count(b.Col1)) over()*100)) || '%' as "Percantage of Total"
from Tab1 a
left join Tab2 b on tab1.col1 = tab2.col1
group by a.Col1;

You are not referencing table b anywhere in your sql, though - please explain why you need to join to it if you're not going to be using its data?

Postgresql – Displaying a value associated with a maximum value

You could use a window function here, but I would think there's actually a better solution with DISTINCT ON.

First I simplified what you had so far:

SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name   -- still missing
FROM   Product p
LEFT   JOIN (
   SELECT product_id, SUM(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(copies) AS purchase_sum
   FROM   PurchaseItem
   GROUP  BY 1
   ) b USING (product_id)
-- c, o still  missing

It should be considerably faster to aggregate counts before joining:

Why does the following join increase the query time significantly?

Also, LEFT JOIN retains products in the result that don't have any purchases, yet, or aren't in stock any more.

Then add the missing parts:

LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          pi.product_id, pu.outlet_id, sum(copies) AS max_sales
   FROM   Purchase      pu
   JOIN   PurchaseItem  pi USING (purchase_id)
   GROUP  BY 1, 2
   ORDER  BY 1, sum(copies) DESC NULLS LAST
   ) c  USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);

About DISTINCT ON:

Select first row in each GROUP BY group?

You can run DISTINCT over the results of the aggregation. Consider the sequence of events in a query:

Best way to get result count before LIMIT was applied in PHP/PostgreSQL

Optimize performance

It's probably cheaper to scan PurchaseItem only once, using a CTE. But this also adds some overhead. You'll have to test which is faster:

WITH ct AS (
   SELECT pi.product_id, pu.outlet_id, sum(pi.copies) AS sales
   FROM   PurchaseItem  pi 
   JOIN   Purchase      pu USING (purchase_id)
   GROUP  BY 1, 2
   )
SELECT p.name AS product_name, p.description AS product_description
     , a.stock_sum, b.purchase_sum
     , c.max_sales, o.outlet_name
FROM   Product p
LEFT   JOIN (
   SELECT product_id, SUM(copies) AS stock_sum
   FROM   Stock
   GROUP  BY 1
   ) a USING (product_id)
LEFT   JOIN (
   SELECT product_id, sum(sales) AS purchase_sum
   FROM   ct
   GROUP  BY 1
   ) b USING (product_id)
LEFT   JOIN (
   SELECT DISTINCT ON (product_id)
          product_id, outlet_id, sales AS max_sales
   FROM   ct
   ORDER  BY product_id, sales DESC
   ) c  USING (product_id)
LEFT   JOIN Outlet o USING (outlet_id);

Test performance with EXPLAIN ANALYZE (a couple of times to exclude caching effects).

Best Answer

Related Solutions

Oracle 11g – How to Include Zeros in a Column Count

Postgresql – Displaying a value associated with a maximum value

Optimize performance

Related Question