Oracle – Building a Histogram of Road Speed by Grouping Road Segments


The data set looks like this:

The table doesn't have a sum(segLength) column yet, but the road lengths can be computed with select roadID, sum(segLength) from road_table group by roadID.
A roadID (e.g. Lake Street) belongs to a roadType (e.g. St) and consists of segments (e.g. segID 2005 for a given road block).

I want to plot the average speed of roads, grouped by road length and further grouped by road type, as in the example output below.

How can I add a column for each roadType, group the roads within each roadType into bins, and compute the average maxSpeed within each bin?
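As a minimal sketch of the computation described in the question, here is a Python model over hypothetical in-memory rows (seg_id, road_id, road_type, seg_length, max_speed). It assumes equal-width bins of road length, and that every segment's maxSpeed counts toward the bin its whole road falls into; the names speed_histogram and ROWS are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (seg_id, road_id, road_type, seg_length, max_speed).
ROWS = [
    (1, 1, "St", 1.5, 30),
    (2, 1, "St", 2.5, 30),
    (3, 2, "St", 8.0, 50),
    (4, 3, "Ave", 3.0, 40),
    (5, 3, "Ave", 1.0, 20),
]


def speed_histogram(rows, bin_width):
    """Average maxSpeed per (road_type, road-length bin).

    Road length is sum(seg_length) per road_id, as in GROUP BY roadID.
    Bin k covers road lengths in [k * bin_width, (k + 1) * bin_width).
    Assumption: every segment's max_speed counts toward its road's bin.
    """
    road_length = defaultdict(float)
    for _, road_id, _, seg_length, _ in rows:
        road_length[road_id] += seg_length

    speeds = defaultdict(list)  # (road_type, bin index) -> max_speed values
    for _, road_id, road_type, _, max_speed in rows:
        bin_index = int(road_length[road_id] // bin_width)
        speeds[(road_type, bin_index)].append(max_speed)

    return {key: mean(values) for key, values in speeds.items()}
```

With a bin width of 5, roads 1 and 3 (length 4.0) fall into bin 0 and road 2 (length 8.0) into bin 1, so speed_histogram(ROWS, 5) returns {("St", 0): 30, ("St", 1): 50, ("Ave", 0): 30}.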

I'm using PL/SQL. I can translate a query from MySQL or PostgreSQL if necessary.

Best Answer

Maybe the following ideas will give you a starting point for developing your queries. We need some test data first. The last SEGID in your question is 4007, so this example uses a table with 4500 rows: random SEGLENGTHs, plus IDs, maxspeeds, etc. similar to the ones in your sample table.

TEST table and data (Oracle 12c)

create table test ( 
  segid number generated always as identity primary key 
, roadid number
, roadtype varchar2( 64 )
, seglength number ( 10, 1 )
, maxspeed number
, check ( maxspeed between 0 and 100 )
); 

insert into test( roadid, roadtype, seglength, maxspeed )
select 
  roadid
, case
    when roadid < 200 then 'Ave'
    when roadid between 200 and 400 then 'Hwy'
    when roadid between 401 and 700 then 'St'
    when roadid between 701 and 900 then 'Rd'
  end roadtype
, round( dbms_random.value() * 10, 1 )  -- seglength is number( 10, 1 ): one decimal place
, case
    when roadid < 200 then ( mod( roadid, 4 ) + 1 ) * 10 
    when roadid between 200 and 400 then ( mod( roadid, 7 ) + 1 ) * 10 
    when roadid between 401 and 700 then ( mod( roadid, 3 ) + 1 ) * 10  
    when roadid between 701 and 900 then ( mod( roadid, 6 ) + 1 ) * 10  
  end maxspeed
from 
  ( select level roadid from dual connect by level <= 900 )    -- roadid 1 .. 900
, ( select level segcount from dual connect by level <= 5 )    -- cross join: 5 segments per road = 4500 rows
;
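The CASE expressions above can be checked outside the database. This Python sketch mirrors them (Python's % matches Oracle's MOD for the non-negative roadids used here) and rebuilds the 900 x 5 cross join, which is a quick way to confirm that every generated maxspeed passes the check constraint. The function names are hypothetical.

```python
def road_type(road_id: int):
    """Mirror of the first CASE: roadtype by roadid range (None above 900)."""
    if road_id < 200:
        return "Ave"
    if 200 <= road_id <= 400:
        return "Hwy"
    if 401 <= road_id <= 700:
        return "St"
    if 701 <= road_id <= 900:
        return "Rd"
    return None  # a CASE without ELSE yields NULL


# Modulus used by the second CASE for each road type.
MODULUS = {"Ave": 4, "Hwy": 7, "St": 3, "Rd": 6}


def max_speed(road_id: int) -> int:
    """Mirror of the second CASE: ( mod( roadid, m ) + 1 ) * 10."""
    return (road_id % MODULUS[road_type(road_id)] + 1) * 10


# One row per (roadid, segcount) pair, like the cross join of the two subqueries.
rows = [(r, road_type(r), max_speed(r)) for r in range(1, 901) for _ in range(5)]
```

The first and last rows agree with the sample output: roadid 1 is 'Ave' at 20, and roadid 900 is 'Rd' at 10.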

TEST table: first 5 rows, last 5 rows

SQL> select * from test order by segid fetch first 5 rows only;
SEGID  ROADID  ROADTYPE  SEGLENGTH  MAXSPEED  
1      1       Ave       1.6        20        
2      1       Ave       7.5        20        
3      1       Ave       3.8        20        
4      1       Ave       9.2        20        
5      1       Ave       2.8        20        

SQL> select * from test order by segid offset 4495 rows fetch next 5 rows only;
SEGID  ROADID  ROADTYPE  SEGLENGTH  MAXSPEED  
4496   900     Rd        4.2        10        
4497   900     Rd        4.2        10        
4498   900     Rd        0.7        10        
4499   900     Rd        7.1        10        
4500   900     Rd        2          10

To divide the SEGLENGTH values into "buckets", you could use WIDTH_BUCKET() (see the Oracle documentation), and then use GROUP BY to find the average speed for each bucket.

  select 
    roadtype, avg( maxspeed ), max( seglength ), wb
  from 
  (
      select roadtype, maxspeed, seglength
      , width_bucket( 
          seglength
        , ( select min( seglength ) from test )
        , ( select max( seglength ) from test ) + 1  -- upper bound is exclusive: + 1 keeps max( seglength ) in bucket 5
        , 5 
       ) wb    
      from test 
    )
  group by wb, roadtype 

-- result
ROADTYPE  AVG(MAXSPEED)                              MAX(SEGLENGTH)  WB  
Hwy       39.24170616113744075829383886255924170616  8.7             4   
Hwy       40.70866141732283464566929133858267716535  10              5   
St        18.91304347826086956521739130434782608696  8.7             4   
St        19.81132075471698113207547169811320754717  10              5   
Rd        34.70588235294117647058823529411764705882  6.5             3   
Ave       24.49275362318840579710144927536231884058  10              5   
Hwy       41.76744186046511627906976744186046511628  6.5             3   
Ave       25.52995391705069124423963133640552995392  2.1             1   
Ave       24.0796019900497512437810945273631840796   8.7             4   
Ave       25.27272727272727272727272727272727272727  4.3             2   
St        20.50632911392405063291139240506329113924  2.1             1   
Rd        34.51754385964912280701754385964912280702  8.7             4   
Rd        35.11811023622047244094488188976377952756  10              5   
Hwy       38.68312757201646090534979423868312757202  4.3             2   
Rd        35.26315789473684210526315789473684210526  2.1             1   
Rd        35.52995391705069124423963133640552995392  4.3             2   
Ave       25.70776255707762557077625570776255707763  6.5             3   
Hwy       40.28708133971291866028708133971291866029  2.1             1   
St        21.03658536585365853658536585365853658537  4.3             2   
St        19.65838509316770186335403726708074534161  6.5             3
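To see why the query adds 1 to the upper bound, here is a Python model of WIDTH_BUCKET for the ascending case, using Decimal to stand in for Oracle NUMBER arithmetic. It is a sketch of the documented semantics, not Oracle's implementation: values below the lower bound go to bucket 0, values at or above the upper bound go to the overflow bucket num_buckets + 1, and everything else lands in one of num_buckets equal-width buckets. The MAX(SEGLENGTH) values in the result above (2.1, 4.3, 6.5, 8.7, 10) are consistent with a minimum SEGLENGTH of 0, i.e. a range of 0 to 11 split into buckets 2.2 wide.

```python
import math
from decimal import Decimal


def width_bucket(expr: Decimal, low: Decimal, high: Decimal, num_buckets: int) -> int:
    """Model of Oracle WIDTH_BUCKET for the ascending case (low < high).

    Buckets are equal-width, closed at the low end and open at the high end.
    """
    if not low < high:
        raise ValueError("this sketch only models low < high")
    if expr < low:
        return 0  # underflow bucket
    if expr >= high:
        return num_buckets + 1  # overflow bucket
    return math.floor((expr - low) * num_buckets / (high - low)) + 1
```

Because the upper bound is exclusive, width_bucket(10, 0, 10, 5) puts the longest segment into overflow bucket 6; raising the bound to 11 keeps it in bucket 5, at the cost of slightly wider buckets.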

Add PIVOT to this query to turn the ROADTYPE values into columns, and use ROUND() or TRUNC() to obtain the final values.

select msl, ave, hwy, st, rd
from (
  select 
    roadtype, trunc( avg( maxspeed ) ) ams, max( seglength ) msl, wb
  from 
  (
    select roadtype, maxspeed, seglength
    , width_bucket( 
        seglength
      , ( select min( seglength ) from test )
      , ( select max( seglength ) from test ) + 1  -- upper bound is exclusive: + 1 keeps max( seglength ) in bucket 5
      , 5 
     ) wb    
    from test 
  )
  group by wb, roadtype 
) pivot  (
    avg( ams ) for roadtype in (  -- pivot() requires an aggregate function
      'Ave' as ave
    , 'Hwy' as hwy
    , 'St'  as st
    , 'Rd'  as rd
   )
) P
order by wb
;

-- result    
       MSL        AVE        HWY         ST         RD
---------- ---------- ---------- ---------- ----------
       2.1         25         40         20         35
       4.3         25         38         21         35
       6.5         25         41         19         34
       8.7         24         39         18         34
        10         24         40         19         35
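The PIVOT step can be modelled in Python as well. This sketch assumes the inner GROUP BY has already produced one (roadtype, wb, value) row per combination, so the pivot's avg( ams ) simply passes that single value through; combinations with no data come out as None, just as PIVOT returns NULL. Unlike the Oracle query, which also keeps msl as an implicit grouping column, the sketch keys its rows by wb alone and includes wb in each output row.

```python
from collections import defaultdict

# Column order of the PIVOT ... IN ( 'Ave', 'Hwy', 'St', 'Rd' ) list.
ROAD_TYPES = ("Ave", "Hwy", "St", "Rd")


def pivot(grouped):
    """Turn (roadtype, wb, value) rows into (wb, ave, hwy, st, rd) rows ordered by wb."""
    table = defaultdict(dict)  # wb -> {roadtype: value}
    for road_type, wb, value in grouped:
        table[wb][road_type] = value
    # dict.get returns None for a missing road type, like PIVOT's NULL.
    return [(wb, *(table[wb].get(rt) for rt in ROAD_TYPES)) for wb in sorted(table)]
```

For example, pivot([("Ave", 1, 25), ("Hwy", 1, 40), ("St", 2, 21)]) yields one row for bucket 1 and one for bucket 2, with None wherever a road type has no value.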

To get standard deviations instead, use STDDEV( maxspeed ) in place of AVG( maxspeed ).
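One detail worth knowing before swapping in STDDEV(): Oracle's STDDEV returns the sample standard deviation (divisor n - 1) and returns 0 when there is only one input row, whereas STDDEV_SAMP returns NULL in that case and STDDEV_POP uses divisor n. The Python sketch below models STDDEV's behaviour with the statistics module; oracle_stddev is a hypothetical name and the code illustrates the semantics, it is not Oracle code.

```python
from statistics import stdev


def oracle_stddev(values):
    """Model of Oracle STDDEV: sample standard deviation, but 0 for a single row."""
    values = [v for v in values if v is not None]  # aggregates ignore NULLs
    if not values:
        return None  # no non-NULL input -> NULL
    if len(values) == 1:
        return 0.0  # where STDDEV_SAMP would return NULL
    return stdev(values)  # divisor n - 1
```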

Related Solutions

Sql-server – Grouping records based on intervals of time

select dateadd(minute, 1+datediff(minute, 0, CaptureTime), 0),
       sum(SnapShotValue)
from YourTable
group by dateadd(minute, 1+datediff(minute, 0, CaptureTime), 0)


datediff(minute, 0, CaptureTime) gives you the number of minutes since 1900-01-01T00:00:00.

dateadd(minute, 1+datediff(minute, 0, CaptureTime), 0) adds that number of minutes, plus one, back to 1900-01-01T00:00:00, so you end up with a datetime that has whole minutes only (the seconds are dropped).

The 1+ is there because you wanted the next minute.
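The arithmetic can be checked with a small Python model. It assumes SQL Server's conventions that the literal 0 in dateadd/datediff means 1900-01-01T00:00:00 and that DATEDIFF(minute, ...) counts the minute boundaries crossed; datediff_minutes and next_minute are hypothetical names.

```python
from datetime import datetime, timedelta

# SQL Server's "day 0": dateadd/datediff treat the literal 0 as this datetime.
BASE = datetime(1900, 1, 1)


def datediff_minutes(start: datetime, end: datetime) -> int:
    """Model of DATEDIFF(minute, start, end): minute boundaries crossed."""
    # Truncate both operands to the minute, then count whole minutes between them.
    s = start.replace(second=0, microsecond=0)
    e = end.replace(second=0, microsecond=0)
    return (e - s) // timedelta(minutes=1)


def next_minute(capture_time: datetime) -> datetime:
    """Model of dateadd(minute, 1 + datediff(minute, 0, CaptureTime), 0)."""
    return BASE + timedelta(minutes=1 + datediff_minutes(BASE, capture_time))
```

For instance, next_minute(datetime(2014, 3, 20, 10, 3, 20)) gives 10:04:00, so every capture between 10:03:00 and 10:03:59 is grouped under 10:04.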

To do the same with a 5-minute interval you need a little more arithmetic: dividing the minutes by 5 and then multiplying by 5 rounds them down to a 5-minute boundary. This works because dividing an integer by an integer in SQL Server yields an integer (the fractional part is discarded).

dateadd(minute, 5 + (datediff(minute, 0, CaptureTime) / 5) * 5, 0)
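The 5-minute version can be sketched the same way in Python, under the same assumption about the 1900-01-01 base date; next_interval is a hypothetical name. Python's // matches SQL Server's integer division for the non-negative minute counts involved here, and interval=1 reproduces the one-minute query.

```python
from datetime import datetime, timedelta

# SQL Server's "day 0": dateadd/datediff treat the literal 0 as this datetime.
BASE = datetime(1900, 1, 1)


def next_interval(capture_time: datetime, interval: int = 5) -> datetime:
    """Model of dateadd(minute, interval + (datediff(minute, 0, t) / interval) * interval, 0)."""
    minutes = (capture_time - BASE) // timedelta(minutes=1)  # whole minutes since BASE
    rounded_down = (minutes // interval) * interval  # integer division, then multiply back
    return BASE + timedelta(minutes=interval + rounded_down)
```

A capture at 10:03:20 is rounded down to 10:00 and then moved to the next boundary, 10:05.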

MySQL – Update Table Based on Last Data from Another Table with Grouping

The correct query is:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY
                     plate_uid)
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Just a few brief explanations. You need INNER JOIN because a row of lp_plates_backup must be updated only if its uid has a matching plate_uid in lp_pictures_backup. ORDER BY is useless here: you are selecting all rows, so their order is not important.

You need the maximum date_validated among the rows of each plate_uid: the SELECT MAX(date_validated) subquery must pick out a single row per plate_uid through the date_validated field. Its WHERE t2.plate_uid = parent.plate_uid condition already restricts it to one plate, so the GROUP BY plate_uid inside it is redundant but harmless.

The outer SELECT plate_uid, ... query therefore returns one row per plate_uid, so you don't need to aggregate there: the single-row condition is already built into the SELECT MAX subquery.

I hope you did not get confused by me :-)

Updated

The previous query works well only if the pair (plate_uid, date_validated) is unique. If you have this kind of data:

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   <- changed data
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |   
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

The following query returns the first two rows for plate_uid 1.

SELECT plate_uid, brand, model, date_validated
FROM lp_pictures_backup as parent
WHERE
      brand <> '' AND
      date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY plate_uid)
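A Python model of this query makes the problem visible. It assumes rows held as hypothetical Picture tuples and dates as 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly as text. Like the SQL, it takes the maximum over all of a plate's rows and only then drops rows whose brand is empty or NULL (in MySQL, NULL <> '' is not true).

```python
from collections import namedtuple

# Hypothetical row type for lp_pictures_backup.
Picture = namedtuple("Picture", "plate_uid brand model date_validated")


def latest_pictures(pictures):
    """Rows whose date_validated equals the MAX(date_validated) of their plate_uid."""
    latest = {}  # plate_uid -> max date_validated over all of its rows
    for p in pictures:
        if p.plate_uid not in latest or p.date_validated > latest[p.plate_uid]:
            latest[p.plate_uid] = p.date_validated
    return [
        p for p in pictures
        if p.brand is not None and p.brand != ""  # brand <> ''
        and p.date_validated == latest[p.plate_uid]
    ]
```

With the sample data above, plate_uid 1 yields both rows dated 2014-10-11 10:03:20 (Fiat and BMW), while plate_uid a yields only the Tesla row.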

The update will then take its values from either the first or the second row. I can think of the following alternatives:

1. Using some other lp_pictures_backup fields to choose between rows with the same date_validated.
2. Enforcing a unique constraint on the table, e.g. alter table lp_pictures_backup add unique index (plate_uid, date_validated), so that invalid data is rejected.
3. Detecting the valid (plate_uid, date_validated) pairs in lp_pictures_backup, updating lp_plates_backup only with valid pairs, then reviewing and correcting the invalid pairs.

There may be more alternatives. I prefer enforcing constraints on the data, so as to have better data. Here I expand on the third alternative: just create a view that defines what a valid pair (plate_uid, date_validated) is:

CREATE VIEW lp_pictures_backup_valid as
SELECT plate_uid, date_validated
FROM lp_pictures_backup as parent
WHERE date_validated = (
        select max(date_validated) 
        from lp_pictures_backup t2 
        where t2.plate_uid =  parent.plate_uid GROUP BY plate_uid
      )
group by plate_uid, date_validated
having count(*) = 1;  -- use count(*) > 1 instead to list the invalid pairs

A pair (plate_uid, date_validated) is valid when date_validated is the maximum for that plate_uid and exactly one row has that value.
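The view's logic can be sketched in Python over hypothetical (plate_uid, date_validated) tuples: find each plate's latest date, count the rows holding it, and keep only the pairs held by exactly one row, which is what HAVING count(*) = 1 does. valid_pairs is an illustrative name.

```python
from collections import Counter


def valid_pairs(pictures):
    """Model of view lp_pictures_backup_valid over (plate_uid, date_validated) tuples."""
    latest = {}  # plate_uid -> max date_validated
    for plate_uid, date_validated in pictures:
        if plate_uid not in latest or date_validated > latest[plate_uid]:
            latest[plate_uid] = date_validated
    # GROUP BY plate_uid, date_validated, restricted to each plate's max date.
    counts = Counter((uid, d) for uid, d in pictures if d == latest[uid])
    return {pair for pair, n in counts.items() if n == 1}  # HAVING count(*) = 1
```

With the sample data above, plate_uid 1 has two rows at its latest date and is excluded, leaving only ('a', '2014-12-17 11:00:00').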

I rewrote the update statement to consider only valid pairs:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        -- modification start
        SELECT p.plate_uid, p.brand, p.model, p.date_validated
        FROM lp_pictures_backup as p
             INNER JOIN lp_pictures_backup_valid valid 
                        ON p.plate_uid = valid.plate_uid and 
                           p.date_validated = valid.date_validated
        WHERE p.brand <> ''
        -- modification end
    ) 
    AS m ON m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Hope this makes sense.

Update: 2014-03-20

In the first case:

"1. Using some other lp_pictures_backup fields to choose between rows with the same date_validated."

I have assumed your data look like this:

|id| plate_uid | brand    | model    | date_validated      |
|4 |         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|3 |         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   
|2 |         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|1 |         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |    
|2 |         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|1 |         a | BMW      | 3-Series | 2014-11-07 14:34:11 |

You can try this:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT t1.plate_uid, t1.brand, t1.model, t1.date_validated
        FROM lp_pictures_backup as t1,
          (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
           FROM lp_pictures_backup as t2
           GROUP BY t2.plate_uid) as t3                 
        WHERE t1.brand <> '' AND
              t1.plate_uid = t3.plate_uid AND
              t1.date_validated = t3.dv AND
              t1.id = t3.id
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

The fields used to choose the row to be updated are extracted by this part:

 ...
 (SELECT t2.plate_uid, MAX(id) as id,  MAX(date_validated) as dv
  FROM lp_pictures_backup as t2
  GROUP BY t2.plate_uid) as t3 
 ...

So it assumes that id and date_validated are correlated: a later date always corresponds to a higher id.
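The assumption is easy to check with a Python model of t3 joined back to t1, using rows held as hypothetical dicts. MAX(id) and MAX(date_validated) are computed independently per plate_uid, so a plate contributes a row only if a single row holds both maxima; when ids and dates are not correlated, that plate is silently skipped and its lp_plates_backup row is not updated. rows_to_update is an illustrative name.

```python
def rows_to_update(pictures):
    """Model of t1 joined to t3 (per-plate MAX(id) and MAX(date_validated))."""
    max_id, max_dv = {}, {}
    for p in pictures:
        uid = p["plate_uid"]
        max_id[uid] = max(max_id.get(uid, p["id"]), p["id"])
        max_dv[uid] = max(max_dv.get(uid, p["date_validated"]), p["date_validated"])
    return [
        p for p in pictures
        if p["brand"] != ""  # t1.brand <> ''
        and p["id"] == max_id[p["plate_uid"]]  # t1.id = t3.id
        and p["date_validated"] == max_dv[p["plate_uid"]]  # t1.date_validated = t3.dv
    ]
```

With the sample data above, this picks the Fiat row (id 4) for plate_uid 1 and the Tesla row (id 2) for plate_uid a.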

Hope it helps.
