Oracle – Equivalent Idioms in Google BigQuery

google-bigqueryoracle

I was reading this article that is about Google BigQuery:

Exploring a powerful SQL pattern: ARRAY_AGG, STRUCT and UNNEST

They use a couple of functions that I am trying to figure out what the Oracle equivalents are, if they even exist.

I am digging through the Oracle documentation and I found something similar to ARRAY_AGG in Oracle called LISTAGG.

What I have not been able to find are the idiomatic equivalents of STRUCT and UNNEST.

Are there things that do the same thing as STRUCT and UNNEST or will I have to write my own?

Best Answer

Technique Translation

The technique for solving a problem in one RDBMS doesn't always translate well into another.

I've had performance problems with similar code to that which was in the article. Thus, my comment "This technique will kill performance in Oracle".

I don't see CAST(MULTISET()) (Oracle's version of ARRAY_AGG) being used too much. As such, the code might be unmaintainable by your replacement after you get a promotion.

Translating Terms

Translation goes like this

  • STRUCT in BigData is TYPE in Oracle
  • ARRAY_AGG in BigData is CAST(MULTISET()) in Oracle
  • UNNEST in BigData is TABLE() in Oracle
  • UNNEST of XML Data is XMLTABLE() in Oracle
  • UNNEST of JSON Data is JSON_TABLE() in Oracle (12c+)

You can't create the structure on the fly. You have to define it ahead of time.

EXAMPLES

Creating a structure of a single row:

create type emp_t as object (
  EMPNO             NUMBER(4),    
  ENAME             VARCHAR2(10) ,
  JOB               VARCHAR2(9)  ,
  MGR               NUMBER(4)    ,
  HIREDATE          DATE         ,
  SAL               NUMBER(7,2)  ,
  COMM              NUMBER(7,2)  ,
  DEPTNO            NUMBER(2)  
);
/

Creating a structure of a set of rows based on above: ( nested table )

create type emp_tt as table of emp_t;
/

Using CAST(MULTISET())

select d.deptno
  ,cast(multiset(
           select * from scott.emp e where e.deptno=d.deptno
        ) as emp_tt) as emp_table
from dept d
;

Using TABLE()

with data as (
  select d.deptno
    ,cast(multiset(
             select * from scott.emp e where e.deptno=d.deptno
           ) as emp_tt) AS emp_table
  from dept d
)
select b.*
from data a, table( a.emp_table ) b
order by empno;

Oracle's Solution

Business Requirement: I want the first time a hurricane reached its maximum category along with its position.

Other, inferred, identifiers to "GROUP BY" would be season, basin, subbasin in addition to hurricane name.

One method to solve this in Oracle is to use Analytics.

Since I don't have access to the huricane data, I'll have to improvise.

  select deptno as hurricane_name
        ,2017   as season
        ,'NA'   as basin
        ,'WP'   as subbasin
        ,sal    as category
        ,ename  as position
        ,rownum as time
  from scott.emp

First thing we do is use the analytic function RANK() to rank all of the rows by category (descending) and time (ascending) but partition the rankings by season, basin, subbasin, and hurricane_name

  select h.*
    ,RANK() over (partition by season, basin, subbasin, huricane_name
                  order by category desc, time)
      as rank_score
  from huricane_data h

Finally, we'll pick only the best (rank_score=1)

select *
from analyized_data
where rank_score=1
order by season, basin, subbasin, hurricane_name, time

Putting it all together

with hurricane_data as (
  select deptno as hurricane_name
        ,2017   as season
        ,'NA'   as basin
        ,'WP'   as subbasin
        ,sal    as category
        ,ename  as position
        ,rownum as time
  from scott.emp
), analyized_data as (
  select h.*
    ,RANK() over (partition by season, basin, subbasin, hurricane_name
                  order by category desc, time) rank_score
  from hurricane_data h
)
select *
from analyized_data
where rank_score=1
-- place season/basin/subbasin filters here
order by season, basin, subbasin, hurricane_name, time;