I would separate the problem into transactions and analytics -- based on the question, it seems you are trying to find a design that would be optimal for both.
From a design point of view -- on the logical level -- I would use something like this, and would not worry about the number of tables. Also, each attribute gets a proper data type, etc.
From this you may periodically (daily) publish to structures which are more analytics-friendly (flat OLAP tables, data marts ...). Depending on the performance -- and user expectations -- exposing 5NF views may be good enough.
On the physical level, I am not so sure :(
Structures like this are usually exposed to users as flat views (5NF) and via point-in-time functions. The main problem here is that the question is tagged MySQL. MySQL has a limit on the number of tables that can appear in a join (61), and the query optimizer does not support table elimination; hence, forget the views. You would have to work around this at the application level and join tables based on the ID and date; the "application" may simply be the ETL code that exports to the analytic tables, as in the sketch below.
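Just to illustrate what I mean (the table and column names below are made up, since the actual model isn't shown here), a point-in-time flat view over a couple of versioned attribute tables would look something like this -- and with dozens of attributes the join list quickly heads toward MySQL's 61-table ceiling, which is why the same joins end up in application or ETL code instead:

-- hypothetical versioned attribute tables: one row per value and validity interval
create view contract_flat_today as
select c.contract_id,
       p.price,
       s.status
from   contract c
left join contract_price p
       on  p.contract_id = c.contract_id
       and current_date >= p.valid_from
       and current_date <  p.valid_to
left join contract_status s
       on  s.contract_id = c.contract_id
       and current_date >= s.valid_from
       and current_date <  s.valid_to;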
So, now it depends on how you expose this to end users -- if they are supposed to write custom queries, this will not work.
It is a common approach to design a DB on the logical level without regard for the target DBMS, but in this case the choice of DBMS limits the design options.
I'll do it in stages to make it easier to understand.
I put your data into a table called EX_DATA and added a primary key, as I assume you'll have one and it makes things a little easier:
create table ex_data
(
  contract_id   number(18) primary key,
  expire_dt     date,
  term_notice   number(18),    -- notice period, in days
  notice_freq   varchar2(20),
  notice_months number
);
insert into ex_data values(1,TO_DATE('01-MAR-2005','DD-MON-YYYY'), 60, 'YEARLY', 12);
insert into ex_data values(2,TO_DATE('01-DEC-2007','DD-MON-YYYY'), 30, 'MONTHLY', 1);
insert into ex_data values(3, TO_DATE('28-FEB-2008','DD-MON-YYYY'), 30, 'YEARLY', 12);
The next expiry date from the first given date will be:
select add_months(expire_dt, notice_months) as next_expiry
from ex_data;
Then it follows that the notice date is a bit of easy date arithmetic too:
select add_months(expire_dt, notice_months) as next_expiry,
add_months(expire_dt, notice_months) - term_notice as notice_date
from ex_data;
Using DUAL as a source, generate expiry dates for lots of years:
select expire_dt as first_expiry_date,
l as expiry_num,
add_months(expire_dt,l*notice_months) as all_expiry_dates
from ( select level l from dual connect by level < 100 ) l, ex_data
order by l;
The next expiration date, and the date they should have been contacted for renegotiation:
select first_expiry_date,
       all_expiry_dates as next_expiry_date,
       all_expiry_dates - term_notice as renegotiation_date
from
(
  select expire_dt as first_expiry_date,
         l as expiry_period_num,
         term_notice,
         add_months(expire_dt, l * notice_months) as all_expiry_dates
  from   ( select level l from dual connect by level < 100 ) l, ex_data
)
where all_expiry_dates > sysdate
and   all_expiry_dates - term_notice < sysdate;
But that's not what I think you're asking. You want the next renewal date where the notification date hasn't yet passed, which can be done as follows:
select *
from
(
  select contract_id,
         first_expiry_date,
         all_expiry_dates as next_expiry_date,
         all_expiry_dates - term_notice as notification_date,
         ROW_NUMBER() OVER (PARTITION BY contract_id ORDER BY all_expiry_dates) r
  from
  (
    select contract_id,
           expire_dt as first_expiry_date,
           l as expiry_period_num,
           term_notice,
           add_months(expire_dt, l * notice_months) as all_expiry_dates,
           notice_months
    from   ( select level l from dual connect by level < 100 ) l, ex_data
  )
  where all_expiry_dates - term_notice > sysdate
)
where r = 1;
Can probably be done with some simple SYSDATE maths along with the source data, but my brain is a little slow this evening ;)
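For what it's worth, one way that SYSDATE maths might look is below -- a rough, untested sketch that ignores boundary cases where the notification date falls exactly on today. The idea is to work out how many whole notice periods fit between the first expiry and SYSDATE + TERM_NOTICE, then jump one period further:

-- whole periods already 'used up' as of sysdate + term_notice days,
-- plus one more to reach the first expiry whose notification date is still in the future
select contract_id,
       add_months(expire_dt,
                  notice_months *
                  (floor(months_between(sysdate + term_notice, expire_dt)
                         / notice_months) + 1))               as next_expiry_date,
       add_months(expire_dt,
                  notice_months *
                  (floor(months_between(sysdate + term_notice, expire_dt)
                         / notice_months) + 1)) - term_notice as notification_date
from   ex_data;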
Best Answer
Part of that depends on whether or not you'll ever need to parse that JSON, say in a WHERE clause, or a GROUP BY, etc.
If it stays a faceless blob, it's probably faster to leave it as option #1; but keep in mind this is rarely the case -- you're likely to be asked, unexpectedly, to report or query on the JSON at some point!
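If that does happen, the queries tend to end up looking something like this (a sketch only; the table, column, and path names are made up, and JSON_VALUE assumes Oracle 12c+ or SQL Server 2016+):

-- hypothetical table 'orders' with a JSON column called 'payload'
select json_value(payload, '$.status') as status,
       count(*)                        as cnt
from   orders
where  json_value(payload, '$.customer.country') = 'US'
group by json_value(payload, '$.status');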
I don't know about Oracle, but in Microsoft SQL Server, UNIONs tend to perform poorly when you also need to join to them; they seem fine as a standalone query, but once the UNION becomes a subquery things slow down, unless you use something like an indexed (materialized) view.
So while I personally prefer option #2, performance may bog down with a large data set of over a million rows. Can you avoid the UNION, and instead place the records in the same table but distinguish them with a type column?
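To illustrate the type-column idea (the table and column names here are hypothetical, since the original schema isn't shown): instead of two tables glued together with a UNION, keep one table and filter or group on the type:

-- one table instead of renewals UNION ALL terminations
create table contract_event
(
  event_id   int primary key,
  event_type varchar(20) not null,   -- e.g. 'RENEWAL' or 'TERMINATION'
  event_date date,
  amount     numeric(18,2)
);

-- replaces: select ... from renewals union all select ... from terminations
select event_type, count(*) as cnt
from   contract_event
group by event_type;

An index on event_type (or partitioning on it, if volumes justify) usually covers the cases where you would otherwise have queried the two tables separately.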