Partition pruning with multiple date columns

Tags: data-warehouse, database-design, oracle, oracle-11g-r2, schema

I have a large table in an Oracle 11g database that holds several years of historical data, so I would like to partition it by year. The problem is that the table has multiple date columns and they are all used in queries, so I can't just pick one date column and use it as the partition key.

Most of the time dates are close to each other, so I have created partitions for each year, plus one "overflow" partition that holds the rows that cross the year boundary. Here is a simplified example:

create table t (
  start_year int,
  end_year int,
  partition_year int as (case when start_year=end_year then start_year else 0 end),
  data blob 
)
partition by range(partition_year) (
  partition poverflow values less than (1000),
  partition p2000 values less than (2001),
  partition p2001 values less than (2002),
  partition p2002 values less than (2003),
  partition p2003 values less than (2004),
  partition p2004 values less than (2005)
);

The problem with this approach is that partition_year must be explicitly referenced in queries or partition pruning (highly desirable because the table is large) doesn't take effect. This table is used for ad-hoc aggregate queries by multiple users; I can't expect that they all remember this logic.

This can be solved with a view:

create or replace view v as
select *
from t
where partition_year=start_year 
  and partition_year=end_year 
  and partition_year>1000
union all
select *
from t partition (poverflow);

Now queries like this one:

select * from v where start_year >= 2003 and end_year <= 2004;

use the correct partitions (5-6 plus 1 in the plan below):

---------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |  4030 |     2   (0)| 00:00:01 |       |       |
|   1 |  VIEW                      | V    |     1 |  4030 |     2   (0)| 00:00:01 |       |       |
|   2 |   UNION-ALL                |      |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR|      |     1 |  2041 |     2   (0)| 00:00:01 |     5 |     6 |
|*  4 |     TABLE ACCESS FULL      | T    |     1 |  2041 |     2   (0)| 00:00:01 |     5 |     6 |
|   5 |    PARTITION RANGE SINGLE  |      |     1 |  2041 |     2   (0)| 00:00:01 |     1 |     1 |
|*  6 |     TABLE ACCESS FULL      | T    |     1 |  2041 |     2   (0)| 00:00:01 |     1 |     1 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("START_YEAR">=2003 AND "END_YEAR"<=2004 AND "END_YEAR">=2003 AND 
              "START_YEAR"<=2004 AND "PARTITION_YEAR"<=2004 AND "PARTITION_YEAR"="START_YEAR" AND 
              "PARTITION_YEAR"="END_YEAR")
   6 - filter("START_YEAR">=2003 AND "END_YEAR"<=2004)

The problem is that if I replace int types with dates, this doesn't work any more. I have tried to extract the year component from dates and add corresponding constraints to the view, but partitions are not pruned. Changing the type of partition_year to date also didn't help.

Is there any way I could have multiple date columns in a table and still be able to use partition pruning?

Best Answer

Oracle is unable to do partition pruning when a function is applied to the partitioned column. From the docs:

There are several cases when the optimizer cannot perform pruning. One common reason is when an operator is used on top of a partitioning column. This could be an explicit operator (for example, a function) or even an implicit operator introduced by Oracle as part of the necessary data type conversion for executing the statement.

Your view has to apply some form of function to start and end dates to figure out if they're the same year or not, so I believe you're out of luck with this approach.
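To illustrate the restriction quoted above, here is a minimal sketch. It assumes a hypothetical table t_dates that is range-partitioned directly on a DATE column; the table, its columns and its partition names are made up for illustration and do not appear in the question, which only shows the integer version.

```sql
-- Hypothetical table, range-partitioned by year on a date column.
create table t_dates (
  id         number,
  start_date date,
  end_date   date
)
partition by range (start_date) (
  partition p2003 values less than (date '2004-01-01'),
  partition p2004 values less than (date '2005-01-01')
);

-- A function wrapped around the partition key: the predicate cannot be
-- mapped to partition bounds, so expect every partition to be scanned.
select count(*)
from t_dates
where extract(year from start_date) = 2003;

-- The same condition written as a range on the bare column:
-- this form is eligible for partition pruning.
select count(*)
from t_dates
where start_date >= date '2003-01-01'
  and start_date <  date '2004-01-01';
```

Comparing the Pstart and Pstop columns in the execution plans of the two queries should show the difference.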

Our solution to a similar problem was to create materialized views over the base table, specifying different partition keys on the materialized views.

We've tailored ours to match common base queries so that we get query rewrite benefits as well. You may need to get users to use the MVs directly to ensure you get the partition pruning working as you need, rather than relying on query rewrite.
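A rough sketch of what such a materialized view could look like, not our actual definition. It assumes a hypothetical base table t_dates with columns id, start_date and end_date that is already partitioned on start_date; the MV name, the partition names and the refresh options are illustrative choices only.

```sql
-- Hypothetical MV holding the same rows as t_dates, partitioned on
-- end_date instead of start_date.
create materialized view t_by_end_date
partition by range (end_date) (
  partition p2003 values less than (date '2004-01-01'),
  partition p2004 values less than (date '2005-01-01'),
  partition pmax  values less than (maxvalue)
)
build immediate
refresh complete on demand
enable query rewrite
as
select id, start_date, end_date
from t_dates;

-- Querying the MV directly with a range on its own partition key,
-- so pruning does not depend on query rewrite.
select count(*)
from t_by_end_date
where end_date >= date '2004-01-01'
  and end_date <  date '2005-01-01';
```

The trade-off is extra storage and refresh time for each additional partition key you want to support.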

(Updated to remove incorrect example and add info regarding applying functions to partition columns)

Related Solutions

Simple query with date type requires explicit conversion to_date of date

Your belief that this should not be necessary is correct based on your assumptions. Normally this means the assumptions should be rechecked (unless there is data corruption or a bug). If you run the following you should get the same results from the first query as the second:

DROP TABLE t1;
CREATE TABLE t1 AS (
   SELECT to_date('01-FEB-2011','DD-MON-YYYY')+level myDateColumn 
   FROM dual CONNECT BY level <=120);

select * from t1 
where myDateColumn 
between to_date('13-FEB-11', 'DD-MON-YY') AND TO_DATE('15-FEB-11', 'DD-MON-YY');

select myDateColumn from t1 
where to_date(myDateColumn) 
between to_date('13-FEB-11', 'DD-MON-YY') AND TO_DATE('15-FEB-11', 'DD-MON-YY');

To check some of the assumptions, can you show the results of the following query?

SELECT myDateColumn, to_date(myDateColumn)
   , to_char(myDateColumn,'DD-MON-YY HH.MI.SS PM'), to_char(myDateColumn,'YYYY') 
FROM myTable
WHERE to_date(myDateColumn)
BETWEEN to_date('13-FEB-11', 'DD-MON-YY') AND TO_DATE('15-FEB-11', 'DD-MON-YY');

The first three columns should all show identical information and the last should verify that the year is correct.

MySQL – effective MySQL table/index design for 35 million rows+ table, with 200+ corresponding columns (double), any combination of which may be queried

Coincidentally, I am also looking into a client support case where we designed a key-value pair structure for flexibility. The table is currently over 1.5B rows and the ETL is far too slow. There are a lot of other factors in my case, but have you thought about that design? A row that has values in all 200 columns becomes 200 rows in the key-value design.

You gain a space advantage with this design, depending on how many rows, for a given AssetID and Date, actually have all of the f1 to f200 values present. If even 30% of the column values are NULL, that is your space saving, because in the key-value design a NULL value doesn't need a row in the table at all, whereas in the existing column-per-field design even a NULL takes space. (I am not 100% sure, but if you have more than 30 NULL columns in the table then NULL takes 4 bytes.)

If you assume that all 35M rows have values in all 200 columns, your current data becomes 200 * 35M = 7B rows right away. The table space will not be much higher than what you had with all columns in a single table, though, since we are just transposing columns into rows, and the transpose skips rows where the value is NULL. So you can run a query against the current table to count the NULLs and estimate the target table size before you implement anything.

The second advantage is read performance. As you mentioned, the new way of querying the data is by any combination of the f1 to f200 columns in the WHERE clause. With the key-value design, the names f1 to f200 are stored in one column, say "FieldName", and their values in a second column, say "FieldValue". You can have a CLUSTERED index on both columns. Your query will be a UNION of selects like these:

... WHERE (FieldName = 'f1' AND FieldValue BETWEEN 5 AND 6)

UNION

... WHERE (FieldName = 'f2' AND FieldValue BETWEEN 8 AND 10)

etc.
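As a minimal sketch of the two-column name/value layout described above: the table name asset_field_values, the ValueDate column and the column types are assumptions for illustration, not the actual schema. In MySQL's InnoDB engine the primary key is the clustered index, so it stands in for the CLUSTERED index here.

```sql
-- Hypothetical key-value table: one row per non-NULL field value.
create table asset_field_values (
  AssetID    int              not null,
  ValueDate  date             not null,
  FieldName  varchar(10)      not null,  -- 'f1' .. 'f200'
  FieldValue double precision not null,
  -- Field name and value lead the key so range filters on them use it.
  primary key (FieldName, FieldValue, AssetID, ValueDate)
);

-- One select per filtered field, combined with UNION.
select AssetID, ValueDate
from asset_field_values
where FieldName = 'f1' and FieldValue between 5 and 6
union
select AssetID, ValueDate
from asset_field_values
where FieldName = 'f2' and FieldValue between 8 and 10;
```

Note that a UNION returns the (AssetID, ValueDate) pairs that satisfy either condition. If the original query combined the column filters with AND, the results would need to be intersected instead, for example by grouping on AssetID and ValueDate and keeping only the groups that matched every condition.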

I will give you some performance numbers from an actual prod server. We have 75 price columns for each security TICKER.
