Optimize large table design

Tags: database-design, optimization, oracle-11g-r2

First, I am a complete novice when it comes to databases, but I have been given the task of speeding up queries on a large set of data (hundreds of millions of records). The current implementation is a very simple data warehouse that was created long ago in Oracle. The existing table has no primary key, but each record is unique. The table is indexed by the first two columns listed below.

The data itself is fairly simple:

  1. device – there are multiple devices, each identified by a unique number
  2. data generation time – each device generates a set of data multiple times a day at different, random times. Each data set covers multiple days. Individual data points within a set can be sampled as often as every second. For times prior to the data generation time, these are measured results; for times after the data generation time, these are the device’s predictions of what the data will be. Queries often pull a full day of data to compare measured versus predicted data for a given device (e.g. how well did the device predict future needs)
  3. Date/Time of the data points
  4. Data point 1
  5. Data point 2

     ...
     Data point 23
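
In simplified form, the existing table looks something like the sketch below (the table and column names here are placeholders for illustration, not the real ones):

    -- Assumed shape of the existing flat table (names are illustrative only)
    CREATE TABLE device_data (
        device_id        NUMBER,   -- unique device identifier
        generation_time  DATE,     -- when the device generated this data set
        sample_time      DATE,     -- date/time of the individual data point
        data_point_01    NUMBER,
        data_point_02    NUMBER,
        -- ... data_point_03 through data_point_22 ...
        data_point_23    NUMBER
    );

    -- The existing index on the first two columns
    CREATE INDEX device_data_ix ON device_data (device_id, generation_time);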

The major types of queries are:

  1. Give me the latest data generated by a device
  2. Give me all the data for a device for a given day (as described above)
  3. Give me the data generation times for a device on a given day
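
Against that simplified layout, the three query types look roughly like this (column and bind-variable names are again placeholders):

    -- 1. Latest data generated by a device
    SELECT *
      FROM device_data
     WHERE device_id = :dev
       AND generation_time = (SELECT MAX(generation_time)
                                FROM device_data
                               WHERE device_id = :dev);

    -- 2. All data for a device for a given day (measured vs. predicted)
    SELECT *
      FROM device_data
     WHERE device_id = :dev
       AND sample_time >= TO_DATE('2015-01-02', 'YYYY-MM-DD')
       AND sample_time <  TO_DATE('2015-01-03', 'YYYY-MM-DD');

    -- 3. Data generation times for a device on a given day
    SELECT DISTINCT generation_time
      FROM device_data
     WHERE device_id = :dev
       AND generation_time >= TO_DATE('2015-01-02', 'YYYY-MM-DD')
       AND generation_time <  TO_DATE('2015-01-03', 'YYYY-MM-DD');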

My idea to speed up queries would be to split the table up into two tables as follows:

MetaData Table (each of the first three columns will be indexed)

  1. device
  2. data generation time
  3. day – This would be a new, indexed data point
  4. Primary Key – a number built by concatenating the device number, the data generation time (141230073205 for 2014 Dec 30 07:32:05), and the day (150102 for 2015 Jan 02)

Data Points Table (there will be tens of thousands of these rows for each entry in the MetaData table above)

  1. Foreign Key – that points to the Primary Key in the above MetaData table for which this particular point is valid
  2. Date/Time of the data points
  3. Data point 1
  4. Data point 2
     ...
     Data point 23
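
A rough DDL sketch of that split is below (again with placeholder names; the encoded primary key could just as easily be a plain sequence number):

    -- Proposed MetaData table: one row per (device, data generation time, day)
    CREATE TABLE metadata (
        meta_id          NUMBER PRIMARY KEY,  -- e.g. device || generation time || day
        device_id        NUMBER,
        generation_time  DATE,
        data_day         DATE                 -- the new, indexed "day" column
    );

    CREATE INDEX metadata_device_ix  ON metadata (device_id);
    CREATE INDEX metadata_gentime_ix ON metadata (generation_time);
    CREATE INDEX metadata_day_ix     ON metadata (data_day);

    -- Proposed Data Points table: tens of thousands of rows per MetaData row
    CREATE TABLE data_points (
        meta_id        NUMBER REFERENCES metadata (meta_id),
        sample_time    DATE,
        data_point_01  NUMBER,
        data_point_02  NUMBER,
        -- ... data_point_03 through data_point_22 ...
        data_point_23  NUMBER
    );

    CREATE INDEX data_points_meta_ix ON data_points (meta_id);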

So, long story short (too late!):

  1. Is this a valid approach to speeding up queries?
  2. Is there a better way to organize the data?
  3. What other things can I do to cut down on the query times?
  4. Any sqlplus coding tips would also be greatly appreciated.

Best Answer

I was able to split the table up as described above. Since I did not know the data generation time a priori, the approach above gave a 6 to 10 times speed-up on queries.
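
For reference, a typical day query against the split tables looks something like this (still using the placeholder names from the sketches above):

    -- All data for a device on a given day, found via the MetaData table
    SELECT p.*
      FROM metadata m
      JOIN data_points p ON p.meta_id = m.meta_id
     WHERE m.device_id = :dev
       AND m.data_day  = TO_DATE('2015-01-02', 'YYYY-MM-DD');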