Sql-server – Database structure, relational or data warehouse

data-warehousedatabase-designrelational-theorysql-server-2008

Good morning,

Im having some issues with how my database should be laid out and would appreciate some guidance. I have several tables (rainfall data), each containing the following columns:

Date (DateTime), Value_of_rainfall (Float)

Each of these tables are for a specific location.

It has been suggested i create another table with the following columns:

LocationID (tinyInt), LocationName(char(6))

and insert into the first table a new column called LocationID(tinyInt).

Now the confusion i have is regarding that the data i have stored in the rainfall data datatables. Its been suggested that all the data from each location be contained in one data table. The statistical analysis im looking to carry out on the info is very (as far as i have envisioned thus far) location specific and wont require querying multiple locations at once. A couple months worth of data for one location = nearly 3 million rows and im looking to set up long running calculations on the data. therefore would a data warehouse be more appropriate? If so, could someone give me some pointers on how i should lay it out?

Thanks for your time.

Note: Im using Sql Server 2008

Best Answer

It seems you want to aggregate location based statistics over time for rainfall. A database structure like the one below would let you do that. The 'data source' could be just a filename, or some indication as to where it came from.

create table DimDataSource (
       DataSourceID      int identity (1,1) not null
       DataSourceDesc    nvarchar (100)  -- May need unicode for file names
)
go

alter table DimDataSource
  add constraint PK_DataSource
      primary key clustered (DataSourceID)
go

create table DimLocation (
       LocationID        int identity (1,1) not null
       LocationDesc      varchar (50)
)
go

alter table DimLocation
  add constraint PK_Location
      primary key clusterd (LocationID)
go

create table DimDate (
       DateID           smalldatetime not null  -- 'Date' is a reserved word
      ,MonthID          int not null
      ,MonthDesc        varchar (15)
      ,QuarterID        int not null
      ,QuarterDesc      varchar (15)
      ,YearID
)
go

alter table DimDate
  add constraint PK_Date
      primary key clustered (DateID)
go

create table DimTime (
       TimeID           time not null  -- 'Time' is a reserved word
      ,Hour             int not null
)
go

alter table DimTime
  add constraint PK_Time
      primary key clustered (TimeID)
go


-- If the table is <50GB, don't bother with partitioning, but put a clustered
-- index on DateID or LocationID and DateID, depending on how you normally expect
-- to query the data.

create table FactRainfall (
       RainfallID        int identity (1,1) not null -- May need a wider type if >4B rows.
                                                     -- SSAS likes an identity column for
                                                     -- incremental loads
      ,DataSourceID      int not null
      ,LocationID        int not null
      ,DateID            smalldatetime not null
      ,TimeID            time not null
      ,Rainfall          float
)
go

-- Add foreign keys as necessary

Populate the dimensions with the appropriate list of locations, date ranges, time of day to the right grain and one data source record per file. This table will also allow you to put a cube over the top, or can be flattened with a view, which will help people using tools like Excel or stats packages to get and use the data.