I have various offices and one central head office. Each office has its own SQL Server instance so each office has its own data set with its own set of IDs.
Each office has already imported data into the head office and stored the data on a set of STAGING_Tables
that look like this.
DECLARE @STAGING_COUNTRY TABLE
(
Original_CountryID INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
Data VARCHAR(200) NOT NULL
);
DECLARE @STAGING_CITY TABLE
(
Original_CityID INT NOT NULL,
Original_CountryID_FK INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
STAGING_COUNTRY
has the original ID of each row (which off course will be duplicated since each office will have ID=1
for the 1st row on their Country
table) and also has a unique OfficeID
value that together with the Original_CountryID
..makes a unique value.
STAGING_CITY
has also the original ID
of each row, the unique OfficeID
value that represent each office and in this case a FK to CountryID
, (but of course at this point we have a reference to the Original_CountryID
..that in conjunction with the office ID
could be identified).
Let's add some dummy rows:
/* ADD DUMMY VALUES TO STAGING_COUNTRY */
INSERT INTO @STAGING_COUNTRY
(Original_CountryID, OfficeID, Data) VALUES (1, 'Office1', 'USA')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (2, 'Office1', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (3, 'Office1', 'Japan')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (1, 'Office2', 'USA')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (1, 'Office2', 'Italy')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (3, 'Office2', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (3, 'Office3', 'Canada')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (2, 'Office3', 'France')
INSERT INTO @STAGING_COUNTRY (Original_CountryID, OfficeID, Data)
VALUES (3, 'Office3', 'USA')
/* ADD DUMMY VALUES TO STAGING_CITY */
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK, OfficeID, OtherData) VALUES
(1, 1, 'Office1', 'New York')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (2, 1, 'Office1', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (3, 1, 'Office1', 'Tokia')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (1, 2, 'Office2', 'New York')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (2, 2, 'Office2', 'Rome')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (3, 2, 'Office2', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (1, 3, 'Office3', 'Vancouver')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (2, 3, 'Office3', 'Paris')
INSERT INTO @STAGING_CITY (Original_CityID, Original_CountryID_FK,
OfficeID, OtherData) VALUES (3, 3, 'Office3', 'New York')
The central head office wants to run reports from a central dtabase that pretty much contains copy all the data from all offices but in order to make this reporting DB optimized, we need to reshuffle a bit the STAGING_Tables
…and reorganize the data in FINAL_Tables
that look like this:
DECLARE @FINAL_COUNTRY TABLE
(
CountryID INT IDENTITY PRIMARY KEY,
Original_CountryID INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
Data VARCHAR(200) NOT NULL
);
DECLARE @FINAL_CITY TABLE
(
CityID INT IDENTITY PRIMARY KEY,
Original_CityID INT NOT NULL,
CountryID_FK INT NOT NULL,
OfficeID VARCHAR(10) NOT NULL,
OtherData VARCHAR(100) NOT NULL
);
PROBLEM:
The FINAL_COUNTRY
and FINAL_CITY
tables should be as optimized as possible for reporting purposes. These reports will be written in T-SQL stored procedures.
QUESTION:
What is the best way to reorganize the FINAL_Tables
so that each record has a TRUE PK identifier (like in the original Office_Tables
) and each FK is updated to point to the right newly created PK …at the server level?
NOTE:
Please note that both staging & final tables are inside the same DB, on the server.
Also we still need to keep the OriginalID
s on the FINAL_Tables
for other purposes.
GOALS:
The main goal here is to reorganize into a set of tables that can be easily indexed for performance purposes.
Please ask more info if needed.
Many many thanks in advanced…
Best Answer
The
IDENTITY
key in yourFINAL_COUNTRIES
table is just fine for a PK. You've created a "synthetic key." I would also add a UQ on { OriginalCountryID, OfficeID }, to speed joins from staged data.Another option would be to have a
Countries
table in your warehouse with only unique values (e.g., "USA" just once) and then aCountriesMap
table relating OfficeID and CountryID to the ID in the warehouse table. This would help performance, as when you get to your fact tables you'll want to use a singleINT
to relate to the Countries table, rather than anINT
and aVARCHAR()
- fact tables can get big fast, and it's important to keep them as narrow as you can.