We are expanding our data warehouse to include a new source system. In the new system, we want a single column for a primary key, but in the old system it's two columns.
For exmaple:
create table #new_case (case_id varchar(255))
create table #old_case (patient_id int, care_event_counter int)
insert into #old_case (patient_id, Care_event_counter)
values
(55, 1),
(55, 2),
(55, 3),
(56, 1),
(57, 1),
(57, 2)
insert into #new_case (case_id)
select cast(patient_id as varchar(55)) + '_' + cast(care_event_counter as varchar(55))
from #old_case
select * from #new_case
problem: This has got to be super slow on a large scale, and I'd prefer to use int across the board.
Question: How do you merge two integer columns intelligently? is there a math trick you could use to derive the values that would be faster?
Best Answer
It's quite common in Data Warehouse schema designs to assign a new, integer, single-column key to every table, and retain the source-system key (sometimes called the "business key") as well.
Kimball says:
Natural, Durable, and Supernatural Keys
So something like:
You need to retain the other key and have it indexed because with new data comes in with the business key, you need to associate it with the new warehouse ID.