Sql-server – What’s the best way to concatenate a multi-column foreign key

data-warehousesql serversql-server-2016

We are expanding our data warehouse to include a new source system. In the new system, we want a single column for a primary key, but in the old system it's two columns.

For exmaple:

create table #new_case (case_id varchar(255))

create table #old_case (patient_id int, care_event_counter int)

insert into #old_case (patient_id, Care_event_counter)
values 
(55, 1),
(55, 2),
(55, 3),
(56, 1),
(57, 1),
(57, 2)

insert into #new_case (case_id)
select cast(patient_id as varchar(55)) + '_' + cast(care_event_counter as varchar(55))
from #old_case

select * from #new_case

problem: This has got to be super slow on a large scale, and I'd prefer to use int across the board.

Question: How do you merge two integer columns intelligently? is there a math trick you could use to derive the values that would be faster?

Best Answer

It's quite common in Data Warehouse schema designs to assign a new, integer, single-column key to every table, and retain the source-system key (sometimes called the "business key") as well.

Kimball says:

Natural keys created by operational source systems are subject to business rules outside the control of the DW/BI system. For instance, an employee number (natural key) may be changed if the employee resigns and then is rehired. When the data warehouse wants to have a single key for that employee, a new durable key must be created that is persistent and does not change in this situation. This key is sometimes referred to as a durable supernatural key. The best durable keys have a format that is independent of the original business process and thus should be simple integers assigned in sequence beginning with 1. While multiple surrogate keys may be associated with an employee over time as their proﬁle changes, the durable key never changes.

Natural, Durable, and Supernatural Keys

So something like:

create table new_case 
(
  case_id int identity primary key,
  patient_id int not null,
  care_event_counter int not null,
  constraint uk_case unique(patient_id,care_event_counter),
  ... 
)

You need to retain the other key and have it indexed because with new data comes in with the business key, you need to associate it with the new warehouse ID.

Best Answer

Related Solutions

Sql-server – How to export a single huge table from SQL Server

Sql-server – What Happens When Identity Range IsExceeded

Related Question