I am importing flat files from different sources into tables in SQL Server. I am creating a composite primary key using a combination of fields from the extracts that will give me a unique key for every row.
The way I do it now is I just start with 1 field and then I keep concatenating fields until I find a key that is unique for all records. This can be a little time consuming or I might end up concatenating more columns than I really needed to in order to obtain the unique key.
Is there some sort of SQL script I can run on a table that will give me the minimum number of fields (names) I would need to concatenate in order to obtain a unique key? So if there is 1 field in the table that is unique for all records then that 1 field name would be returned. If I needed to concatenate [memberid], [claimid], and [date of service] in order to obtain a unique key then those 3 field names would be the result of the script.
Best Answer
Um, this is not exactly what a Primary Key is for. Yes, they do uniquely identify each row, but they also are the basis of supporting relationships to sibling and child tables.
Not outside of what you are already doing, although maybe in slightly different forms, such as possibly loading the data into a table with no keys or unique indexes or unique constraints defined, and then trying to create the PK or Unique (Index | Constraint) on various combinations of fields. In either case you probably shouldn't be doing this in the first place.
There are a few problems with this approach in general:
FieldA
could be unique, andFieldD
+FieldH
could be unique. What then?0x02FB4C97
? Is that aVARBINARY
or a string of hex bytes? What about123456
? Is that anINT
,BIGINT
,VARCHAR
,DATETIME
(in Julian format),VARBINARY
(without the leading0x
but noA
-F
to help decide)?NULL
for the particular export you are looking at? Or what about max lengths for variable length columns? What about a "comments" field that they are currently just using for a 5 digit number, but later start using it for actual comments?So this comes down to: what is the actual goal of defining the PKs to begin with? What are you trying to accomplish by doing this? Is there a reason why you don't just add an
IDENTITY
field and then remove duplicates within all of the imported fields (all minus theIDENTITY
field)?You really need to find out more about the true nature of the data first, and then build a table to hold the data with keys and constraints that match how the data should exist, not necessarily how it does exist.