SQL Server – Comparing two CSV files with no primary key

csv, migration, sql-server, ssis

I'm wondering whether SSIS offers a way to compare two csv files (with the same structure) when there is no reference key that makes a record unique.

To give you a better picture of what I mean: in the initial load we load data from a csv file into a table. In subsequent loads, we will get another csv file (same format) that can have different records. We should compare what we have in the table with what exists in the new version of the csv file and load only the changed subset. If there is a new record, it should be inserted; if a record was updated, it should be updated; if a record was deleted in the source, it should be marked as inactive in the table.

Now my questions are:

  • Is there any way to compare two csv files in SSIS, while there is no key to make the
    records unique?
  • How can we compare two tables in SQL Server, while there is no key to make
    records unique?

The volume of data in the csv file is quite high: more than 20 million records!

Any idea is appreciated.

Thank you,

Nazila

Best Answer

If you have your first CSV loaded into a table, you can just as easily load the other one into a staging table (presumably with the same structure as the 'real' one). Then you can get the new rows by

SELECT * FROM staging_table
EXCEPT
SELECT * FROM real_table
;
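
To load the new rows rather than just list them, the same EXCEPT can feed an INSERT. This is a minimal sketch, not a tested implementation: the columns col_a, col_b, col_c and the is_active flag on real_table are hypothetical names chosen for illustration. Because the flag exists only on the real table, the columns are listed explicitly instead of using SELECT *. Two things worth keeping in mind: EXCEPT returns distinct rows and treats NULLs as equal, so exact duplicate lines in the CSV collapse into one row; and without a key, an updated record cannot be told apart from one deleted row plus one new row.

```sql
-- Hypothetical columns; replace with the real column list.
-- Inserts every staging row that has no identical active row in real_table.
INSERT INTO real_table (col_a, col_b, col_c, is_active)
SELECT n.col_a, n.col_b, n.col_c, 1
FROM (
    SELECT col_a, col_b, col_c FROM staging_table
    EXCEPT
    SELECT col_a, col_b, col_c FROM real_table WHERE is_active = 1
) AS n;
```

Comparing only against active rows means a record that was deactivated earlier and shows up again is inserted as a fresh active row; dropping the WHERE clause would leave it inactive instead.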

Rows missing from the new CSV can be found by swapping the two sides of the EXCEPT. However, given the lack of a key on the staging table (hopefully the real table does have one; it's not clear from your question), deleting rows based on this comparison can be painful, especially with so many rows.
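
Since the question asks for missing rows to be marked inactive rather than deleted, the reversed comparison might look like the sketch below. It assumes hypothetical columns col_a, col_b, col_c in both tables and a hypothetical is_active flag on real_table; none of these names come from the question. The nested INTERSECT is one way to compare all columns while treating NULLs as equal. On 20 million rows it will probably need supporting indexes, so it is worth testing on a realistic volume first.

```sql
-- Rows that are active in the table but no longer appear in the new CSV.
SELECT col_a, col_b, col_c FROM real_table WHERE is_active = 1
EXCEPT
SELECT col_a, col_b, col_c FROM staging_table;

-- Flag those rows instead of deleting them.
UPDATE r
SET    r.is_active = 0
FROM   real_table AS r
WHERE  r.is_active = 1
  AND  NOT EXISTS (
           SELECT 1
           FROM   staging_table AS s
           -- INTERSECT compares column by column and treats NULLs as equal
           WHERE  EXISTS (SELECT r.col_a, r.col_b, r.col_c
                          INTERSECT
                          SELECT s.col_a, s.col_b, s.col_c)
       );
```

If real_table contains exact duplicates of a missing row, all of them are flagged, because nothing distinguishes one copy from another.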

You can drop the staging table once you have finished.

(As far as I can see, this approach will work in any RDBMS that supports EXCEPT; Oracle has traditionally spelled it MINUS.)