I'm not too experienced with database problems, so please excuse the wall of text, I believe the context is important to understand the problem and goal.
I receive a CSV file from another system every day that I want to retain in a database running on Microsoft SQL Server. To accomplish this, I have a PowerShell-script scheduled daily that uses the bcp utility to "load" the CSV (accounts.csv) into a table (accounts_stage
) that has the "exact" layout of the CSV file. In addition, I "write" the timestamp into another table (import_stage
). The accounts_stage
table has a trigger set, and bcp is configured to fire that trigger.
After the import is finished, I want to "normalize" the data. For a basic example, assume this concerns only the contract number (in the CSV file a field called "Vnr"), which is a string. I have another table (contracts
), which gets filled by the trigger:
INSERT INTO dzp.contractid(vnr)
SELECT DISTINCT
a.Vnr
FROM dzp.accounts_stage a
WHERE NOT EXISTS(
SELECT id.vnr
FROM dzp.contractid id
)
This should, and seems to accomplish, add all contract numbers that are ever imported to the table contractids
. contractids.contractId
is a BIGINT PRIMARY KEY, IDENTITY, and auto increment.
I add the import-metadata to my table like this:
CREATE TABLE #IMPORTMETADATA
([importid] bigint
,[generated_timestamp] datetime
,[imported_timestamp] datetime
)
INSERT INTO #IMPORTMETADATA
SELECT TOP 1 [importid], [timestamp_import], [generated_timestamp]
FROM
[QGTAA].[dzp].[import]
ORDER BY
[timestamp_import] DESC
Next, the trigger should INSERT the data FROM accounts_stage
INTO accounts
, such that the contractId
is replaced (from the original string representation to the PRIMARY KEY of the contracts
table).
INSERT INTO [dzp].[accounts]
([Udnr]
,[ContractId]
-- snip
,[generated_timestamp]
,[importid]
)
SELECT
a.[Udnr],
cid.contractId,
-- snip
meta.[generated_timestamp],
meta.[importid]
FROM [QGTAA].[dzp].[accounts_stage] a
LEFT JOIN dzp.contractid cid ON cid.vnr = a.Vnr
CROSS JOIN #IMPORTMETADATA meta
However, this last step fails, because the SELECT operation in the last block yields rows where contractId
IS NULL, which I did not expect. Now I believe the issue is somewhere with the LEFT JOIN, but I'm not sure why.
Questions
-
Can someone please point me in the right direction on this specific problem?
-
I am not too experienced with databases, is this even a sane way of accomplishing what I intend, or is this a completely insane approach? Is there a better way?
Thank you!
Best Answer
There's a problem with your first
NOT EXISTS
. Let's take a look at your current query:So, what's happening here is that the
NOT EXISTS
that you are using will return true only when thedzp.contractid
table is empty. Once it has one row (or more), theNOT EXISTS
will returnfalse
for every row of your main table, sinceSELECT id.vnr FROM dzp.contractid
has a result set, and you end up inserting duplicated values in your table.What you need to do is correlate this subquery with the main table:
Also, the
CROSS JOIN
with the data from the latest import doesn't make much sense to me, and neither the fact that you aren't checking if the data exists already in theaccounts
table.