SQL Server – T-SQL help – doing a large update from a CSV file to a table in SSMS

csv, sql-server, sql-server-2016, t-sql, update

I have been given a CSV file containing only IDs, and those IDs exist in a table (which I work with in SSMS) that has columns which need updating. Of the 90,000 IDs in that table, only 83,000 need to be updated based on the CSV file.

the main table is:

ID  ENABLED  AGREED
1   1        no
2   1        no
3   1        no
4   0        yes

I know I can update the table using the IDs I have been given in the CSV, such as:

update table1
set enabled = 0,
    agreed = 'yes'
where id in (1, 2, 3)

However, my problem is that 83,000 rows need updating to those specific values, and I want to find the best way to do that. I have been told to write a script that imports the CSV into a temp table and then joins the temp table onto the main table to perform the update, rather than hard-coding a list of IDs in a script.
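
Something like the outline below is what I think they mean, although I am not sure how to do the import step:

create table #csv_ids (id int primary key);

-- import the CSV into #csv_ids somehow (this is the part I am unsure about)

update t
set enabled = 0,
    agreed = 'yes'
from table1 t
inner join #csv_ids c on c.id = t.id;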

How can I do this? Does anyone know a better way?

Best Answer

There have been some solid answers to this post, but as an added extra: if you know this is going to be a regular task, and the file will be dropped in a specific location each week with the same name and format, you can query flat files such as CSV and TXT directly from SQL Server using the OPENROWSET feature, and then work with the data in inserts and updates. (BULK INSERT is also an option and can be simpler, but in this case I opted for OPENROWSET because it makes the update easy.)
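
For comparison, here is a minimal sketch of the BULK INSERT route, using the same file location and SourceTable name that appear further down; the staging table name and column types are assumptions based on the sample data:

-- Stage the CSV into a temp table, then join it onto the main table
CREATE TABLE #CsvStage (ID INT, Enabled INT, Agreed VARCHAR(10));

BULK INSERT #CsvStage
FROM 'C:\Location\Load.CSV'
WITH (FIRSTROW = 2, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\r\n');

UPDATE st
SET st.Enabled = c.Enabled,
    st.Agreed = c.Agreed
FROM SourceTable st
INNER JOIN #CsvStage c ON c.ID = st.ID;

The rest of this answer covers the OPENROWSET approach.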

This requires a little set-up, but once you have it working you can create a view that reads your CSV with a fixed output, or even use a stored procedure to update your table from then on, so you can automate the process. You can even join the CSV directly to existing data, although it won't be indexed, so you may want to use the feature to stage the data into a table, index that table, and then perform your update (both of these are sketched further down).

That said, if you are building a proper data pipeline you should look into SSIS, as it lets you handle errors properly and redirect erroneous rows to an output location.

I am going to make some assumptions about your data:

  • The fields are separated by tab characters.
  • Rows are terminated with CRLF.
  • The second column is an INT rather than a BIT.

SQL Server has a requirement that more lax systems do not, but that requirement is what enables its strengths: it needs definitions for the columns, and those definitions are not contained explicitly in the CSV file. So you create the definitions in an XML format file and point SQL Server at it, in this case using the BCPFORMAT schema.

The XML format file would be called BCP_FileName.XML:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='\t' />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='\t' />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='\r\n' />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT" />
  <COLUMN SOURCE="2" NAME="Enabled" xsi:type="SQLINT" />
  <COLUMN SOURCE="3" NAME="Agreed" xsi:type="SQLVARYCHAR" />
 </ROW>
</BCPFORMAT>

The CSV would be called Load.CSV

The eventual SELECT statement might be:

SELECT ID, Enabled, Agreed
FROM OPENROWSET(BULK 'C:\Location\Load.CSV',
     FIRSTROW = 2,
     FORMATFILE = 'C:\Location\BCP_FileName.XML'
) AS t1

This lets you put the query into a view with a CREATE VIEW statement:

Create View dbo.CSVQuery
as
SELECT ID, Enabled, Agreed
FROM OPENROWSET(BULK 'C:\Location\Load.CSV',
     FIRSTROW = 2,
     FORMATFILE = 'C:\Location\BCP_FileName.XML'
) AS t1

With that view you can do...

UPDATE st
SET st.Enabled = cq.Enabled,
    st.Agreed = cq.Agreed
FROM SourceTable st
INNER JOIN dbo.CSVQuery cq ON cq.ID = st.ID
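
If you want to automate the update, as mentioned above, you could wrap it in a stored procedure; the procedure name here is just illustrative:

Create Procedure dbo.UpdateFromCSV
as
begin
    set nocount on;

    -- Apply the values from the CSV (read through the view) to the main table
    UPDATE st
    SET st.Enabled = cq.Enabled,
        st.Agreed = cq.Agreed
    FROM SourceTable st
    INNER JOIN dbo.CSVQuery cq ON cq.ID = st.ID;
end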

Another benefit of the view is that you can query it directly, so you can sense-check the data before running the update.
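
And if you want to go the staging route mentioned earlier, a rough sketch (the staging table and index names are assumptions) would be to copy the view into a real table, index it, and then update from that:

-- Stage the CSV into a real table so it can be indexed
drop table if exists dbo.CSVStaging;

SELECT ID, Enabled, Agreed
INTO dbo.CSVStaging
FROM dbo.CSVQuery;

CREATE CLUSTERED INDEX IX_CSVStaging_ID ON dbo.CSVStaging (ID);

UPDATE st
SET st.Enabled = cs.Enabled,
    st.Agreed = cs.Agreed
FROM SourceTable st
INNER JOIN dbo.CSVStaging cs ON cs.ID = st.ID;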

Of course, if you are regularly staging important data you may want to opt for SSIS, because the import method above is essentially all or nothing; but if it serves your purpose, it may do.

One thing you should get is Notepad++, so you can accurately check the hidden line-ending characters for CRLF, because some CSV files use LF on its own, or just CR. If yours does, adjust the TERMINATOR on the last field in the format file to match.

NOTE: You must give the SQL Server instance access to the file location if you intend to use this as a regular loading process. To do this, go onto your server, open Services, and check which account the SQL Server service is running under. Make sure it is a domain-authenticated account if the file location is not on the same server, so that the account can access the pickup location.

NOTE 2: Be aware that this process updates data, so it is entirely possible for a malicious or malformed file to do real damage; make sure you restrict the input process to trusted users.

Here is a resource for the XSI types: https://docs.microsoft.com/en-us/openspecs/sql_data_portability/ms-bcp/51298f0a-c9ac-463a-8e01-76d25ebaca3c

Here is the OPENROWSET overview: https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql?view=sql-server-ver15