SQL Server – T-SQL help – doing a large update from a CSV file to a table in SSMS

csv, sql-server, sql-server-2016, t-sql, update

I have been given a CSV file containing only IDs, and those IDs exist in a table (which I work with in SSMS) that has columns which need updating. Of the 90,000 IDs in that table, only 83,000 need to be updated based on the CSV file.

the main table is:

ID  ENABLED  AGREED
1   1        no
2   1        no
3   1        no
4   0        yes

I know I can update the table using the IDs I have been given in the CSV, such as:

update table1
set enabled = 0,
    agreed = 'yes'
where id in (1, 2, 3)

However, my problem is that 83,000 rows need updating to those specific values, and I want to find the best way to do that. I have been told to write a script that imports the CSV into a temp table and then joins the temp table onto the main table to perform the update, rather than hard-coding a list of IDs in a script.
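
Something like the outline below is what I think they mean, although I am not sure how to do the import step:

create table #csv_ids (id int primary key);

-- import the CSV into #csv_ids somehow (this is the part I am unsure about)

update t
set enabled = 0,
    agreed = 'yes'
from table1 t
inner join #csv_ids c on c.id = t.id;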

How can I do this? Does anyone know a better way?

Best Answer

There have been some solid answers to this post, but as an added extra: if you know this is going to be a regular task, and the file will be dropped in a specific location each week with the same name and format, you can query flat files such as CSV and TXT directly from SQL Server using the OPENROWSET feature, and then work with the data in inserts and updates. (BULK INSERT is also an option and can be simpler, but in this case I opted for OPENROWSET because it makes the update easy.)
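
For comparison, here is a minimal sketch of the BULK INSERT route, using the same file location and SourceTable name that appear further down; the staging table name and column types are assumptions based on the sample data:

-- Stage the CSV into a temp table, then join it onto the main table
CREATE TABLE #CsvStage (ID INT, Enabled INT, Agreed VARCHAR(10));

BULK INSERT #CsvStage
FROM 'C:\Location\Load.CSV'
WITH (FIRSTROW = 2, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\r\n');

UPDATE st
SET st.Enabled = c.Enabled,
    st.Agreed = c.Agreed
FROM SourceTable st
INNER JOIN #CsvStage c ON c.ID = st.ID;

The rest of this answer covers the OPENROWSET approach.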

This requires a little set-up, but once you have it working you can create a view that reads your CSV with a fixed output, or even use a stored procedure to update your table from then on, so you can automate the process. You can even join the CSV directly to existing data, although it won't be indexed, so you may want to use the feature to stage the data into a table, index that table, and then perform your update (both of these are sketched further down).

That said, if you are building a proper data pipeline you should look into SSIS, as it lets you handle errors properly and redirect erroneous rows to an output location.

I am going to make some assumptions about your data:

  • The fields are separated by tab characters.
  • Rows are terminated with CRLF.
  • The second column is an INT rather than a BIT.

SQL Server has a requirement that more lax systems do not, but that requirement is what enables its strengths: it needs definitions for the columns, and those definitions are not contained explicitly in the CSV file. So you create the definitions in an XML format file and point SQL Server at it, in this case using the BCPFORMAT schema.

The XML format file would be called BCP_FileName.XML:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='\t' />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='\t' />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='\r\n' />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="ID" xsi:type="SQLINT" />
  <COLUMN SOURCE="2" NAME="Enabled" xsi:type="SQLINT" />
  <COLUMN SOURCE="3" NAME="Agreed" xsi:type="SQLVARYCHAR" />
 </ROW>
</BCPFORMAT>

The CSV would be called Load.CSV

The eventual SELECT statement might be:

SELECT ID, Enabled, Agreed
FROM OPENROWSET(BULK 'C:\Location\Load.CSV',
     FIRSTROW = 2,
     FORMATFILE = 'C:\Location\BCP_FileName.XML'
) AS t1

This lets you put the query into a view with a CREATE VIEW statement:

Create View dbo.CSVQuery
as
SELECT ID, Enabled, Agreed
FROM OPENROWSET(BULK 'C:\Location\Load.CSV',
     FIRSTROW = 2,
     FORMATFILE = 'C:\Location\BCP_FileName.XML'
) AS t1

With that view you can do...

UPDATE st
SET st.Enabled = cq.Enabled,
    st.Agreed = cq.Agreed
FROM SourceTable st
INNER JOIN dbo.CSVQuery cq ON cq.ID = st.ID
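
If you want to automate the update, as mentioned above, you could wrap it in a stored procedure; the procedure name here is just illustrative:

Create Procedure dbo.UpdateFromCSV
as
begin
    set nocount on;

    -- Apply the values from the CSV (read through the view) to the main table
    UPDATE st
    SET st.Enabled = cq.Enabled,
        st.Agreed = cq.Agreed
    FROM SourceTable st
    INNER JOIN dbo.CSVQuery cq ON cq.ID = st.ID;
end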

Another benefit of the view is that you can query it directly, so you can sense-check the data before running the update.
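
And if you want to go the staging route mentioned earlier, a rough sketch (the staging table and index names are assumptions) would be to copy the view into a real table, index it, and then update from that:

-- Stage the CSV into a real table so it can be indexed
drop table if exists dbo.CSVStaging;

SELECT ID, Enabled, Agreed
INTO dbo.CSVStaging
FROM dbo.CSVQuery;

CREATE CLUSTERED INDEX IX_CSVStaging_ID ON dbo.CSVStaging (ID);

UPDATE st
SET st.Enabled = cs.Enabled,
    st.Agreed = cs.Agreed
FROM SourceTable st
INNER JOIN dbo.CSVStaging cs ON cs.ID = st.ID;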

Of course, if you are regularly staging important data you may want to opt for SSIS, because the import method above is essentially all or nothing; but if it serves your purpose, it may do.

One thing you should get is Notepad++, so you can accurately check the hidden line-ending characters for CRLF, because some CSV files use LF on its own, or just CR. If yours does, adjust the TERMINATOR on the last field in the format file to match.

NOTE: You must give the SQL Server instance access to the file location if you intend to use this as a regular loading process. To do this, go onto your server, open Services, and check which account the SQL Server service is running under. Make sure it is a domain-authenticated account if the file location is not on the same server, so that the account can access the pickup location.

NOTE 2: Be aware that this process updates data, so it is entirely possible for a malicious or malformed file to do real damage; make sure you restrict the input process to trusted users.

Here is a resource for the XSI types: https://docs.microsoft.com/en-us/openspecs/sql_data_portability/ms-bcp/51298f0a-c9ac-463a-8e01-76d25ebaca3c

Here is the OPENROWSET overview: https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql?view=sql-server-ver15