I would suggest you make use of external tables. You can create an external table over your CSV file using the ORACLE_LOADER driver and then update your existing table with the data from the external table using DML (MERGE, for example).
Consult the Oracle Utilities Guide for detailed information.
What follows is a sample of how you can update a table from a flat file.
First you create the directory where your flat file will be located:
SQL> create directory ext_tab_dir as '/home/oracle/ora_load';
Directory created.
SQL> grant read, write on directory ext_tab_dir to spongebob;
Grant succeeded.
Then you create your external table:
SQL> CREATE TABLE emp_load (empid number(3),
first_name CHAR(15),
last_name CHAR(20),
year_of_birth CHAR(4))
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir
ACCESS PARAMETERS (RECORDS FIXED 21 FIELDS (
empid char(1),
first_name CHAR(7),
last_name CHAR(8),
year_of_birth CHAR(4)))
LOCATION ('info.dat')
);
We'll assume that you already have the table that you want to update with data from the flat file:
SQL> CREATE TABLE emp (empid number(3),
first_name CHAR(15),
last_name CHAR(20),
year_of_birth CHAR(4));
SQL> insert into emp values(1, 'SpongeBob', 'SquarePants', '1997');
SQL> insert into emp values(2, 'Patrick', 'Star', '1997');
SQL> insert into emp values(3, 'Squidward', 'Tentacles', '1997');
(Here, I just loaded it with sample data.)
Here is what your info.dat contains, for example:
[oracle@oca ~]$ cat ora_load/info.dat
1Alvin Tolliver1976
2KennethBaer 1963
3Mary Dube 1973
4NandiniShastri 1985
At this point you just update your existing table with the above data from the external table:
SQL> merge into emp e1
using emp_load e2
on (e1.empid = e2.empid)
when matched then update set e1.first_name = e2.first_name,
e1.last_name = e2.last_name
when not matched then insert
values (e2.empid, e2.first_name, e2.last_name, e2.year_of_birth);
Here's what you'll get as a result:
SQL> select * from emp;
EMPID FIRST_NAME LAST_NAME YEAR
---------- --------------- -------------------- ----
1 Alvin Tolliver 1997
2 Kenneth Baer 1997
3 Mary Dube 1997
4 Nandini Shastri 1985
There is no syntax variant that lets you update the whole row at once. However, there is a shorter form than what you have so far.
Also, you do not actually want to update all columns: the WHERE condition on id pins down at least one column (id) to remain unchanged. But that's just nitpicking.
UPDATE table_a a
SET ( c1, c2, ...)
= (b.c1, b.c2, ...)
FROM table_b b
WHERE a.id = b.id;
More details in this related answer:
Bulk update of all columns
DELETE / INSERT
Internally, due to the MVCC model of Postgres, every UPDATE effectively inserts a new row anyway and marks the old one as obsolete. So, behind the curtains, there is not much difference between UPDATE and DELETE plus INSERT.
There are some details in favor of the UPDATE route:
- HOT updates: when no indexed column changes, the new row version can be stored as a heap-only tuple on the same page, avoiding index entries entirely.
- TOAST tables: if you have large columns, the content may be stored "out-of-line" in TOAST tables, and the new row version can link to the same row in the TOAST table if the toasted columns remain unchanged.
- Index maintenance may be cheaper for updates.
Otherwise, locking should be about the same. You need an exclusive lock on affected rows either way. Just make it quick.
If you are dealing with a huge number of rows and you don't need a consistent state (all rows or none), you can split the operation into multiple batches in separate transactions. This increases the total cost but keeps the lock time per row shorter.
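A minimal sketch of such batching, reusing the illustrative table_a / table_b / id names from above and assuming an integer id column (the batch boundaries are arbitrary):

```sql
-- Batch 1: only rows with id 1..10000.
-- Run each batch in its own transaction (commit between batches)
-- so row locks are released before the next batch starts.
UPDATE table_a a
SET    (c1, c2) = (b.c1, b.c2)
FROM   table_b b
WHERE  a.id = b.id
AND    a.id BETWEEN 1 AND 10000;
-- Next batch: a.id BETWEEN 10001 AND 20000, and so on.
```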
There have been some solid answers to this post, but as an added extra: if you know this is going to be a regular task, and the file will be dropped in a specific location each week with the same format and name, you can query flat files (CSV, TXT, and so on) directly via SQL using the OPENROWSET feature, and then operate on the data using inserts and updates. (BULK INSERT is also an option and can be simpler, but in this case I opted for OPENROWSET for the benefit of easy updates.)
This requires a little setup, but once you have it working, you can create a view that reads your CSV with a fixed output, or even a stored procedure that updates your table, so you can automate your throughput from then on. You can even join your CSV to existing data, although it won't be indexed, so you may want to use the feature to stage the data into a table, index that table, and then perform your update.
That said, if you are going to build a proper pipeline, you should look into SSIS, as it lets you set up properly piped errors and output locations for erroneous inputs.
I am going to make some assumptions with your data.
SQL Server has a weakness that more lax systems do not have, but that weakness enables all of its strengths: it needs definitions for the columns, and those definitions aren't contained explicitly in the CSV file. So you create the definitions in an XML format file and point SQL Server at it, in this case using the BCPFORMAT schema.
The definitional XML file would be called: BCP_FileName.XML
The CSV would be called Load.CSV
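As a sketch of what that format file might contain, assuming a comma-separated Load.CSV with three columns, EmpId, FirstName, LastName (the column names, lengths, and types here are illustrative assumptions):

```xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- How the fields appear in the CSV file -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","    MAX_LENGTH="12"/>
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=","    MAX_LENGTH="50"/>
    <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="50"/>
  </RECORD>
  <ROW>
    <!-- How the fields map to columns in the result set -->
    <COLUMN SOURCE="1" NAME="EmpId"     xsi:type="SQLINT"/>
    <COLUMN SOURCE="2" NAME="FirstName" xsi:type="SQLVARYCHAR"/>
    <COLUMN SOURCE="3" NAME="LastName"  xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
```

Note that the last field's TERMINATOR is "\r\n"; change it to "\n" if your file uses bare LF line endings (see the note about Notepad++ below).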
The eventual select statement might be.
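For example, assuming the file and the format file both live in C:\Loads (the paths and column names are illustrative):

```sql
SELECT t.EmpId, t.FirstName, t.LastName
FROM OPENROWSET(
        BULK 'C:\Loads\Load.CSV',
        FORMATFILE = 'C:\Loads\BCP_FileName.XML',
        FIRSTROW = 2      -- skip a header row, if the CSV has one
     ) AS t;
```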
This would enable you to put it into a view with a CREATE VIEW statement.
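A sketch of such a view (the view name, schema, and paths are illustrative assumptions):

```sql
CREATE VIEW dbo.vw_LoadCSV
AS
SELECT t.EmpId, t.FirstName, t.LastName
FROM OPENROWSET(
        BULK 'C:\Loads\Load.CSV',
        FORMATFILE = 'C:\Loads\BCP_FileName.XML',
        FIRSTROW = 2
     ) AS t;
```

Because OPENROWSET(BULK ...) takes literal paths, the view pins the file location, which is convenient when the file is dropped in the same place each week.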
With that view you can do...
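An update from such a view against an existing table might look like this (dbo.Employees and the view name dbo.vw_LoadCSV are illustrative assumptions):

```sql
UPDATE e
SET    e.FirstName = v.FirstName,
       e.LastName  = v.LastName
FROM   dbo.Employees e
JOIN   dbo.vw_LoadCSV v
  ON   v.EmpId = e.EmpId;
```

You could equally use MERGE here if you also need to insert rows that do not exist yet, as in the Oracle example above.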
This has the benefit of being able to query the view so you can sense check the data and all sorts.
Of course, if you are regularly staging important data, you may want to opt for SSIS, as the above means of importing data can be all or nothing, but if that serves your purpose, then it may do.
One thing you should get is Notepad++, so you can accurately check your hidden line-ending characters for CRLF, because some CSV files use LF on its own, or CR.
NOTE: You must give the SQL Server instance access to the file location if you intend to use this as a regular loading process. To do this, go onto your server, open Services, and check which user is running the SQL Server process. Make sure it is a domain-authenticated account if the file location is not on the same server, so that the domain user can access that location for the file pickup.
NOTE 2: Be aware that this process updates data, so it is entirely possible for a malicious attack to do real damage. Make sure you restrict the input process to trusted users.
Here is a resource for the XSI types: https://docs.microsoft.com/en-us/openspecs/sql_data_portability/ms-bcp/51298f0a-c9ac-463a-8e01-76d25ebaca3c
Here is the OPENROWSET overview: https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql?view=sql-server-ver15