I'd recommend using the COPY command from psql (\copy). You can set a DEFAULT value for a column and omit that column from the COPY column list, e.g.:
\copy tablename(col1,col2,col3) FROM 'thefile.csv' WITH (FORMAT CSV)
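For that to work, the omitted column just needs a DEFAULT defined on the table. A minimal sketch, with an assumed fourth column (col4 and its default are made up):
CREATE TABLE tablename (
    col1 text,
    col2 text,
    col3 text,
    col4 timestamptz DEFAULT now()   -- not in the \copy column list, so it gets the default
);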
Alternatively, you can create a new TEMPORARY table in PgAdmin-III with just the columns in the CSV, import the CSV into it, and use SQL like this to merge it into the main table:
INSERT INTO realtable (col1, col2, col3, colwithdefault)
SELECT
  col1, col2, col3, 'some default value'
FROM tempcsvtable;
You can use this to calculate columns based on expressions, combine and split columns, omit some rows, etc.
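For example, a sketch of that kind of rewrite (the expressions and the WHERE filter here are made up for illustration):
INSERT INTO realtable (col1, col2, col3, colwithdefault)
SELECT
  trim(col1),              -- clean a value up with an expression
  col2 || ' ' || col3,     -- combine two CSV columns
  nullif(col3, ''),        -- turn empty strings into NULL
  now()
FROM tempcsvtable
WHERE col1 <> '';          -- omit some rows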
According to the documentation, you can use a SET clause to transform the data on the way in.
[SET col_name = expr,...]
The expr expression can include the column name, which will be interpreted as the data being read from the file and destined for that column... so, for example, at the end of your LOAD DATA INFILE statement you might use:
SET latitude = IF(latitude + 0 = 0, NULL, latitude),
    area_code = IF(area_code = '', NULL, area_code)
This example transforms two columns. If latitude + 0 is 0, latitude gets set to NULL, and otherwise it gets set to the value from the file as the data is inserted; if area_code contains an empty string, it gets set to NULL, otherwise to the data from the file. The more appropriate choice will depend on how MySQL handles casting the data, but I suspect either of these constructs would work in your situation.
You do not have to reference columns you don't intend to transform; they'll be inserted as-is.
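Put together, the whole load might look roughly like this; the table name, file path, delimiter options, and the other column names are placeholders for whatever your actual setup uses:
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE yourtable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, latitude, area_code)
SET latitude  = IF(latitude + 0 = 0, NULL, latitude),
    area_code = IF(area_code = '', NULL, area_code);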
Best Answer
If the records don't have embedded newlines in text fields, so that there is a strict [one line = one record] mapping, you may pass the output of a \copy ... csv in psql to the Unix command split.
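For instance, something along these lines (big_table, the one-million-line chunk size, and the chunk_ file prefix are placeholders):
\copy (SELECT * FROM big_table) TO PROGRAM 'split --lines=1000000 - chunk_' WITH (FORMAT CSV)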
See the options of split to change the format of the names of the output files or the destination directory. It can also be used server-side with COPY instead of \copy if you're a superuser.
If the records may have embedded newlines, it's more complicated, because with the above method a record might span two consecutive files, making each file an invalid CSV file in isolation.
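For instance, take this made-up output where the second record has an embedded newline:
1,"single line"
2,"first line
second line"
Splitting it at two lines per file would produce two files,
1,"single line"
2,"first line
and
second line"
neither of which is a valid CSV file on its own.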
If the goal is to concatenate the files back into a single file to process it, then it doesn't matter, but if they must be processed individually, a different method should be considered.
It's possible in psql, but a bit involved (as opposed to writing it in a programming language). As of PostgreSQL 12, csv is a native output format in psql, so a cursor on the query might be used, with FETCH 1000000 statements doing the actual cut-and-retrieve. Because there is no looping construct in psql, and assuming you don't know in advance how many fetch steps are needed, you have to generate the script in a previous step that computes the count(*) of the result set and emits (count(*)+NR-1)/NR FETCH NR FROM ... commands, each writing to its own numbered file, where NR is your number of records per file. The skeleton for a piece of script that should work would look like this:
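(A rough sketch: big_table, the chunk size, and the chunk_N.csv file names are placeholders, and \pset tuples_only on keeps the header row out of each file.)
\pset format csv
\pset tuples_only on
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
\o chunk_1.csv
FETCH 1000000 FROM big_cur;
\o chunk_2.csv
FETCH 1000000 FROM big_cur;
\o chunk_3.csv
FETCH 1000000 FROM big_cur;
\o
CLOSE big_cur;
COMMIT;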