Postgresql – (PostgreSQL 11) Reading in an array of JSON objects (or composite types) from a CSV file into Postgres

arraycomposite-typescsvjsonpostgresql

So I am trying to import rows into a table using a CSV file, but have been running into problems I haven't been able to solve. I would like one column of a row to be an array of JSON objects.

How I would generally like my table to look like:

statement_id INT statement VARCHAR(100) options JSON[]
1 I am cool [{label: 'Not true', value: 1}, {label: 'Somewhat True', value: 2}, {label: 'Very true', value: 3}]

I am choosing to not create a separate table for the options because there's really no need for me to do so in my application, so I would like to not have to do that if possible.

My definition of the table:

CREATE TABLE statements(
statement_id INT GENERATED BY DEFAULT AS IDENTITY,
statement VARCHAR(100) NOT NULL,
options JSON [],
PRIMARY KEY(statement_id)
);

How I import the CSV file into the table:

copy statements(statement_id, statement, options)
from 'D:\Projects\project\files\statements.csv'
delimiter ','
csv header;

How options is formatted in my CSV:

{("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderately True","value": 3),("label": "Exactly True","value": 4)}

Attempt 1

Using the setup I've mentioned above, this is the error I get:

ERROR:  malformed array literal: "{("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderately True","value": 3),("label": "Exactly True","value": 4)}"
DETAIL:  Unexpected array element.
CONTEXT:  COPY asmt_questions, line 2, column options: "{("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderately T..."

I found this solution, and edited my options format in the CSV to be:

array[("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderately True","value": 3),("label": "Exactly True","value": 4)]::json[]

and got the following error:

ERROR:  malformed array literal: "array[("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderately True","value": 3),("label": "Exactly True","value": 4)]::json[]"
DETAIL:  Array value must start with "{" or dimension information.
CONTEXT:  COPY asmt_questions, line 2, column options: "array[("label": "Not at all True","value": 1),("label": "Hardly True","value": 2),("label": "Moderat..."

Attempt 2

I couldn't find any solutions that were related to this problem, but I did notice that the solution I mentioned above used composite types, so I created a composite type:

CREATE TYPE statement_option AS (
    label VARCHAR(20),
    value INT
);

and changed the statement table schema to be

CREATE TABLE statements(
    statement_id INT GENERATED BY DEFAULT AS IDENTITY,
    statement VARCHAR(100) NOT NULL,
    options statement_option [],
    PRIMARY KEY(statement_id)
    );

I got the same errors listed above.

Attempt n

After trying to search for a solution to these errors, I couldn't find one that was specific to this situation (a lot of them had to do with procedures and functions). So I tried many different combinations of the above setups.

I tried using brackets to represent an array in the CSV (ex: [{},{}]), I changed the [] to ARRAY in the statement table schema, I used single quotes instead of double quotes in the CSV file, etc. All to no avail.

At this point, I feel like I've exhausted the options I can try when it comes to this specific setup. I have a hunch that I might have to just do it line by line, or just create a procedure to read them in.

But I was wondering if there was something that I'm missing, or something that I'm doing wrong?

Best Answer

This works for me:

My CSV

id;json
1;{"{\"label\": \"Not at all True\",\"value\": 1}","{\"label\": \"Hardly True\",\"value\": 2}","{\"label\": \"Moderately True\",\"value\": 3}","{\"label\": \"Exactly True\",\"value\": 4}"}
2;{"{\"label\": \"Not at all True\",\"value\": 1}","{\"label\": \"Hardly True\",\"value\": 2}","{\"label\": \"Moderately True\",\"value\": 3}","{\"label\": \"Exactly True\",\"value\": 4}"}

My Table:

create table test_json_list (id integer, j json[]);

My COPY statement:

COPY test_json_list (id, j) FROM '/tmp/json.csv' WITH (DELIMITER ';', HEADER, FORMAT CSV, QUOTE '|');

Explanation: The array literal in PostgresSQL is a list of elements between curly braces, ex: {1,2,3} or {'a','b','c'}

I didn't find a clear explanation in the docs of the reason why the COPY command wants to have the JSON array elements quoted in double quotes, just took the output of a COPY TO command as reference.

I added a separator that doesn't conflict much with your JSON, but it is not very important.

I added a pipe QUOTE character to avoid one additional level of escaping of double quotes that would otherwise be required as the default CSV quote character is also a double quote.

My suggestion to simplify all this slightly would be to import a JSON array instead of an ARRAY (Postgres array) of JSONs if possible by your application logic.