Postgresql – Unicode error with \u0000 on COPY of large JSON file into Postgres

jsonpostgresql-9.4regexunicode

I'm using Postgres 9.4 on Ubuntu 14.04.

I have a 30GB JSON file that I'm trying to COPY into Postgres.

But I keep getting the following error:

COPY <table>(comment_jsonb) FROM '<json file>' WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01');
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.

I tried searching for the offending Unicode string to remove it, but because it's \u0000, which is NULL, I can't seem to get a regex to work. And when I print it into console, I have no idea where the NULL is, because (at least my guess is) it seems to print as nothing (though I have no idea what it is actually doing).

Is there a way to skip these errors?

Alternatively, how could I replace any instances of that Unicode in my JSON file?

Best Answer

In a (perhaps) similar case I had been confronted with \u0000 in a string I needed to process as JSON. The follow replace worked for me:

regexp_replace(stringWithNull, '\\u0000', '', 'g')

Hope this helps.