MariaDB LOAD DATA INFILE – Handling Surname ‘Null’

mariadb

I'm given source data in a tab-delimited text file where the string columns are not quoted.

Among the columns is LAST_NAME, and naturally my source data contains a person with the surname Null. Further, some of the records have redacted names, which I would like to store as NULLs, so I can't define my column as NOT NULL. (A sentinel value is used for the redacted names, so I can identify them.)

I am trying to import this data via LOAD DATA INFILE.

I read each source column into a variable and pass those variables through a function that looks for the sentinel values and translates them to NULL. When this function returns a NULL value, the column gets assigned as NULL. But when this function returns the string 'NULL', the column still gets assigned as NULL, instead of being assigned the string 'NULL'.

How can I prevent the string 'NULL' from being interpreted as a NULL value in the context of a SET clause of a LOAD DATA INFILE command? (Assigning to a nullable varchar column.)

Update with specific data and commands:

An input data file (/tmp/data.txt); note that the fields are TAB-delimited, but I don't know how to do that here:

    SMITH   Other text is
    REDACTED        "inconsistently quoted"
    NULL    REDACTED

My commands:

    CREATE FUNCTION despair(value varchar(255))
      RETURNS varchar(255) DETERMINISTIC
      RETURN
        CASE value
        WHEN  NULL  THEN 'NULL'
        WHEN 'NULL' THEN 'NULL'
        WHEN 'REDACTED' THEN NULL
        ELSE value
        END;

    CREATE TABLE `misery` (LAST_NAME varchar(50), TEXT varchar(100));

    LOAD DATA INFILE '/tmp/data.txt' INTO TABLE `misery`
      FIELDS TERMINATED BY '\t' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      (@name, @text) SET LAST_NAME = despair(@name), TEXT = despair(@text);

    SELECT LAST_NAME, TEXT, LENGTH(LAST_NAME), LENGTH(TEXT) FROM `misery`;

Current output. Note that the TEXT column looks right, because quotation marks are removed as desired, but the last row's LAST_NAME is a NULL value, but should instead be the string 'NULL'.

    +-----------+-----------------------+-------------------+--------------+
    | LAST_NAME | TEXT                  | LENGTH(LAST_NAME) | LENGTH(TEXT) |
    +-----------+-----------------------+-------------------+--------------+
    | SMITH     | Other text is         |                 5 |           13 |
    | NULL      | inconsistently quoted |              NULL |           21 |
    | NULL      | NULL                  |              NULL |         NULL |
    +-----------+-----------------------+-------------------+--------------+
    3 rows in set (0.00 sec)

Best Answer

How can I prevent the string 'NULL' from being interpreted as a NULL value in the context of a SET clause of a LOAD DATA INFILE command? (Assigning to a nullable varchar column.)

Provide a FIELDS ENCLOSED BY from the docs

If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.

Essentially if you set FIELDS ENCLOSED BY '"' then you can provide surname "Null", as "Null", and nulls as null (no quotes).