Mysql – How to change an incorrect datatype for a birth date field from VARCHAR to DATE

datatypesdatemysql-5.7varchar

I made a mistake when creating my users table with a birthdate field and instead of putting the DATE datatype, I put a VARCHAR!

So now my users table looks like this:

CREATE TABLE IF NOT EXISTS users 
(
  id INT    UNSIGNED NOT NULL,
  birthdate VARCHAR (200) NOT NULL,

  PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;

and it is populated as follows (sample):

INSERT INTO users (id, birthdate) VALUES
  (1,'1991-01-23'),
  (2,'yyyy-01-23'),
  (3,'1991-mm-23'),
  (4,'1991-01-dd'),
  (5,''),
  (6,'1991-01-d3'),
  (7,'1983-05-23'),
  (8,'1991-0m-23'),
  (9,'19yy-01-23'),
  (10,'y991-01-23');

Now I want to update every incorrect birthdate to NULL, or set a default value like 2020-01-01. see my sqlfiddle here.

Best Answer

I used dbfiddle.uk (see here) for this as opposed to sqlfiddle.com - more servers and it's kept up to date better.

So, what I did was the following (the setup is as per your sqlfiddle):

CREATE TABLE IF NOT EXISTS users (
  id int(6) unsigned NOT NULL,
  birthdate varchar(200) NOT NULL,
  PRIMARY KEY (id)
)DEFAULT CHARSET=utf8;

populate:

INSERT INTO users (id, birthdate) VALUES
  (1,'1991-01-23'),
  (2,'yyyy-01-23'),
  (3,'1991-mm-23'),
  (4,'1991-01-dd'),
  (5,''),
  (6,'1991-01-d3'),
  (7,'1983-05-23'),
  (8,'1991-0m-23'),
  (9,'19yy-01-23'),
  (10,'y991-01-23');

Add a column to hold valid values:

ALTER TABLE users ADD new_bdate DATE;  -- add column to hold valid values

Note that the new field is nullable. It has to be, unless you want to put in some default like 01/01/1900 or '0000-00-00' or '2020-01-01' - I would (strongly) advise against this! It confuses the optimiser when it's figuring out the PLAN and NULL is perfectly valid when your datum isn't known!

You mention in comments that trying to use '0000-00-00' as a default fails. This is because the sql_mode contains STRICT_TRANS_TABLES - this is enabled by default in MySQL 5.7 (see the documentation here) - it is further discussed here and here (and accompanying links & comments). From the MySQL docco:

Strict mode affects whether the server permits '0000-00-00' as a valid date: If strict mode is not enabled, '0000-00-00' is permitted and inserts produce no warning. If strict mode is enabled, '0000-00-00' is not permitted and inserts produce an error, unless IGNORE is given as well. For INSERT IGNORE and UPDATE IGNORE, '0000-00-00' is permitted and inserts produce a warning.

So, now we alter the birthdate field to accept NULLs - this is important later when the "cleaning up" is done:

ALTER TABLE users MODIFY birthdate VARCHAR (200) NULL;  
-- make birthdate nullable
-- this is important for the STR_TO_DATE function.

I also make the original birthdate field nullable. If this isn't done, then the first UPDATE fails in the TRANSACTION below:

START TRANSACTION;
UPDATE users SET  birthdate = NULL
WHERE birthdate REGEXP '[a-zA-Z/]' OR birthdate = '';
UPDATE users SET new_bdate = birthdate WHERE birthdate IS NOT NULL;
COMMIT;

It's important to do both DML steps/queries within the one transaction to avoid any updates between steps - although I imagine that you can do this at a quiet time - or you could lock the tables for the duration of your changes.

  • Explanation of the regular expression:
    [a-z] means match all characters in the range [a, b, c... x, y, z], and the A-Z means the same for capitals. The / character will match the slash - can be used in dates, but is not valid for MySQL dates.

Now, we cleanup;

ALTER TABLE users DROP COLUMN new_bdate;

Finally, we check our results:

SELECT * FROM users;

Result:

id  birthdate
1   1991-01-23
2   NULL
3   NULL
4   NULL
5   NULL
6   NULL
7   1983-05-23
8   NULL
9   NULL
10  NULL
10 rows

So, now we have a column with the correct datatype and the correct values (where these are known).

Your problem demonstrates the general importance of never allowing the entry of free text in an application. If at all possible, users should be obliged to choose from drop-downs and ensure the NOT NULL constraint from the get-go!

Also, it shows the importance of choosing the correct datatype from day one! Your database is your last bastion of defence for your data, so ensure that anything that's entered is valid from the start - you will avoid issues like this one down the line!

Edit:

Following comments by the OP, in particular with respect to a date entered into the VARCHAR field as YYYY/mm/dd (i.e. separator / instead of `-`` - @Akina's (elegant) REGEXP answer can be modified as follows (after adding suitable dates - see fiddle here).

INSERT INTO `users` (`id`, `birthdate`) VALUES 
  (11, '1993/03/20'),
  (12, '2000/09/25');
  (13, '2015.06.30'),
  (14, '2015_04_15');

Note the use of the slash (/), dot (.) or underscore (_) characters as separators for the different date subfields.

SQL:

UPDATE users
SET DOB = 
        STR_TO_DATE
        (
          CONCAT
          (
            SUBSTRING(birthdate, 1, 4),
            '-',
            SUBSTRING(birthdate, 6, 2),
            '-',
            SUBSTRING(birthdate, 9, 2)
          ), '%Y-%m-%d'
        )
WHERE birthdate REGEXP '[0-9]{4}.[0-9]{2}.[0-9]{2}';

Result (snipped for brevity):

id   birthdate  DOB
...
...
10  y991-01-23  NULL    
11  1993/03/20  1993-03-20
12  2000/09/25  2000-09-25
13  2015.06.30  2015-06-30
14  2015_04_15  2015-04-15
14 rows

This is slightly modifed from @Akina's answer - it uses the dot ('.') in the regular expression - the dot being a regular expression metacharacter (or "special" character) which is a wild-card that can stand for any single (i.e. one and only one) character.

So any year followed by any single character followed by any month by any single character by any day will match - this will cover valid ISO dates (using the hyphen (-) or other possible separators , i.e. the underscore or the literal dot character. The date regexp is somewhat simplified - real ones for dates can be much more complex!

I had a really tricky time extracting the date from the string. I used the MySQL (non-standard - quelle surprise!) string concatenation operator (the plus (+)) sign and it started adding (i.e. numerically) the year number and the month number. The same thing happened with the MySQL version of the (standard SQL) double-pipe (||) operator. It was only when I found this that I managed to get to the final working CONCAT solution!

As my last link says, "ya gotta love MySQL" - that was not the first emotion which came to me... Yet another reason why I evangalise for PostgreSQL on this forum!

To set your birthdate field to a default of 2020-01-01 (not advised... see above), use the code in @Akina's comment:

SET DOB = 
  CASE 
    WHEN birthdate REGEXP {pattern 1} 
      THEN {expression 1} 
    WHEN {pattern 2} THEN {expression 2} ... ELSE NULL END

like this:

UPDATE users
SET DOB = 
  CASE 
    WHEN birthdate REGEXP '[0-9]{4}.[0-9]{2}.[0-9]{2}'
      THEN 
        STR_TO_DATE
        (
          CONCAT
          (
            SUBSTRING(birthdate, 1, 4),
            '-',
            SUBSTRING(birthdate, 6, 2),
            '-',
            SUBSTRING(birthdate, 9, 2)
          ), '%Y-%m-%d'
        )   
      ELSE '2020-01-01'
    END;

See here for the fiddle. +1 for an interesting first question which looked (deceptively) easy but which got me thinking - and welcome to the forum!