MySQL – How to extract certificate attributes from a single column and split them into many

Tags: MySQL, regex, substring

In my table I have a field (let's say cert_attr) which stores X.509 certificate attributes.

Here is an example of three rows (each line is the value of one field):

"CN=User1, OU=Eng, O=Company Ltd, L=D4, S=Dublin, C=IE"
"CN=User2, OU=Eng, O=Company Ltd, L=D2, S=Dublin, C=IE"
"OU=Eng, O=Company Ltd"

I'm trying to split the field's value into separate columns with a SELECT like this:

SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "CN=", -1), ", ", 1) as CN,
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "OU=", -1), ", ", 1) as OU,
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "O=", -1), ", ", 1) as O,
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "L=", -1), ", ", 1) as L,
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "S=", -1), ", ", 1) as S,
SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "C=", -1), ", ", 1) as C
FROM mytable

This works, but there is an issue with rows that are missing some attributes.

When an attribute is missing from the field's string, I expect the column to be empty, but the whole string is returned instead.

The first two example rows work as expected and return the following columns correctly:

| CN    | OU  | O           | L  | S      | C  |
| ----- | --- | ----------- | -- | ------ | -- |
| User1 | Eng | Company Ltd | D4 | Dublin | IE |
| User2 | Eng | Company Ltd | D2 | Dublin | IE |

The problem is with the third example row, where I expect an empty string whenever the substring pattern is not found:

| CN         | OU  | O           | L          | S          | C          |
| ---------- | --- | ----------- | ---------- | ---------- | ---------- |
| OU=Eng,... | Eng | Company Ltd | OU=Eng,... | OU=Eng,... | OU=Eng,... |

but instead the whole string is returned.
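
The behavior can be reproduced on a literal, without the table. This is only a minimal check: when the delimiter does not occur, SUBSTRING_INDEX() returns its input unchanged, while LOCATE() returns 0 for the same search, which is something a guard could test for.

```sql
-- 'CN=' does not occur in the string, so the input comes back unchanged.
SELECT SUBSTRING_INDEX('OU=Eng, O=Company Ltd', 'CN=', -1) AS whole_string;
-- whole_string: 'OU=Eng, O=Company Ltd'

-- LOCATE() reports the same "not found" condition as 0.
SELECT LOCATE('CN=', 'OU=Eng, O=Company Ltd') AS pos;
-- pos: 0
```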

Question:

Is there any way to return an empty string when SUBSTRING_INDEX() fails to find the substring? Or maybe there is some other function (like a regular expression) or another workaround?

My goal is to export the data to a TSV file, with these attributes in separate columns holding valid values:

mysql mytable < query.sql > cert_attributes.tsv

Best Answer

The first part only demonstrates that the procedure works:

CREATE TABLE tablecertificate (
  stationery_name VARCHAR(500)
);
INSERT INTO tablecertificate VALUES
("CN=User1, OU=Eng, O=Company Ltd, L=D4, S=Dublin, C=IE"),
("CN=User2, OU=Eng, O=Company Ltd, L=D2, S=Dublin, C=IE"),
("OU=Eng, O=Company Ltd");
CREATE PROCEDURE `splitcertifcate`()
BEGIN
 DECLARE current_pos INT DEFAULT 1;
 DECLARE delim CHAR DEFAULT ',';
 DECLARE current VARCHAR(100) DEFAULT '';
 DECLARE CN VARCHAR(100) DEFAULT '';
 DECLARE OU VARCHAR(100) DEFAULT '';
 DECLARE O VARCHAR(100) DEFAULT '';
 DECLARE L VARCHAR(100) DEFAULT '';
 DECLARE S VARCHAR(100) DEFAULT '';
 DECLARE C VARCHAR(100) DEFAULT '';
 DECLARE rest_cert_part VARCHAR(100) DEFAULT '';
 DECLARE current_cert_part  VARCHAR(100) DEFAULT '';
  DECLARE finished INTEGER DEFAULT 0;
  DECLARE certificate varchar(500) DEFAULT "";
  DECLARE curcertificate
      CURSOR FOR 
          SELECT stationery_name FROM tablecertificate;

  -- declare NOT FOUND handler
  DECLARE CONTINUE HANDLER 
        FOR NOT FOUND SET finished = 1;
        
        
  # Temporary table that holds the split parts
   DROP TEMPORARY TABLE IF EXISTS mycertificate;
  CREATE TEMPORARY TABLE mycertificate(CN VARCHAR(100), OU VARCHAR(100), O VARCHAR(100) , L  VARCHAR(100), S VARCHAR(100), C VARCHAR(100));

  OPEN curcertificate;
    getcertificate: LOOP
      # fetch the next row
      FETCH curcertificate INTO certificate;
      IF finished = 1 THEN 
          #Last element reached
          LEAVE getcertificate;
      END IF;
        SET CN = '';
        SET OU = '';
        SET O = '';
        SET L = '';
        SET S = '';
        SET C = '';
      SET current_pos =  LOCATE(delim,certificate);
        SET current_cert_part = SUBSTRING(certificate,1,current_pos-1);
      SET rest_cert_part = SUBSTRING(certificate from current_pos+1);
        IF length(trim(current_cert_part)) = 0   THEN
          SET current_cert_part = rest_cert_part;
      END IF;
      #Examine first element of the string
        CASE 
          WHEN INSTR(current_cert_part, "CN=")  > 0 THEN SET CN = TRIM(current_cert_part);
          WHEN INSTR(current_cert_part, "OU=")  > 0 THEN SET OU = TRIM(current_cert_part);
          WHEN INSTR(current_cert_part, "O=")  > 0 THEN SET O = TRIM(current_cert_part);
          WHEN INSTR(current_cert_part, "L=")  > 0 THEN SET L = TRIM(current_cert_part);
          WHEN INSTR(current_cert_part, "S=")  > 0 THEN SET S = TRIM(current_cert_part);
          WHEN INSTR(current_cert_part, "C=")  > 0 THEN SET C = TRIM(current_cert_part);
          ELSE BEGIN END;  # ignore any other attribute instead of raising a "case not found" error
        END CASE;
        WHILE current_pos <> 0 DO
          # loop through the rest of the string
          set  current_pos = LOCATE(delim,rest_cert_part);
          set current_cert_part = SUBSTRING(rest_cert_part,1,current_pos-1);
          set rest_cert_part = SUBSTRING(rest_cert_part from current_pos+1);

          if length(trim(current_cert_part)) = 0   then
              # last element in the string
              set current_cert_part = rest_cert_part;
          end if;    
          CASE 
              WHEN INSTR(current_cert_part, "CN=")  > 0 THEN SET CN = TRIM(current_cert_part);
              WHEN INSTR(current_cert_part, "OU=")  > 0 THEN SET OU = TRIM(current_cert_part);
              WHEN INSTR(current_cert_part, "O=")  > 0 THEN SET O = TRIM(current_cert_part);
              WHEN INSTR(current_cert_part, "L=")  > 0 THEN SET L = TRIM(current_cert_part);
              WHEN INSTR(current_cert_part, "S=")  > 0 THEN SET S = TRIM(current_cert_part);
              WHEN INSTR(current_cert_part, "C=")  > 0 THEN SET C = TRIM(current_cert_part);
              ELSE BEGIN END;  # ignore any other attribute instead of raising a "case not found" error
          END CASE;
      END WHILE;
        # insert the split row into the temp table
       INSERT INTO mycertificate VALUES (  CN,OU,O,L,S,C);


  END LOOP getcertificate;
  CLOSE curcertificate;
SELECT * FROM mycertificate;
END
CALL splitcertifcate();
CN       | OU     | O             | L    | S        | C   
:------- | :----- | :------------ | :--- | :------- | :---
CN=User1 | OU=Eng | O=Company Ltd | L=D4 | S=Dublin | C=IE
CN=User2 | OU=Eng | O=Company Ltd | L=D2 | S=Dublin | C=IE
         | OU=Eng | O=Company Ltd |      |          |     
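
For comparison, here is a hedged sketch of a single-query alternative that returns the bare values the question asks for, without the `CN=` style prefixes. It assumes the question's names (`mytable`, `cert_attr`), a `', '` separator between attributes, and that no attribute value itself contains one of the keys. Note that LOCATE() is case-insensitive for nonbinary strings while SUBSTRING_INDEX() matches case-sensitively, so data with mixed-case keys would need extra care.

```sql
-- Guard every extraction with LOCATE(): when the key is absent, return ''
-- instead of letting SUBSTRING_INDEX() hand back the whole string.
SELECT
  IF(LOCATE('CN=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'CN=', -1), ', ', 1), '') AS CN,
  IF(LOCATE('OU=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'OU=', -1), ', ', 1), '') AS OU,
  IF(LOCATE('O=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'O=', -1), ', ', 1), '') AS O,
  IF(LOCATE('L=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'L=', -1), ', ', 1), '') AS L,
  IF(LOCATE('S=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'S=', -1), ', ', 1), '') AS S,
  IF(LOCATE('C=', cert_attr) > 0,
     SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, 'C=', -1), ', ', 1), '') AS C
FROM mytable;
```

With the three sample rows, the third row should come back as `Eng` and `Company Ltd` in the OU and O columns, with the other four columns empty.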


The complete stored procedure, including writing the TSV file:

DELIMITER //
CREATE DEFINER=`root`@`%` PROCEDURE `splitcertifcate`()
BEGIN
 DECLARE current_pos INT DEFAULT 1;
 DECLARE delim CHAR DEFAULT ',';
 DECLARE current VARCHAR(100) DEFAULT '';
 DECLARE CN VARCHAR(100) DEFAULT '';
 DECLARE OU VARCHAR(100) DEFAULT '';
 DECLARE O VARCHAR(100) DEFAULT '';
 DECLARE L VARCHAR(100) DEFAULT '';
 DECLARE S VARCHAR(100) DEFAULT '';
 DECLARE C VARCHAR(100) DEFAULT '';
 DECLARE rest_cert_part VARCHAR(100) DEFAULT '';
 DECLARE current_cert_part  VARCHAR(100) DEFAULT '';
    DECLARE finished INTEGER DEFAULT 0;
    DECLARE certificate varchar(500) DEFAULT "";
    DECLARE curcertificate
        CURSOR FOR 
            SELECT stationery_name FROM tablecertificate;

    -- declare NOT FOUND handler
    DECLARE CONTINUE HANDLER 
        FOR NOT FOUND SET finished = 1;
        
    # Temporary table that holds the split parts
     DROP TEMPORARY TABLE IF EXISTS mycertificate;
    CREATE TEMPORARY TABLE mycertificate(CN VARCHAR(100), OU VARCHAR(100), O VARCHAR(100) , L  VARCHAR(100), S VARCHAR(100), C VARCHAR(100));

    OPEN curcertificate;
    getcertificate: LOOP
        # fetch the next row
        FETCH curcertificate INTO certificate;
        IF finished = 1 THEN 
            #Last element reached
            LEAVE getcertificate;
        END IF;
        SET CN = '';
        SET OU = '';
        SET O = '';
        SET L = '';
        SET S = '';
        SET C = '';
        SET current_pos =  LOCATE(delim,certificate);
        SET current_cert_part = SUBSTRING(certificate,1,current_pos-1);
        SET rest_cert_part = SUBSTRING(certificate from current_pos+1);
        IF length(trim(current_cert_part)) = 0   THEN
            SET current_cert_part = rest_cert_part;
        END IF;
        #Examine first element of the string
        CASE 
            WHEN INSTR(current_cert_part, "CN=")  > 0 THEN SET CN = TRIM(current_cert_part);
            WHEN INSTR(current_cert_part, "OU=")  > 0 THEN SET OU = TRIM(current_cert_part);
            WHEN INSTR(current_cert_part, "O=")  > 0 THEN SET O = TRIM(current_cert_part);
            WHEN INSTR(current_cert_part, "L=")  > 0 THEN SET L = TRIM(current_cert_part);
            WHEN INSTR(current_cert_part, "S=")  > 0 THEN SET S = TRIM(current_cert_part);
            WHEN INSTR(current_cert_part, "C=")  > 0 THEN SET C = TRIM(current_cert_part);
            ELSE BEGIN END;  # ignore any other attribute instead of raising a "case not found" error
        END CASE;
        WHILE current_pos <> 0 DO
            # loop through the rest of the string
            set  current_pos = LOCATE(delim,rest_cert_part);
            set current_cert_part = SUBSTRING(rest_cert_part,1,current_pos-1);
            set rest_cert_part = SUBSTRING(rest_cert_part from current_pos+1);

            if length(trim(current_cert_part)) = 0   then
                # last element in the string
                set current_cert_part = rest_cert_part;
            end if;    
            CASE 
                WHEN INSTR(current_cert_part, "CN=")  > 0 THEN SET CN = TRIM(current_cert_part);
                WHEN INSTR(current_cert_part, "OU=")  > 0 THEN SET OU = TRIM(current_cert_part);
                WHEN INSTR(current_cert_part, "O=")  > 0 THEN SET O = TRIM(current_cert_part);
                WHEN INSTR(current_cert_part, "L=")  > 0 THEN SET L = TRIM(current_cert_part);
                WHEN INSTR(current_cert_part, "S=")  > 0 THEN SET S = TRIM(current_cert_part);
                WHEN INSTR(current_cert_part, "C=")  > 0 THEN SET C = TRIM(current_cert_part);
                ELSE BEGIN END;  # ignore any other attribute instead of raising a "case not found" error
            END CASE;
        END WHILE;
        # insert the split row into the temp table
       INSERT INTO mycertificate VALUES (  CN,OU,O,L,S,C);


    END LOOP getcertificate;
    CLOSE curcertificate;

    SELECT * FROM mycertificate INTO OUTFILE "C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/cert.tsv"
    FIELDS TERMINATED BY '\t'
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';
END//
DELIMITER ;

Resulting TSV file:

"CN=User1"  "OU=Eng"    "O=Company Ltd" "L=D4"  "S=Dublin"  "C=IE"
"CN=User2"  "OU=Eng"    "O=Company Ltd" "L=D2"  "S=Dublin"  "C=IE"
""          "OU=Eng"    "O=Company Ltd"  ""       ""        ""

Check that the INTO OUTFILE folder matches the secure-file-priv entry in your my.ini/my.cnf file:

secure-file-priv='C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/'
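
If you are unsure what the server is actually using, the effective value can be read from a client session. This is a small check, not part of the procedure above:

```sql
-- Shows the directory the server permits for INTO OUTFILE and LOAD DATA.
SHOW VARIABLES LIKE 'secure_file_priv';

-- Equivalent form using the system variable directly.
SELECT @@secure_file_priv;
```

An empty value means the server does not restrict the location, and NULL means file import and export are disabled.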