Mysql – Complex query with multiple normalized fields

insertjoin;MySQLnormalization

We have a fairly simple table structure, but with a LOT of fields per table (talking 40+). This data is initially produced in plain-text, user-readable tables, but then it is translated into higher-performance, easier to query tables before being installed for use in production.

What we do is, wherever possible and reasonable we translate certain fields into enumerated values, and keep track of the enumerations in a MasterEnum table. There are usually 20-25 enumerated fields out of 40 or so.

Sample table structure:

Plain text version:

    |  PartNumber   |  Manufacturer  |  SomeData  |  SomeMoreData  |  SomeTextData ...
    ----------------------------------------------------------------------------------
    |  1x9kdah      |  GizmoCorp     | ThisIsData |  OtherData     |  ThisStaysText ...
    |  8xcjkzh      |  GadgetInc     | MoreData   |  OtherData2    |  ThisTooStaysText ...

Target table sample structure:

    |  PartNumber  |  Manufacturer  |  SomeData   | SomeMoreData  |  SomeTextData ...
    -------------------------------------------------------------------------------------
    |  1x9kdah     |       1        |    1        |      1        |  ThisStaysText ...
    |  8xcjkzh     |       2        |    2        |      2        |  ThisTooStaysText ...

Master Enumeration Table Structure

    |  FieldName     |  InputText  |  ValueCode |
    ---------------------------------------------
    |  Manufacturer  |  GizmoCorp  |  1         |
    |  Manufacturer  |  GadgetInc  |  2         |
    |  SomeData      |  ThisIsData |  1         |
    |  SomeData      |  MoreData   |  2         |
    |  SomeMoreData  |  OtherData  |  1         |
    |  SomeMoreData  |  OtherData2 |  2         |

We have a means of doing this translation that works and works well; however it's a little on the slow side since all the processing is done in Java via Spring/Hibernate. My question is:

Is there a way to write a single query that would accomplish all the above translations? (Note that we have an excellent way of keeping track of our field definitions programmaticly, so generating complex SQL queries on the fly is not an issue). If it is not possible to do it in a single query, how would I structure queries to iterate over the individual fields and make sure that as the translations happen the data is inserted into the new table remains associated with the correct rows?

Note that it is safe to assume the target table is always empty at the beginning of the process.

Best Answer

As others have pointed out, this is a Really Bad Idea. Still, if you insist, the SQL is not hugely complicated:

CREATE TABLE RawData
(
PartNumber  VARCHAR(30) NOT NULL PRIMARY KEY,
Manufacturer    VARCHAR(30) NOT NULL,
Data1   VARCHAR(30),
Data2   VARCHAR(30),
Data3   VARCHAR(30)
)

CREATE TABLE Translations
(
FieldName   VARCHAR(30) NOT NULL,
Value   VARCHAR(30) NOT NULL,
PRIMARY KEY (FieldName, Value),
ID  INT NOT NULL
UNIQUE (FieldName, ID)
)

CREATE TABLE CleanData
(
PartNumber  VARCHAR(30) NOT NULL PRIMARY KEY,
Manufacturer    VARCHAR(30) NOT NULL,
Data1   VARCHAR(30),
Data2   VARCHAR(30),
Data3   VARCHAR(30)
)

INSERT INTO CleanData (PartNumber, Manufacturer, Data1, Data2, Data3)
    SELECT
        RD.PartNumber,
        TMfr.ID AS Manufacturer,
        TDt1.ID AS Data1,
        TDt2.ID AS Data2,
        TDt3.ID AS Data3
    FROM
        RawData AS RD
        LEFT JOIN Translations AS TMfr ON RD.Manufacturer = TMfr.Value AND TMfr.FieldName = 'Manufacturer'
        LEFT JOIN Translations AS TDt1 ON RD.Data1        = TDt1.Value AND TDt1.FieldName = 'Data1'
        LEFT JOIN Translations AS TDt2 ON RD.Data2        = TDt2.Value AND TDt2.FieldName = 'Data2'
        LEFT JOIN Translations AS TDt3 ON RD.Data3        = TDt3.Value AND TDt3.FieldName = 'Data3'

Extend to the complete set of fields. May Codd have mercy on your soul.