Should I create a column for specific behavior

database-design

I would like to expose a problem I often have when I work with database and I never knows what to do.

I need to load data from a file into a table (table1) line by line with the help of a software. The file can be a csv or something similar. During the loading I also check the validity each line and each value in this line. If something is wrong I put a message another table (table2) with a reference to my line. A line can have more than one error.

So in my table1 I can have valid and invalid line. To check if a line is valid or not I need to check if there is a message, a line, in my table2. I hope this is clear.

My question now. Thinking about database design should I create a column valid (true/false) in my table one to avoid the have to check table2 each time I want to get value from table one.

Example:

table1:
reference | version | var1 | var2 | var3
----------------------------------------
 001      | 1       | x    | y    | z
 001      | 1       | a    | b    | c
 001      | 2       | null | y    | z
 001      | 2       | null | t    | u
 001      | 3       | x    | y    | z
 001      | 3       | a    | b    | c
 001      | 4       | null | y    | null
 001      | 4       | null | t    | null


table2:
reference | version | message
------------------------------------------
 001      | 2       | var1 cannot be null
 001      | 4       | var1 cannot be null
 001      | 4       | var3 cannot be null

In table1 I have an item 001 with 4 versions. Version 4 is not valid. This information of validity come from table2.

I always need to be able to request the latest valid reference. Get the last valid reference. To do so I need the check into 2 tables.

Check last line of table1 then compare it with table2. If invalid ignore last invalid line then repeat the request. This seems so heavy…

Best Answer

If I'm understanding your question correctly, what you might consider doing is having 3 tables. Table1 would be your "Line" table. You'd have a table, maybe called "Errors," containing the possible types of errors that could exist. Then you'd have a table joining them together, maybe called "LineErrors," which would reference the line, and the type of error it had.

In this case, you could also include a column in your Line table which held a bit value indicating if that line had at least 1 error, which you could determine by looking that line number's primary key up in the LineError table.

Hopefully this makes sense, and that I'm understanding your problem sufficiently.