I created a database for nutritional table information. This database have values like carbohydrates, protein and fats of a food (there is a total of 28 fields).
The problem is that there are special cases where these values can be strings:
- if the amount of a nutrient in the food is too small, the value will be
"Tr" (trace). - If the measurement does not apply to that food, the value will be "NA"
and some other cases.
I am currently using strings to store the values, and converting them to numbers in my app. But there is a better way to do this?
Here is a example of what the table looks like:
id, kilocalories, carbohydrate_grams, lipid_grams, protein_grams
'1', '123.5348925', '25.80975', 'Tr', 'NA'
I am using MySQL 8.
Sorry, i think you needed a bit more of information.
I let NULL enabled, but there is no case that the field should be null. I just enabled it because this database can receive new data later, with other conventions. (I am using the brazilian database of foods, but I will insert the american database in the future).
Any field in the table can receive one of these values:
- * means that the tests are being re-evaluated;
- Empty fields means that no lab test was requested to that field;
- NA means not applicable;
- Tr means trace;
This is a dump of the real table with two lines of data:
--
-- Table structure for table `nutritional_table`
--
DROP TABLE IF EXISTS `nutritional_table`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!50503 SET character_set_client = utf8mb4 */;
CREATE TABLE `nutritional_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`kilocalories` varchar(255) DEFAULT NULL,
`carbohydrate_grams` varchar(255) DEFAULT NULL,
`lipid_grams` varchar(255) DEFAULT NULL,
`saturados` varchar(255) DEFAULT NULL,
`monoinsaturados` varchar(255) DEFAULT NULL,
`poliinsaturados` varchar(255) DEFAULT NULL,
`protein_grams` varchar(255) DEFAULT NULL,
`dietary_fiber_grams` varchar(255) DEFAULT NULL,
`ashes_grams` varchar(255) DEFAULT NULL,
`cholesterol_milligrams` varchar(255) DEFAULT NULL,
`calcium_milligrams` varchar(255) DEFAULT NULL,
`magnesium_milligrams` varchar(255) DEFAULT NULL,
`manganese_milligrams` varchar(255) DEFAULT NULL,
`phosphorus_milligrams` varchar(255) DEFAULT NULL,
`iron_milligrams` varchar(255) DEFAULT NULL,
`sodium_milligrams` varchar(255) DEFAULT NULL,
`potassium_milligrams` varchar(255) DEFAULT NULL,
`copper_milligrams` varchar(255) DEFAULT NULL,
`zinc_milligrams` varchar(255) DEFAULT NULL,
`thiamine_milligrams` varchar(255) DEFAULT NULL,
`riboflavin_milligrams` varchar(255) DEFAULT NULL,
`pyridoxine_milligrams` varchar(255) DEFAULT NULL,
`niacin_milligrams` varchar(255) DEFAULT NULL,
`vitamin_c_milligrams` varchar(255) DEFAULT NULL,
`retinol_micrograms` varchar(255) DEFAULT NULL,
`RE_micrograms` varchar(255) DEFAULT NULL,
`RAE_micrograms` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=598 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Dumping data for table `nutritional_table`
--
LOCK TABLES `nutritional_table` WRITE;
/*!40000 ALTER TABLE `nutritional_table` DISABLE KEYS */;
INSERT INTO `nutritional_table` VALUES
(1,'123.5348925','25.80975','1.000333333','0.3','0.4','0.3','2.58825','2.749333333','0.463','NA','5.204','58.702','0.627333333','105.853','0.262','1.244666667','75.15166667','0.020333333','0.682666667','0.08','Tr','0.08','Tr','','NA','',''),
(2,'359.678002','77.45071413','1.864833333','0.3','0.5','0.4','7.32328587','4.819166667','1.181333333','NA','7.818','109.71','2.993333333','250.865','0.948333333','1.645666667','173.34','0.074833333','1.395166667','0.261666667','Tr','0.175','4.183333333','','NA','','');
/*!40000 ALTER TABLE `nutritional_table` ENABLE KEYS */;
UNLOCK TABLES;
saturados, monoinsaturados and poliinsaturados are the same as the other fields. I had to translate this to english (they are written in portuguese) and add the suffix _milligrams to them. I just let to do fix this later.
About the abbreviations: I used the full names because I think it is more easily readable. I get a little confused with the abbreviations. But in the app I am showing the data using the abbreviations.
Best Answer
TL;DR
Store the numbers as numbers! For 'Trace', 'N/A' &c. have negative integers as codes and deal with the logic that way -
CASE
and company! OR, useGENERATED
fields and let the server do all the conversions fromVARCHAR
toINT
- never have to worry about a mistake again! Check out the use of aCASE
statement in aGENERATED
column definition at the bottom of this answer!I would do this in one of two ways:
First way (the better of the two - by far!):
You have:
The gramme/gram is an SI unit: SI unit symbol: g
Anything with gramme/gram should be a pure number. The field/column containing it should be suffixed with
_g
.Just as a matter of interest, should that not be
... (255) NOT NULL DEFAULT 0
? It beats theCOALESCE
function - I always to try and avoidNULL
s if possible!Are these all numbers or percentages or what? If pure numbers or %, then store them as pure numbers, or with
_pc
as a percent suffix.The milligramme is a recognised unit, albeit not official SI. The suffix should be
_mg
!All microgram units should be suffixed with
_mcg
.Why you go to the trouble of converting the units back and forth as strings is beyond me!
You should also maintain
comment
s on your tables so that poor devs don't get confused and have no excuse for f**ing up! Make it part of SOP that when a field is added, an explanatory comment is added. Also, the tables themselves should have comments, possibly with references about where to find further information (example - of use later on!).Then, one can do stuff like this (table comments):
Result:
Field comments:
Result:
Note the explanation of what units are being used, in case anybody is so slow that they don't realise that the
_gr
suffix is grammes!For
Trace
, 'N/A' &c. - have a code table, say:So, I would imagine that most of the time, you're not too interested in traces or not-applicables? This might introduce a level of complexity, but I still think that it's a win-win compared to your current scenario. Take a look at the bottom of the fiddle for how to
CAST
these - very hacky - but then it is MySQL! :-)OK, so far, so good - if you want to overhaul your system, which I advise you to at least set as a long term goal, but better to start ASAP!
But, what you can do - if, say, for legacy reasons you have to keep your table the way it is while you change to a reasonable table structure is this!
The second way (temporary until code refactoring):
This would be to use
GENERATED
(orCOMPUTED
or sometimes calledVIRTUAL
) columns (althoughVIRTUAL
is a misnomer as they may - in the case of MySQL such fields can be eitherVIRTUAL
orSTORED
(akaPERSISTENT
)). As an aside, this is one of the very few ways in which MySQL is actually superior to PostgreSQL which can only haveSTORED
(i.e. materialised)GENERATED
columns for the moment, though I would always recommend PostgreSQL in preference to MySQL (personal opinion!).What I would recommend then (as a temporary solution) is something like this (fiddle here - all the SQL above is also included in the fiddle):
From the MySQL documentation, the syntax is:
To implement this, I had to
CAST
theVARCHAR(255)
field to anINTEGER
. I EVENTUALLY found the following solutions (how do you do it?):Result (1st and last SQL statements only):
The
SQRT()
function proves that it's acting as a proper number!and
So, now I'm ready to create my table with generated fields:
Poplulate it:
Then check (always check!):
Result:
And a double check of our new numeric field (
ROUND()
andSQRT()
):Result:
In addition to greatly simplifying your life, this approach should also be more performant. I would suggest that you perhaps try and do some tests on the different tables and see what you come up with.
For example, if you are asked to get the average positive ion content from Group 1 of the Periodic Table, you just have to add the Na and K content - with your current approach, you have get your raw data, convert it to a number, add it and obtain your result. A lot of work for nothing.
EDIT: - in response to comments beginning here.
saturados, monoinsaturados and poliinsaturados
- Saturates (or saturated_fat), monounsaturates and polyunsaturates.About the abbreviations you mentioned, like _g instead of grams, i used the full names because i think it is more easily readable. I get a little confused with the abbreviations.
This is why I placed stress on using table and field comments in your particular case.
I would like to upvote, but i cant do this yet
Your vote will be recorded and applied when you have sufficient reputation - I'll help - your question was interesting and provided a challenge, so +1!
I do not "feel so good" in using a value that should be the amount of that nutrient as a flag if it is negative
Neither do I "feel so good" about this particular solution, but it's probably the lesser of two evils - the other way of tackling this problem is to use the EAV anti-pattern, which opens a whole other can of worms! I've also written about it. EAV to a dev/progammer, is like heroin to a drug addict - things start great, then they spiral out of control until the situation is a complete mess. I always put this image into my mind if ever I'm tempted to use it, even for a "small" (they always grow) configuration table.
About the GENERATED fields, this way i will just lose the information of the special cases not? So it defeats the purpose of using strings at the first place. But, as you said, this is to be used as a temporary solution.
Maybe not so temporary? I didn't know this was possible, but you can do stuff like this (see fiddle - really thrilled I got this to work!!! - definitely a +1 for the question):
Then, perform some inserts:
Check:
Result:
So, with the
CASE
expression and theCHECK
constraint (introduced in MySQL 8.0.16 - a mere 30 years after every other major database server had them!), you have reasonably fine-grained control over what goes into that field!The only fly in the ointment is that your devs/programmers will have to remember to put the correct field values into the
CASE
statement - but, I think this is easier than your current situation (IMNSHO) - or using EAV! However, the upside is that it ONLY has to be done correctly ONCE - in the table definition - get that right and you never have to worry about inconsistent data again!Finally, the icing on the cake:
Result:
So, only those strings which are meaningful will be allowed to be
INSERT
ed into your field! In theory, if SQL was allowed inCHECK
constraints, you would have even more control over your data. It's a pity that MySQL (and nobody else apart from Firebird) allows them, but they're probably on the way - but, hey, it only took MySQL 30 years to implementCHECK
constraints so who knows?...