I have a basic rhetorical/theoretical question for people experienced with both MongoDB and RDBMS. (I'm surprised SO has no Data Munging or Data Engineering forum.)
When building an aggregation pipeline in Mongo, we use $unwind to explode nested array values so that each value gets its own document, which we can then group or aggregate in other ways.
To me this smells just like getting the data into First Normal Form. The documents tend to look just like records in 1NF at this stage, and it seems like that is the goal. I have searched and searched and not found that phrase used in any Mongo topics, and it certainly was not mentioned in the courses I took at MongoDB University. The same question would apply to other non-relational data systems as well.
Is 1NF essentially the same thing as the data form necessary for an aggregation step? Is there any data manipulation term for it?
Best Answer
For an array of primitive values the output of $unwind looks similar to First normal form (1NF), but pedantically the $unwind output should be expected to be UNF (Unnormalised form), since the documents output by $unwind do not have a unique key constraint (although arguably the includeArrayIndex option can be used to provide an explicit row identifier).
Database normalisation is very much an RDBMS concept, where you design a model to reduce data redundancy ("How to efficiently store data"). Standard normal forms are expressions of more rigid data relationships and constraints for tabular data.
MongoDB data model design is application-centric ("How to efficiently use data"). Since the decision on whether to normalise or denormalise data is influenced by the usage context, data model discussion generally focuses on data growth and use case patterns rather than highly normalised forms.
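To make the earlier point about $unwind and includeArrayIndex concrete, here is a minimal sketch. The stage object uses the documented $unwind syntax, but the field names (tags, tagIndex), the sample documents, and the unwindSketch helper are all assumptions made up for illustration. The helper is a plain-JavaScript stand-in that only handles non-empty top-level arrays; it is not how MongoDB implements the stage.

```javascript
// Stage as it would be passed to db.collection.aggregate([...]).
// "tags" and "tagIndex" are hypothetical names chosen for this sketch.
const unwindStage = {
  $unwind: { path: "$tags", includeArrayIndex: "tagIndex" },
};

// Hypothetical stand-in for the stage, covering only non-empty arrays.
// Real $unwind has further rules (missing fields, null, empty arrays,
// preserveNullAndEmptyArrays) that are ignored here.
function unwindSketch(docs, stage) {
  const field = stage.$unwind.path.slice(1); // drop the leading "$"
  const indexField = stage.$unwind.includeArrayIndex;
  return docs.flatMap((doc) =>
    doc[field].map((value, i) => ({ ...doc, [field]: value, [indexField]: i }))
  );
}

const docs = [
  { _id: 1, item: "shirt", tags: ["red", "cotton"] },
  { _id: 2, item: "hat", tags: ["blue"] },
];

// One output document per array element
const rows = unwindSketch(docs, unwindStage);
console.log(rows);
```

Note that _id repeats across the first two output documents, so nothing in the output is guaranteed unique by itself; only the pair (_id, tagIndex) behaves like a row identifier in this example.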
1NF (or an $unwind operation) certainly isn't a necessary step for aggregating or transforming arrays. MongoDB has other array expression operators such as $arrayToObject, $filter, and $map which implement different data transformations.
I'm not sure if there are any generally accepted terms, but I think an $unwind operation would typically be described as either "flattening" or "unnesting" arrays.
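As a rough illustration of transforming arrays without flattening them first, the sketch below pairs a $project stage using $filter and $map with a plain-JavaScript equivalent. The field names (scores, passing, doubled), the threshold of 50, and the projectSketch helper are assumptions invented for this example; only the operator syntax comes from MongoDB.

```javascript
// Stage as it would appear in db.collection.aggregate([...]).
// "scores", "passing" and "doubled" are hypothetical field names.
const projectStage = {
  $project: {
    passing: {
      $filter: { input: "$scores", as: "s", cond: { $gte: ["$$s", 50] } },
    },
    doubled: {
      $map: { input: "$scores", as: "s", in: { $multiply: ["$$s", 2] } },
    },
  },
};

// Plain-JS equivalent of what the two expressions compute for one
// document; the array stays nested, so no $unwind is involved.
function projectSketch(doc) {
  return {
    _id: doc._id, // $project keeps _id unless it is explicitly excluded
    passing: doc.scores.filter((s) => s >= 50),
    doubled: doc.scores.map((s) => s * 2),
  };
}

const result = projectSketch({ _id: 1, scores: [40, 55, 90] });
console.log(result);
```

Each input document yields exactly one output document here, which is the main practical difference from the one-document-per-element shape that $unwind produces.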