Getting multiValued data into Solr via CSV:
The Solr documentation describes a "split" option in UpdateCSV. Essentially, it parses a field value with a second CSV parser, so one field can yield several values. See Solr - UpdateCSV - split. The parameters look like this (adjust the field name, separator, and encapsulator as necessary):
f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator='
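If you build the request in code, a minimal sketch like the following can assemble and URL-encode those per-field parameters. The helper name `split_params` is made up for illustration; only the `f.<field>.split`, `f.<field>.separator`, and `f.<field>.encapsulator` parameter names come from the line above.

```python
from urllib.parse import urlencode


def split_params(field, separator=",", encapsulator="'"):
    """Build the per-field split parameters for one multiValued field.

    Hypothetical helper: it only formats the parameter names shown above.
    """
    return {
        f"f.{field}.split": "true",
        f"f.{field}.separator": separator,
        f"f.{field}.encapsulator": encapsulator,
    }


# urlencode percent-encodes the values, so "," becomes %2C and "'" becomes %27.
query = urlencode(split_params("fieldA"))
print(query)
```

Letting `urlencode` handle the escaping avoids hand-encoding the separator and encapsulator characters.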
Getting multiValued data from separate fields to maintain position:
Since asking this question, I've done some reading about dimensional models. It seems that what I was trying to do is poor design, because it places too many expectations on the application, too much complexity in the warehouse, or both.
When trying to preserve the relationships between two field values on a single record, it's better to store them separately as well as together. Here's a comparison of my former input to the new input:
Former CSV input:
name|licenseState|licenseType
Josh|MA,CA|123,456
Fred|MD,OH|789,123
Transformed CSV input:
name|licenseState|licenseType|licenseStateType
Josh|MA,CA|123,456|MA123,CA456
Fred|MD,OH|789,123|MD789,OH123
This way your application can use the licenseState and licenseType dimension values independently, or it can use the licenseStateType dimension values, all without requiring complicated app or warehouse logic.
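The transformation from the former input to the new input can be done before the file reaches Solr. Here is a rough sketch using Python's standard `csv` module; the function name `add_combined_column` is my own, and it assumes the two columns always hold the same number of comma-separated values, in matching order.

```python
import csv
import io


def add_combined_column(rows, left="licenseState", right="licenseType",
                        combined="licenseStateType", sep=","):
    """Yield each row with an extra column pairing the two lists by position."""
    for row in rows:
        lefts = row[left].split(sep)
        rights = row[right].split(sep)
        if len(lefts) != len(rights):
            # Positions cannot be paired if the lists differ in length.
            raise ValueError(f"mismatched value counts in row: {row}")
        row[combined] = sep.join(l + r for l, r in zip(lefts, rights))
        yield row


former = (
    "name|licenseState|licenseType\n"
    "Josh|MA,CA|123,456\n"
    "Fred|MD,OH|789,123\n"
)

reader = csv.DictReader(io.StringIO(former), delimiter="|")
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=reader.fieldnames + ["licenseStateType"],
    delimiter="|",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(add_combined_column(reader))
transformed = out.getvalue()
print(transformed)
```

Raising on mismatched counts is a deliberate choice here: silently truncating would pair the wrong state with the wrong type.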
Yes, it's in 1NF.
You can't side-step the often hard work of determining all the candidate keys by hanging a number off the end of the table and saying, "There. I've got a primary key." One natural candidate key for this table is {Name, Bought from, Date bought}. Consider using "Time bought" instead of "Date bought".
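One quick sanity check on a proposed candidate key is to see whether any two existing rows share the same values for it. The sketch below uses invented sample rows and a made-up helper, `is_unique_on`. Note the limits: sample data can only rule a key out. It can't prove that a set of columns is a key for every legal state of the table, and it doesn't check that the key is minimal.

```python
def is_unique_on(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Invented sample rows; only the column names come from the answer above.
purchases = [
    {"Name": "Widget", "Bought from": "Acme", "Date bought": "2020-01-01"},
    {"Name": "Widget", "Bought from": "Acme", "Date bought": "2020-01-02"},
    {"Name": "Gadget", "Bought from": "Acme", "Date bought": "2020-01-01"},
]

full_key_ok = is_unique_on(purchases, ["Name", "Bought from", "Date bought"])
subset_ok = is_unique_on(purchases, ["Name", "Bought from"])
print(full_key_ok, subset_ok)
```

If the same item can be bought from the same seller twice in one day, the three-column check fails too, which is the reason to consider "Time bought".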
Your definition of 2NF is wrong. Instead of
Second Normal Form: A relation that is in First Normal Form and every
non-primary-key attribute is fully functionally dependent on the
primary key.
you need something more like this.
Second Normal Form: a relation that is in First Normal Form, and every non-prime attribute is fully functionally dependent on every candidate key.
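The term non-prime attribute doesn't mean quite what non-primary-key attribute means. A non-prime attribute is one that is not part of any candidate key, not merely one that is outside the key you happened to pick as primary.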
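To make "fully functionally dependent" concrete, here is a small sketch that tests whether a functional dependency holds in a set of sample rows. The table, its column names, and the helper `fd_holds` are all hypothetical. As with any check against sample data, it can refute a dependency but not prove one.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal `lhs` values always imply equal `rhs` values."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it afterwards.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical table whose only candidate key is {student, course}.
enrollments = [
    {"student": "s1", "course": "c1", "student_name": "Ann", "grade": "A"},
    {"student": "s1", "course": "c2", "student_name": "Ann", "grade": "B"},
    {"student": "s2", "course": "c1", "student_name": "Bob", "grade": "C"},
]

# student_name is determined by part of the key alone: a partial dependency,
# which is what 2NF forbids for non-prime attributes.
name_partial = fd_holds(enrollments, ["student"], ["student_name"])

# grade is not determined by either half of the key on its own.
grade_by_student = fd_holds(enrollments, ["student"], ["grade"])
grade_by_course = fd_holds(enrollments, ["course"], ["grade"])
print(name_partial, grade_by_student, grade_by_course)
```

In this made-up table, `student_name` would be moved to a separate students table to reach 2NF, while `grade` can stay because it needs the whole key.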
Your definition of 3NF is wrong, for the same reason your definition of 2NF is wrong.
Instead of this
Third Normal Form: A relation that is in First and Second Normal Form
and in which no non-primary-key attribute is transitively dependent on
the primary key.
you need something closer to this.
Third Normal Form: A relation that is in Second Normal Form and in which every non-prime attribute is nontransitively dependent on every candidate key. (There isn't a really good way to express all those negatives in one sentence.)
You may find this site helpful, too. I think it only goes up to 3NF, but it uses very good examples and clear language:
http://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/