Getting multiValued data into Solr via CSV:
The solr documentation describes a "split" function in UpdateCSV. Essentially, it parses a field value using a second CSV parser. See Solr - UpdateCSV - split. The parameters look like so (adjust field name, separator, and encapsulator as necessary):
f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator='
Getting multiValued data from separate fields to maintain position:
Since asking this question, I've done some reading about dimensional models. It seems that what I was trying to do is poor design, because it places too many expectations on the application, to much complexity in the warehouse, or both.
When trying to preserve the relationships between two field values on a single record, it's better to store them separately as well as together. Here's a comparison of my former input to the new input:
Former CSV input:
name|licenseState|licenseType
Josh|MA,CA|123,456
Fred|MD,OH|789,123
Transformed CSV input:
name|licenseState|licenseType|licenseStateType
Josh|MA,CA|123,456|MA123,CA456
Fred|MD,OH|789,123|MD789,OH123
This way your application can use the licenseState and licenseType dimension values independently, or it can use the licenseStateType dimension values, all without requiring complicated app or warehouse logic.
Best Answer
I would say that you are pretty close to be fair - and I think (though it is a personal opinion) that 3NF is more important at an educational level that it is when it comes to the real world. I say this because it is not always logical or practical to obtain perfect 3NF, this could be due to an applications design or the performance of the queries etc etc...
That being said, the best thing I could give you to remember 3NF is a quote a friend once told me (who read it in a book to be fair!) and that is this:
Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.
- I think the person who said it originally was Bill Kent.So on that basis, there are some minor improvements you could make; let's take your city table for example. You have an ID, Title and Post Code; well if you think about it, there are many Post Codes per City, and your design would mean that for each new Post Code you would need to add the ID and Title again - thus breaking the requirements for 3NF. For this you could have an additional table called
PostCodes
which could contain an ID, Name and City columns, you could then reference your city this way - which is better. The same goes for your Client Info table which has an address field - but it's possible that an address can be shared among many people, as can phone numbers, email addresses etc etc you get the picture!The best way to remember it is this - if there is a possibility that you have to duplicate information just to satisfy your design - then you either have a special requirement (which is possible - like I said "real world") or your design is wrong - you need to judge it on a case by case basis.
I hope that this helps you.