Getting multiValued data into Solr via CSV:
The Solr documentation describes a "split" option in UpdateCSV. Essentially, it parses a field value using a second CSV parser, so a single column can yield several values. See Solr - UpdateCSV - split. The parameters look like this (adjust the field name, separator, and encapsulator as necessary):
f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator='
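If you build the request in code, it may be easier to let a library do the URL encoding. Here is a minimal Python sketch; the helper name split_params is my own invention, and only the three parameter names come from the line above. Note that the quote character comes out encoded as %27.

```python
from urllib.parse import urlencode


def split_params(field, separator=",", encapsulator="'"):
    """Build the per-field split parameters as a URL-encoded query string.

    `split_params` is a hypothetical helper, not part of Solr.
    """
    params = [
        (f"f.{field}.split", "true"),
        (f"f.{field}.separator", separator),
        (f"f.{field}.encapsulator", encapsulator),
    ]
    # urlencode percent-encodes the values, e.g. "," becomes %2C
    return urlencode(params)


query = split_params("fieldA")
print(query)
# f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator=%27
```

Append the resulting string to whatever update URL and other parameters you already use.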
Getting multiValued data from separate fields to maintain position:
Since asking this question, I've done some reading about dimensional models. It seems that what I was trying to do is poor design, because it places too many expectations on the application, too much complexity in the warehouse, or both.
When trying to preserve the relationships between two field values on a single record, it's better to store them separately as well as together. Here's a comparison of my former input to the new input:
Former CSV input:
name|licenseState|licenseType
Josh|MA,CA|123,456
Fred|MD,OH|789,123
Transformed CSV input:
name|licenseState|licenseType|licenseStateType
Josh|MA,CA|123,456|MA123,CA456
Fred|MD,OH|789,123|MD789,OH123
This way your application can use the licenseState and licenseType dimension values independently, or it can use the licenseStateType dimension values, all without requiring complicated app or warehouse logic.
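For what it's worth, the transformation from the former input to the new input can be sketched in a few lines of Python. This assumes the pipe-delimited layout shown above and that both multi-valued columns always carry the same number of values; the function name add_combined_column is just illustrative.

```python
import csv
import io


def add_combined_column(csv_text, left="licenseState", right="licenseType",
                        combined="licenseStateType", delimiter="|",
                        value_separator=","):
    """Append a column that pairs up the values of two multi-valued columns."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + [combined],
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        lefts = row[left].split(value_separator)
        rights = row[right].split(value_separator)
        # The pairing is positional, so mismatched lengths would lose data.
        if len(lefts) != len(rights):
            raise ValueError(f"value count mismatch for {row}")
        row[combined] = value_separator.join(l + r for l, r in zip(lefts, rights))
        writer.writerow(row)
    return out.getvalue()


former = (
    "name|licenseState|licenseType\n"
    "Josh|MA,CA|123,456\n"
    "Fred|MD,OH|789,123\n"
)
transformed = add_combined_column(former)
print(transformed)
```

Run against the former input above, this should print the transformed input shown earlier.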
You can definitely keep all your dimensions and measures in one fact table and not use any dimension tables. Make sure your OLAP tool supports this though.
Normalizing out your dimensions into other tables is done mostly to minimize the size of the fact table, which can get large fast.
With no dimension tables you're looking at about 336 MB per year (not counting indexes), which isn't so bad.
With dimension tables, you're looking at about 34 MB per year, plus a couple dozen MB for storing dimension details. Indexes will be smaller too.
You'll want to expand your date column into something more analyzable (year, month, quarter, etc), which will add to the size.
You'll want to index all fields. Drop indexes before insert, add them after.
You can use a tool like Pentaho Aggregation Designer to find useful aggregates and generate them for you.
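To illustrate the date expansion and the drop-then-rebuild index advice, here is a rough Python sketch using the standard library's sqlite3 module. The table name fact, its columns, and the index names are all assumptions made for the example, not taken from your schema, and your warehouse's SQL dialect may differ.

```python
import sqlite3
from datetime import date

INDEXED_COLUMNS = ["year", "quarter", "month"]  # hypothetical columns


def expand_date(d):
    """Split a date into parts that are easier to group and filter on."""
    return {"year": d.year, "quarter": (d.month - 1) // 3 + 1, "month": d.month}


def bulk_load(conn, facts):
    """Drop the indexes, insert the rows, then rebuild the indexes."""
    for col in INDEXED_COLUMNS:
        conn.execute(f"DROP INDEX IF EXISTS idx_fact_{col}")
    rows = []
    for sale_date, amount in facts:
        parts = expand_date(sale_date)
        rows.append((sale_date.isoformat(), parts["year"],
                     parts["quarter"], parts["month"], amount))
    conn.executemany(
        "INSERT INTO fact (sale_date, year, quarter, month, amount) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    for col in INDEXED_COLUMNS:
        conn.execute(f"CREATE INDEX idx_fact_{col} ON fact ({col})")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact (sale_date TEXT, year INTEGER, quarter INTEGER, "
    "month INTEGER, amount REAL)"
)
bulk_load(conn, [(date(2012, 2, 14), 10.0), (date(2012, 11, 5), 12.5)])
totals = conn.execute(
    "SELECT quarter, SUM(amount) FROM fact GROUP BY quarter ORDER BY quarter"
).fetchall()
print(totals)  # [(1, 10.0), (4, 12.5)]
```

Storing the expanded parts as their own columns costs some space, as noted above, but it lets the grouping query stay simple.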
Best Answer
You can create a table ProductPrice:
ProductProductSet being the many-to-many relationship between Product and ProductSet. These two foreign keys would ensure that the price refers to a store that is intended to have such a product. Then you can create a query to find out which prices are missing: