If all apples are green and all bananas are yellow a Fruit
table (ID int, Name varchar(50), colour varchar(50))
would be appropriate, with your data table having a foreign key to it.
If you have yellow apples and orange bananas (yea for genetic engineering!), but only certain combinations are permitted you will need FruitRainbow(ID int, FruitID, ColourID)
with the latter two as FKs to your option 2 tables above and your data table having a FK to FruitRainbow.
If any fruit can occur in any colour, and you don't want to limit those combinations in advance, your option 2 is fine.
If your query is really about resource optimisation rather than relational integrity then you'll have to decide what you want to gain and what you're willing to trade to get it. By using integer FKs instead of natural name you get a smaller disk footprint at the cost of runtime load. There are no free lunches. Pick your problem and solve it in the knowledge of the compromises it will entail.
The aggregate table can probably stay in the same DB. You can however create it on a separate filegroup and disk.
Full update of a new aggregate table
One way of doing it using pure SQL and your sample:
WITH list ([Type], [FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM data
UNION ALL
SELECT 1, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM data
) l([Type], [FeatureId], [Measure], [Value], [Timestamp])
)
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = CASE WHEN l1.Type = 0 THEN l1.Value ELSE l2.Value END
, [Timestamp] = CASE WHEN l1.Type = 0 THEN l1.Timestamp ELSE l2.Timestamp END
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
;
With this really small sample, I am not sure it covers all your needs. It may help to add a bigger sample with more data.
It does 4 table scans. Since you have 250k rows, it may not perform so well. It will probablibly better to run it in batch of X consecutive FeatureIds.
This could be done using a job and a SSIS package with either a single update or batch update. You would have to truncate the aggregate table first.
Trigger on each new row in main table
For new rows, using a trigger, this query could be used:
WITH new([FeatureId], [MeasureFrom], [MeasureTo], [Value], [Timestamp]) as (
-- SELECT 1, 1, 20, 2, '2015-01-01'
-- SELECT 1, 5, 15, 3, '2015-01-02'
SELECT 1, 9, 10, 8, '2015-01-03'
), gap([FeatureId], mn, mx) as (
SELECT n.[FeatureId], mn.mn, mx.mx
FROM new n
CROSS APPLY (SELECT mn = MAX([MeasureFrom]) FROM data3 WHERE [FeatureId] = n.[FeatureId] AND [MeasureFrom] < n.MeasureFrom) mn
CROSS APPLY (SELECT mx = MIN([MeasureTo]) FROM data3 WHERE [FeatureId] = n.[FeatureId] AND [MeasureTo] > n.MeasureTo) mx
), list (x,[FeatureId], [Measure], [Value], [Timestamp], [ID]) as (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [FeatureId] ORDER BY [Measure])
FROM (
SELECT 0, d.[FeatureId], d.[MeasureFrom], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND (d.MeasureFrom >= g.mn AND d.MeasureFrom < g.mx)
UNION ALL
SELECT 1, d.[FeatureId], d.[MeasureTo], d.[Value], d.[Timestamp]
FROM data3 d
INNER JOIN gap g ON d.[FeatureId] = g.[FeatureId] AND d.MeasureTo = g.mx
UNION ALL
SELECT 2, [FeatureId], [MeasureFrom], [Value], [Timestamp] FROM new
UNION ALL
SELECT 3, [FeatureId], [MeasureTo], [Value], [Timestamp] FROM new
) l(x, [FeatureId], [Measure], [Value], [Timestamp])
)
MERGE data3 AS target
USING (
SELECT l1.FeatureId, [MeasureFrom] = l1.Measure, [MeasureTo] = l2.Measure
, [Value] = COALESCE(d.[Value], l1.[Value])
, [Timestamp] = COALESCE(d.[Timestamp], l1.[Timestamp])
FROM list l1
INNER JOIN list l2 ON l1.FeatureId = l2.FeatureId AND l1.ID+1 = l2.ID
LEFT JOIN data3 d ON d.MeasureFrom = l1.Measure OR d.MeasureTo = l2.Measure
) as source(FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
ON (target.FeatureId = source.FeatureId AND target.[MeasureFrom] = source.[MeasureFrom])
WHEN MATCHED THEN
UPDATE SET target.[MeasureTo] = source.[MeasureTo]
WHEN NOT MATCHED BY target THEN
INSERT (FeatureId, [MeasureFrom], [MeasureTo], [Value], [Timestamp])
VALUES (source.FeatureId, source.[MeasureFrom], source.[MeasureTo], source.[Value], source.[Timestamp])
;
You must replace the new CTE by the values of the new inserted row in the trigger and use it to merge with the aggregate table.
Scheduled update of newly added rows
If you can find a way to get a list of all newly added rows to the main table since the last update of the aggregate table, you could schedule a job every x minutes or hours and only update what is necessary based on what has been recently added.
The trigger query will work as well with scheduled updates.
This can be run as a schedueled job and within a SSIS package.
Best Answer
Compilation of comments
@Sole DBA Guy pointed out:
@MDCCL mentioned:
Then, I replied to his comment with the following question:
To which @MDCCL responded:
Finally, @srutzky added: