The first thing that comes to mind is a particular RDBMS that's familiar to me. I recognize, however, that it may not be the best for this application.
So, my advice is to go with a database that is familiar to you. If you're familiar with Redis or MongoDB, then go with one of those. If you're more familiar with SQLite, then choose that.
On a database of this size, it's all going to be pretty quick. Even databases that are more disk-heavy will use some sort of caching so that disk speed isn't too much of a concern.
Snapshotting vs. Running Sums
Any approach to point-in-time reporting is effectively equivalent to snapshotting or calculating a running sum. Your options are variations on:
Calculate a running sum from the beginning of time, or from some periodically checkpointed position. This requires you to record changes rather than totals.
Calculate periodic snapshots and do your historical statistics based on the snapshots.
Hybrid running-sum with periodic snapshots to baseline positions and reduce the data set over which the running sum needs to be calculated.
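To make the three options concrete, here is a minimal Python sketch over a toy movement history. All names and figures (`movements`, `snapshots`, the dates and amounts) are hypothetical; the point is only that a pure running sum, a snapshot lookup, and the hybrid all answer the same as-at question:

```python
from datetime import date

# Toy movement (delta) history: the "record changes rather than totals" model.
movements = [
    (date(2023, 1, 10), 100.0),
    (date(2023, 2, 5),  -30.0),
    (date(2023, 3, 17),  50.0),
    (date(2023, 4, 2),  -20.0),
]

def balance_from_deltas(as_at):
    """Option 1: running sum from the beginning of time."""
    return sum(amount for d, amount in movements if d <= as_at)

# Option 2: periodic snapshots (here, end-of-month positions).
snapshots = {
    date(2023, 1, 31): 100.0,
    date(2023, 2, 28):  70.0,
    date(2023, 3, 31): 120.0,
}

def balance_hybrid(as_at):
    """Option 3: start from the latest snapshot on or before as_at,
    then add only the deltas that fall after it."""
    base_date = max((d for d in snapshots if d <= as_at), default=None)
    base = snapshots[base_date] if base_date else 0.0
    start = base_date or date.min
    return base + sum(a for d, a in movements if start < d <= as_at)
```

The hybrid only scans the deltas since the last snapshot, which is where the space/speed tuning comes from.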
This is essentially a speed for space tradeoff, and temporal databases are really just abstractions on a data structure that is semantically equivalent to one of the schemes described above.
If you want speed, calculate and persist periodic snapshots of your position. This will use more space, but if your DBMS platform supports partitioning then the snapshots can be managed and disposed quite efficiently. The snapshot date forms the partition key, so any queries that specify dates will give the optimiser enough information to ignore partitions that are not relevant to the query.
Frequent snapshots will use lots of disk space, so you trade off granularity against disk space usage.
If you want to save space then store changes and calculate running sums for your as-at positions. You can also use periodic (though less frequent) snapshots in combination with the deltas for a period to optimise the query by starting with balance figures from the appropriate snapshot and adding the deltas to your as-at date. This technique is commonly used with accounting systems such as Oracle Financials.
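As a sketch of that snapshot-plus-deltas query in an actual database, here is a self-contained SQLite example. The table and column names (`movement`, `snapshot`, `account`) and the sample balances are assumptions for illustration, not taken from any particular accounting system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movement (txn_date TEXT, account TEXT, amount REAL);
CREATE TABLE snapshot (snap_date TEXT, account TEXT, balance REAL);

INSERT INTO movement VALUES
  ('2023-01-10', 'A', 100.0),
  ('2023-02-05', 'A', -30.0),
  ('2023-03-17', 'A',  50.0),
  ('2023-04-02', 'A', -20.0);

-- Periodic (month-end) snapshots baseline the running sum.
INSERT INTO snapshot VALUES
  ('2023-01-31', 'A', 100.0),
  ('2023-02-28', 'A',  70.0),
  ('2023-03-31', 'A', 120.0);
""")

AS_AT_SQL = """
WITH base AS (
  SELECT COALESCE(MAX(snap_date), '0000-00-00') AS snap_date
  FROM snapshot WHERE account = :acct AND snap_date <= :as_at
)
SELECT COALESCE((SELECT balance FROM snapshot
                 WHERE account = :acct
                   AND snap_date = (SELECT snap_date FROM base)), 0.0)
     + COALESCE((SELECT SUM(amount) FROM movement
                 WHERE account = :acct
                   AND txn_date > (SELECT snap_date FROM base)
                   AND txn_date <= :as_at), 0.0)
"""

def as_at_balance(acct, as_at):
    # Balance figure from the latest snapshot, plus deltas up to the as-at date.
    return conn.execute(AS_AT_SQL, {"acct": acct, "as_at": as_at}).fetchone()[0]
```

If there is no snapshot on or before the as-at date, the query degrades gracefully to a full running sum from the start of the movement table.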
Hybrid approaches
Essentially you have a continuum from a pure running sum to a pure snapshot. Snapshotting is faster to query, and can be used to speed up running sum calculations by allowing you to start the running sum calculation at the snapshot date. A hybrid approach is more complex as you have to implement both mechanisms, but it does allow you to tune your speed/disk usage tradeoff to whatever is most appropriate to your application.
Some thoughts on what might be applicable when
It's far easier to retrofit snapshots to a base transactional model than to go the other way around. However, some models - things where you have time-dependent state - only make sense when viewed as a snapshot that shows the state at a given point in time. An example of this is aged debt reporting where you want to see the invoices with money that's been owing for more than (for example) 180 days. If you want to retain a reconcilable record of aged debt positions then you may wish to snapshot this periodically.
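A minimal sketch of what one such aged debt snapshot might compute, assuming a hypothetical list of open invoices (the invoice IDs, dates, amounts, and age bands are all made up for illustration):

```python
from datetime import date

# Hypothetical open invoices: (invoice_id, invoice_date, amount_outstanding)
open_invoices = [
    ("INV-001", date(2022, 6, 1),   500.0),
    ("INV-002", date(2022, 11, 20), 250.0),
    ("INV-003", date(2023, 1, 5),   125.0),
]

def aged_debt_snapshot(as_at, buckets=(30, 90, 180)):
    """Classify outstanding amounts into age bands as at a given date.
    Persisting one such row set per period gives a reconcilable record
    of aged debt positions over time."""
    result = {}
    for inv_id, inv_date, amount in open_invoices:
        age = (as_at - inv_date).days
        label = next((f"<= {b} days" for b in buckets if age <= b),
                     f"> {buckets[-1]} days")
        result[label] = result.get(label, 0.0) + amount
    return result
```

The state shown (which invoices were open, and how old they were) only exists at the as-at date, which is why this report is naturally a snapshot rather than something you can rebuild from totals alone.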
In the first instance a transactional fact table with the base movements (deltas) gives you the raw data from which you can calculate your snapshotted positions. It's best to have that available if at all possible, unless recording the data in that format is made infeasible by some limitation of the upstream data source.
I've done an aged debt reporting system and a claims reporting system based on snapshots before. The aged debt report was an operational report run at the beginning of each week, and the requirement for the claims snapshot table was for a monthly report where the business wished to see counts and outstanding values of claims that were open at the time.
Some things (e.g. claim reserving) are done on a periodic basis, so (for example) ultimates and IBNR¹ are only calculated on a monthly or quarterly basis and it only makes sense to report those on a periodic basis. In this case, snapshotting the figures makes sense due to the nature of the underlying process that generates the data.
¹ Ultimates are insurance-speak for forecasts of the total claims expected on policies incepting in a given year. IBNR (Incurred but not Reported) is an estimate of the figures on claims that have not occurred or been reported yet.
Best Answer
One of my favorite approaches to problems like this in a data warehousing context is an agile design practice: the granularity of a system does not always need to be "rolled up" from the lowest level right away, but can instead be "drilled down" into (as the title of the topic says) as the finer-granularity details become available.
Assuming you have some table Expense with columns Expense_PK, CompanyName and TotalExpenses, I would simply create a new table, ExpenseDetail, containing columns ExpenseDetail_PK, Expense_FK, Taxes and OtherExpenses, adding a detail record when one becomes available.

The nuance of doing this is to be stalwart about maintaining the detail record properly. Should the nature of OtherExpenses be revealed to consist solely of Adjustments, Dues and Assessments, it is often far better to extend the detail record with additional atomic columns for each of the now-determinate fees than to add another OtherExpensesDetail table to the mix. On the other hand, in Canada at least, the Taxes column presents a more than reasonable opportunity for a TaxesDetail table, defined with TaxesDetail_PK, ExpenseDetail_FK, GST and PST, so that Company A could provide additional tax breakdowns in the form of federal and provincial taxes, whereas a theoretical Company C could simply be allocated the aggregated Taxes value in the detail record, if that was all that was available or the company operated in a province that applies only a single harmonized sales tax (HST).

That said, I feel there are many ways of going about problems like these, but this is at least one example of an approach that is quite versatile and works quite well for what it is.
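The table structure described above can be sketched in SQLite as follows. The table and column names come from the text; the company rows and dollar amounts are invented purely to show how a report falls back to the coarse total when no detail record exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base fact: coarse-grained totals, available for every company.
CREATE TABLE Expense (
    Expense_PK    INTEGER PRIMARY KEY,
    CompanyName   TEXT NOT NULL,
    TotalExpenses REAL NOT NULL
);

-- Finer granularity, added only when the breakdown is known.
CREATE TABLE ExpenseDetail (
    ExpenseDetail_PK INTEGER PRIMARY KEY,
    Expense_FK       INTEGER NOT NULL REFERENCES Expense(Expense_PK),
    Taxes            REAL,
    OtherExpenses    REAL
);

-- Drill-down on Taxes for companies that can split federal/provincial tax.
CREATE TABLE TaxesDetail (
    TaxesDetail_PK   INTEGER PRIMARY KEY,
    ExpenseDetail_FK INTEGER NOT NULL REFERENCES ExpenseDetail(ExpenseDetail_PK),
    GST              REAL,
    PST              REAL
);

INSERT INTO Expense VALUES (1, 'Company A', 1130.0), (2, 'Company C', 900.0);
INSERT INTO ExpenseDetail VALUES (1, 1, 130.0, 1000.0);  -- Company A has detail
INSERT INTO TaxesDetail   VALUES (1, 1, 50.0, 80.0);     -- ...down to GST/PST
""")

# LEFT JOINs let one query serve both granularities: detail where it
# exists, NULLs (and the aggregate TotalExpenses) where it does not.
rows = conn.execute("""
SELECT e.CompanyName, e.TotalExpenses, d.Taxes, t.GST, t.PST
FROM Expense e
LEFT JOIN ExpenseDetail d ON d.Expense_FK = e.Expense_PK
LEFT JOIN TaxesDetail   t ON t.ExpenseDetail_FK = d.ExpenseDetail_PK
ORDER BY e.Expense_PK
""").fetchall()
```

Because the detail tables hang off the base fact via foreign keys, extending ExpenseDetail with new atomic columns later does not disturb existing reports.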