You definitely want to keep all the data you collect, as it can be extremely useful for longer-term detailed trending, even if it takes a while to trawl through it all. Also, when summarising the data into less granular tables, do not AVERAGE() the data you have collected - always SUM() and COUNT() the rows you are summarising. This allows you to roll the data up further if needed, and you can calculate the averages at whatever level you desire.
Remember...
You cannot average averages...
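A quick illustration of why, using hypothetical total_value / row_count columns on an hour_data summary table: suppose hour 1 has 2 rows averaging 10 and hour 2 has 8 rows averaging 50. Averaging the two hourly averages gives (10 + 50) / 2 = 30, but the true average over all 10 rows is (2×10 + 8×50) / 10 = 42. Storing SUM and COUNT lets you compute the correct average at any level:

```sql
-- Correct daily average, computed from stored SUMs and COUNTs
-- (column and table names here are illustrative, not prescribed).
SELECT SUM(total_value) / SUM(row_count) AS true_average
FROM   hour_data
WHERE  stat_date = '2013-06-01';
```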
In terms of the data structures, I would take the following approach:

detailed_data - a table to hold the most granular level of data you have
minute_data - data summarised at the minute level
hour_data - data summarised at the hour level
day_data - etc
week_data - etc
month_data - etc
What I would do in your situation is very dependent on how you receive your stat data, but I can see a couple of simple options for this.
Option 1 - Create a Stored Procedure to Store the data
This would be the preferred option, as you can create separate stored procedures for adding, updating and deleting data from the main table, and the stored procedure can then handle the updating and summarisation of all the other data tables.
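A minimal MySQL sketch of such an "add" procedure, assuming the hypothetical table layout above (the procedure name, parameters and columns are illustrative):

```sql
DELIMITER //
CREATE PROCEDURE add_stat(IN p_time  DATETIME,
                          IN p_name  VARCHAR(50),
                          IN p_value DECIMAL(12,4))
BEGIN
    -- Store the raw reading.
    INSERT INTO detailed_data (stat_time, stat_name, stat_value)
    VALUES (p_time, p_name, p_value);

    -- Upsert into the minute summary: add to the running SUM and COUNT.
    INSERT INTO minute_data (stat_minute, stat_name, total_value, row_count)
    VALUES (DATE_FORMAT(p_time, '%Y-%m-%d %H:%i:00'), p_name, p_value, 1)
    ON DUPLICATE KEY UPDATE
        total_value = total_value + p_value,
        row_count   = row_count + 1;

    -- The same upsert pattern repeats for hour_data, day_data, and so on.
END //
DELIMITER ;
```

Companion update_stat and delete_stat procedures would apply the inverse adjustments to each summary table.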
Option 2 - Create triggers on the data tables
You can use triggers so that when data is added to the detailed_data table, it automatically summarises itself into the minute_data table, which in turn triggers an update to the hour_data table, and so on up the chain. The drawback is that when deleting or updating the stat data, you may have to create some quite clever triggers - but it is doable.
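The first link in that chain might look like this in MySQL (again assuming the hypothetical tables sketched earlier):

```sql
DELIMITER //
-- After each raw insert, fold the new value into the minute summary.
CREATE TRIGGER trg_detailed_to_minute
AFTER INSERT ON detailed_data
FOR EACH ROW
BEGIN
    INSERT INTO minute_data (stat_minute, stat_name, total_value, row_count)
    VALUES (DATE_FORMAT(NEW.stat_time, '%Y-%m-%d %H:%i:00'),
            NEW.stat_name, NEW.stat_value, 1)
    ON DUPLICATE KEY UPDATE
        total_value = total_value + NEW.stat_value,
        row_count   = row_count + 1;
END //
DELIMITER ;
```

A similar pair of INSERT/UPDATE triggers on minute_data would maintain hour_data, and so on up the chain.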
What to Analyse
When you have the data summarised like this, you can analyse it at whatever level you want, and you can join the date/time info to dimension tables to get a nicer level of analysis and filtering - see my answer to this post for more info: https://stackoverflow.com/questions/3249877/mysql-using-the-dates-in-a-between-condition-for-the-results
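For example, a join against a hypothetical dim_date dimension table (all names here are illustrative) lets you slice the summaries by business attributes while still computing correct averages from the stored SUMs and COUNTs:

```sql
-- Average weekday value per month, from the day-level summary.
SELECT d.month_name,
       SUM(s.total_value) / SUM(s.row_count) AS avg_value
FROM   day_data s
JOIN   dim_date d ON d.date_value = s.stat_day
WHERE  d.is_weekend = 0
GROUP BY d.month_name;
```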
I understand that you want to go with a single database (as it is good from a management and maintenance point of view), but maybe it's too much integration.
I am assuming that:
- you will have a separate application for each industry,
- they do not have very much in common (they really cover different business aspects).
I think that one of acceptable solutions in this case would be:
- a separate database for each industry
- in every database, a single shared public schema for shared data, including common dictionaries and all non-customer-dependent data,
- one separate database user account + private schema for each customer. You might use the PostgreSQL table inheritance feature to conform to some public table.
The issues that Simon raises in his comment are really important, so you will have to enforce a very strict policy regarding GRANTs in your database.
This is possible to achieve in PostgreSQL (as well as in many other RDBMSs). The key to a multi-tenant solution is intelligent usage of schemas, roles and the search_path setting. See http://www.postgresql.org/docs/current/static/ddl-schemas.html. Actually, what I propose will in a way emulate what Oracle does. If you need help on the details, please ask.
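A minimal PostgreSQL sketch of the schema-per-customer layout described above (role, schema and table names are illustrative assumptions):

```sql
-- One role + private schema per customer.
CREATE ROLE customer_a LOGIN PASSWORD 'secret';
CREATE SCHEMA customer_a AUTHORIZATION customer_a;

-- Shared, non-customer-dependent data lives in the public schema.
CREATE TABLE public.country (code CHAR(2) PRIMARY KEY, name TEXT NOT NULL);
GRANT SELECT ON public.country TO customer_a;

-- A customer table can inherit from a public parent to conform to it.
CREATE TABLE public.invoice_base (
    id     INTEGER       NOT NULL,
    amount NUMERIC(12,2) NOT NULL
);
CREATE TABLE customer_a.invoice () INHERITS (public.invoice_base);

-- With this search_path, unqualified names resolve to the customer's own
-- schema first, then fall back to the shared public schema.
ALTER ROLE customer_a SET search_path = customer_a, public;
```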
Best Answer
How large do you expect the data to grow? There is no harm in keeping older data in the same table. You could plan to use partitioning and implement date-range partitions (per year). Using WHERE clauses is completely natural in SQL, so don't consider this "messy" at all.
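In modern PostgreSQL this can be done with declarative range partitioning; a sketch with hypothetical table and column names:

```sql
-- Parent table, partitioned by year on the timestamp column.
CREATE TABLE measurements (
    measured_at TIMESTAMP NOT NULL,
    value       NUMERIC   NOT NULL
) PARTITION BY RANGE (measured_at);

CREATE TABLE measurements_2023 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Ordinary WHERE clauses work unchanged; the planner prunes the
-- partitions that cannot contain matching rows.
SELECT AVG(value)
FROM   measurements
WHERE  measured_at >= '2024-01-01' AND measured_at < '2024-02-01';
```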