Design for file based with many events

cdatabase-designsqlite

I'm not a database guy but I need to design a database to store events for our application. Our client app connects to remote hardware and syncs count data. The count data are csv files split into /date/guid.cvs. The sync part is done but now I need to take the synced files and dump them into a database for a reporting system. This all gets stored on a client's computer so I need a file database for ease of use. I've been using sqlite.

Now for my question. Each row in the csv file is an event with data defined:

Timestamp, vehicle length (single), controller state (int), turn type (int), queue length (int), other ints defining the event

I was also planning to include the guid from the csv filename. The problem I see is that this single table will have many, many rows. Just on a small amount of data (couple days) I already have over 100k rows. This will need to handle potentially years worth of events.

Is there a good way to reduce the row amount or is it not going to be an issue? Am I missing some way to design the table(s) so they don't grow so large? I saw something about partitioning but I'm not sure sqlite supports that. Is there another file database that does? I'm using c# 4 as the front end for the client.

Best Answer

Looking at your table structure means that while the actual row is small (many int values) there will be millions of rows.

Yes you can use a database of your choice, MySQL, Oracle, SQL Server, PostGres for the raw data from the cameras.

You can add partitioning by day which will create a partition for each day, and you may want to consider archiving data on an annual basis into a separate table for each year to make the table sizes manageable.

I would recommend that you setup another set of summary tables for reporting which can be based on the partitions, and this can be done on a daily basis (per partition) and automated.

The actual design of the summary tables depends on the reports you need to run, but it will be necessary so that your queries do not run off the base data table which will be huger.