Yearly database archive design

archivedatabase-design

I need to implement a DB of tracking numbers. The tracking numbers table could potentially contain billions of records during its lifetime so an archive database should be implemented to improve future performance and maintenance costs.

I'm thinking of prepending the year in the tracking number, say 2019-XXX for all tracking numbers in the year 2019, then 2020-XXX for the year 2020 and so on.

The tracking numbers are then stored on their respective year table e.g., TrackingNumbers_2019, TrackingNumbers_2020, and so on.

Every new year, the previous year's table will be dropped from the "active" DB then moved to an "archive" DB. The archive DB will be rarely queried. It will be on a lower-end machine with read-only mode. It will contain all previous years data.

Now in case somebody wants to search for a specific tracking number, the application will determine which DB/table it's going to look on based from the year prefix. E.g., If the current year is 2019, then a search for 2019-XXX will lookup the active DB, but a search for 2018-XXX will lookup the archive db.

Do you think this design is sound or is there a better way?

Best Answer

I wouldn't normally prefix the tracking number with a year, instead I would have a separate field with a date in it. This way you can use BIGINT and the indexing would go a little faster than using something like VARCHAR. If having the year in it is better for human reading, then the application can always display those two numbers together. BIGINT can store more than 9 quintillion values, assuming you only use positive numbers. This could mean that depending on your equipment and performance, you might not need to archive but every few years, instead of every year.