I would say that if your users are going to need to query the archived data, then the bit flag (soft delete) is easier. If the users no longer need the data, then I would go with the archive tables.
Based on your description above, I would suggest going with the soft-delete version. I can tell you from experience: in one of our systems we went with an archive schema to move older data to, and it led to nothing but issues because the users still needed access to the data. It meant using UNION ALL on every query we had to run.
As a result of those issues, we abandoned that route and moved to the soft delete, which is much easier.
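A minimal sketch of that pain, using SQLite and hypothetical table names: once the same entity is split across a live table and an archive table, every report that needs full history has to stitch the two back together with UNION ALL.

```python
import sqlite3

# Hypothetical live + archive tables for the same entity (names are illustrative).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders        (OrderID INTEGER, CustomerID INTEGER, Total REAL);
    CREATE TABLE OrdersArchive (OrderID INTEGER, CustomerID INTEGER, Total REAL);
    INSERT INTO Orders        VALUES (3, 101, 250.0);
    INSERT INTO OrdersArchive VALUES (1, 101, 100.0), (2, 102, 75.0);
""")

# The archive split forces this shape onto every query that needs all the data:
rows = con.execute("""
    SELECT OrderID, Total FROM Orders        WHERE CustomerID = 101
    UNION ALL
    SELECT OrderID, Total FROM OrdersArchive WHERE CustomerID = 101
""").fetchall()
# note the WHERE clause has to be repeated on both halves of the UNION
```

Every filter, join, and aggregate ends up duplicated across both halves of the UNION, which is exactly the maintenance burden that pushed us back to soft deletes.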
We added a bit flag to all of the tables that needed it, and then we just included it in the WHERE clause when querying the data.
One suggestion: make sure this field has a default value for when you INSERT data. If you are using IsArchived, the default value on the column should be false (0), since you do not want rows archived the moment they are created.
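Here is a minimal sketch of that pattern in SQLite (table and column names are illustrative, not from the original system): the IsArchived bit defaults to 0, so inserts can omit it entirely, archiving is a plain UPDATE, and ordinary queries just filter on the flag.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        IsArchived INTEGER NOT NULL DEFAULT 0   -- bit flag; rows start out active
    )
""")

# INSERTs can omit the flag entirely thanks to the default:
con.execute("INSERT INTO Orders (OrderID, CustomerID) VALUES (1, 101)")
con.execute("INSERT INTO Orders (OrderID, CustomerID) VALUES (2, 102)")

# "Archiving" is just an UPDATE, and restoring is the reverse UPDATE:
con.execute("UPDATE Orders SET IsArchived = 1 WHERE OrderID = 1")

# Every ordinary query includes the flag in the WHERE clause:
active = con.execute(
    "SELECT OrderID FROM Orders WHERE IsArchived = 0"
).fetchall()
```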
There are essentially three options (with the possibility of complexity via combination of the three options):
- Keep each table separate
- Bring all of the tables together as one major table
- Use an associative relationship: a base table with the common columns, and individual tables for the non-common columns
Here are some of the factors to consider:
- Do these really represent the same entity? If so, what is that entity? Based on your examples (Quote, Order, Invoice, and Credit Memo), these are separate entities to me. They are 'related' but are not the same or subtypes of a master entity (such as having an Employee table and ExemptEmployee and NonExemptEmployee subtables which have fundamentally different attributes). This tilts in favor of keeping tables separate.
- Do you need to query against Quote UNION Order UNION Invoice? Note that this is different from joining Quotes to Orders (to get, say, a percentage of successful sales). If you do need to do those types of queries often, that argues in favor of a single table. I could see doing this if you're looking at "the life of an order." On the other hand, that's still a simple query even when split out into several tables, so it's a minor consideration.
- Are there columns which are not common to all of the types? You mentioned that there are several special fields per type. This means you probably do not want to go with option #2 in any case, especially since option #3 keeps the integrity constraints relatively simple. Suppose that, in your big table, you have three columns which are required if you are doing an Order but should be NULL otherwise. Enforcing that is trivial with associative tables, but requires check constraints with the One Big Table approach. And if column X needs to be numeric for an Order but text for an Invoice, validation gets far more difficult.
- Do you ever change types? Not just copying Quote data into an Order, but actually changing the Quote itself to an Order. If so, that's an argument in favor of one big table. But I don't think that's a realistic business process.
- How many orders do you normally get? The more orders you get, the more you want to split out those tables. A scan against X pages is still better than a scan against 4X pages, and separate tables increase the likelihood that you'll be able to do seeks instead of scans, thanks to finer-grained control of indexes.
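To make option #3 concrete, here is a sketch in SQLite with hypothetical names: a base table holds the common columns, and a per-type table holds the Order-only columns. The "three columns required only for an Order" rule becomes plain NOT NULL constraints on the subtype table, instead of check constraints on one wide table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SalesDocument (              -- common attributes (base table)
        DocumentID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        CreatedOn  TEXT    NOT NULL
    );
    CREATE TABLE OrderDetail (                -- Order-only attributes (subtype)
        DocumentID    INTEGER PRIMARY KEY REFERENCES SalesDocument(DocumentID),
        ShipDate      TEXT NOT NULL,          -- required for Orders...
        ShipMethod    TEXT NOT NULL,
        PurchaseOrder TEXT NOT NULL           -- ...and simply absent for Quotes etc.
    );
""")
con.execute("INSERT INTO SalesDocument VALUES (1, 101, '2023-01-05')")
con.execute("INSERT INTO OrderDetail  VALUES (1, '2023-01-10', 'Ground', 'PO-778')")

# Getting the full picture of an Order is a simple one-to-one join:
row = con.execute("""
    SELECT d.CustomerID, o.ShipMethod
    FROM SalesDocument d
    JOIN OrderDetail o ON o.DocumentID = d.DocumentID
""").fetchone()
```

A Quote would get its own QuoteDetail table with its own NOT NULL columns, so each type's rules stay local to its table.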
Based on these considerations, I would personally keep the tables separate. Even though they may have similar attributes, they are fundamentally different entities.
What I would try to do, however, is look at those sets of common attributes and see if they really belong on the Quote/Order/Invoice/Credit Memo, or if normalizing those out into their own tables would be a better idea. For example, if you have customer first name, customer last name, customer street 1, etc., splitting that out into a customer key (or customer and address keys, depending upon circumstances) could cut down on the total number of columns in each of the four major entities, reducing repetition and gaining the other benefits of normalization.
It's possible that you've already normalized everything out to the fullest extent and that each of those entities does, in fact, legitimately have 50 attributes which are in common. In that case, I'd still keep separate tables because you're talking about fundamentally different things which happen to relate to one another.
There are a couple of ways I can think of to do it:
You could have a "Deleted" indicator, which is set to "Y" when a record is deleted. Very simple, but it leaves deleted and active data in the same tables, which could cause performance problems if there's a lot of activity. Restoring a record is as simple as changing the indicator.
You could have a table that mirrors the structure of the one you want to delete from, and when you delete a record from the main table, insert that record into the "deleted_data" table. This allows you to move deleted data into a separate table, giving the main table better potential performance, but searching for data that could be in either table becomes more complicated, and keeping the data structures in sync (when they change) is also a bit more work. In this case, restoring the data would involve removing the record from the "deleted" table and inserting it back into the main table.
In both cases, you may want to find any child records and mark them as deleted as well. I suppose you could do this with triggers or with code.
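Here is a sketch of the second approach in SQLite (all table and column names are hypothetical), handling the child records in the same transaction so the move is all-or-nothing:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (line_id  INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    -- mirror tables with the same structure, created empty:
    CREATE TABLE deleted_orders      AS SELECT * FROM orders      WHERE 0;
    CREATE TABLE deleted_order_lines AS SELECT * FROM order_lines WHERE 0;
    INSERT INTO orders      VALUES (1, 'acme');
    INSERT INTO order_lines VALUES (10, 1, 'WIDGET'), (11, 1, 'GADGET');
""")

def move_delete(con, order_id):
    """Move an order and its child lines into the deleted_* tables."""
    with con:  # one transaction: either everything moves or nothing does
        con.execute("INSERT INTO deleted_orders "
                    "SELECT * FROM orders WHERE order_id = ?", (order_id,))
        con.execute("INSERT INTO deleted_order_lines "
                    "SELECT * FROM order_lines WHERE order_id = ?", (order_id,))
        con.execute("DELETE FROM order_lines WHERE order_id = ?", (order_id,))
        con.execute("DELETE FROM orders WHERE order_id = ?", (order_id,))

move_delete(con, 1)
```

Restoring would be the same moves in reverse. In a production system you would likely wrap this in a stored procedure or a trigger rather than application code, but the transaction boundary is the important part either way.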