Decorating an existing database table with information that no one else cares about

database-design

We have a table that many processes read from. Some of them want to keep track of the fact they have processed this data, whereas others simply rely on fetching the data from the table based on user actions.

I think ideally, the consumers would interface with this data indirectly: probably via a web service, and then they could just keep track of what resources (URIs) they have discovered and processed, all internally to themselves.

The other options seem to be either adding a column to the existing table, or adding a new table – either way, one for each consumer. I think adding columns is easier and probably performs the best, but adding a new table keeps the clutter away from the original table.

Aside from those points, are there any others that I should be aware of?

Edit:

To elaborate..

I have a table foo that is populated by an external process, and read:

  • By a web-service that simply exposes data depending on criteria (show me the latest… show me the one for this particular date, etc)
  • By a post-processor, that looks for foo records that it has not seen before, processes them (which writes data somewhere else, unrelated to the web service) and then wants to mark them as processed.

In order to mark them as processed, we can either have a column on the foo table (processed = 1), or we can have a foo_processed table (that just has the ids of the foo that have been processed, and use an outer join to find unprocessed foo records)

At this time there is only one post-processor, but I can't guarantee that.

Best Answer

Your instincts about indirect access are correct. Adding columns for clients to the table flies in the face of this and is a generally bad idea. From a mechanical point of view, you are taking on lots of DDL changes to a key table -- how much do you want to touch it? Will all the tracking map to something you can keep in a single row? Will someone stuffing data into this decide to dump a big nasty serialized XML object in there destroying performance? What happens when two of the apps want a field called "report_status"?

The correct course of action is to push these folks to separate tables or other data stores for their tracking. Then they can track what they need to track without interfering with the core data or with anyone else.