Documenting a collection of databases and their relationships with each other

database-designdocumentation

As part of a project I'm currently involved with we need to identify where various data points are held within existing systems, with a view to integrating them into our central integration engine. We're part of an NHS Trust, so there are many databases, reporting databases, etc.

There seem to be many tools to design & document a single database, but what I'm after is something a level higher, which can record the relationships between data elements in different databases. I can't find anything on Google, but I wonder whether I'm just using the wrong terms.

At the moment we're solving the problem by recording the database, table, attribute & a sample query for each data item. A system which could at least group the different data elements by table, database, etc. would be useful.

It seems this would be useful to any organisation with many disparate databases, so want to check it doesn't already exist (or that there's not a better solution for the problem).

Best Answer

Firstly, let me say from the outset that you have my deepest sympathies! :-) Maintaining a true-to-life up-to-date schema of even a single database is bad enough, but having to maintain a single point of truth (SPOT) for a multi-database disparate data-source environment is going to be nothing short of hell on earth!

You _must_ establish a SPOT for your systems - one path in, one path out. If you don't have serious senior management buy-in for this project, it is doomed from the start!

Take a look here and follow the links to K. Scott Allen's posts. Check out Agiledata.org and Allen's site ambysoft.com. Establish the relationships - there are schema generation tools for every database under the sun, but given that I imagine you have a large hetergeneous environment, you probably have everything from Oracle Enterprise servers down to spreadsheets on individual's machines.

Bear in mind that there's only one thing worse than no documentation and that's inaccurate documentation (think prominent, but defective, fire extinguisher). Furthermore, if your central system is seen to sub-optimal, all that will happen is that workers on the ground it will be bypassed and become a shelf-ware irrelevancy.

The approach I would take in this situation is to establish a framework whereby there is [1 person | a small group] responsible for a given system and it is up to them to maintain their part of the schema documentation and ensure that they have the tools to do so. Yours should be more of an organisational/oversight role. Make it SOP that your group be informed of any schema changes (immediately) so that their (and by extension the central) meta-data repository is kept up to date. IMHO, you face an uphill battle and I wish you the best.

As for tools for documenting a given system, leave that up to the experts in those systems. As for a tool which consolidates "meta-meta"-data, I'm not aware one - apart from graft that is.