How to organise and understand a terribly modelled database

database-designdocumentationoracle

I have recently joined a company where there are hundreds of schemas and thousands of tables. All the schemas we have are from other vendors who have allowed us access to their database. The entire database is poorly documented (little to no documentation). Vendors also don't care for documentation, hence they don't provide us with any documentation.

The other employees themselves find majority of their time figuring out what the data represents and if the data is current. There are numerous front end tools (not built by our company) which allow us to pull data (which is correct and relevant). However, then the issue becomes that we don't know the schemas, tables or columns the front end is using to pull the data.

I cannot create new tables as the tables are owned by 3rd parties who have allowed us access to their database. How can I go about documentation and how do I figure out what schemas, tables and columns front end tools are using to pull data.

Tables don't even have foreign keys to figure out how they are linking to other tables. For example in tableA there may be a column called car_id, but in tableB there is a column called cid (which is the car_id). They both link together.

Best Answer

To add on to the first two comments on your post, yes unfortunately this is the case in a lot of Vendor systems. Reverse engineering, and normalizing their less than optimal schemas has been a fairly common task I've dealt with for multiple Vendor systems at multiple different companies I've worked for. There's a couple things I recommend to make it slightly easier on yourself either now or at least in the future:

Documenting your progression is important. As you find out the relations between entities, I'd recommend building out a database diagram for them, and then expanding on that diagram as you learn about other relevant entity relations. Of course in a large scale system, a single database diagram can become unwieldly, so you might want to break it up into multiple diagrams for each subject matter in the system. E.g. ERP systems typically have a multitude of modules that span different subject matters between Finance, Sales, Production, etc and in such a case I'd recommend at least a separate database diagram for each module.
Familiarize yourself with a schema search tool, such as the Find DB Object feature of the Oracle SQL Developer or the search features of PL SQL Developer as mentioned in this StackOverflow answer. (Microsoft SQL Server has the luxury of RedGate's SQL Search tool, but unfortunately I don't think RedGate offers an equivalent for Oracle in their tool belt.) This will be helpful in understanding all the places an entity is referenced and how it's used.
As much of a pain as it can be working with the Vendor and their documentation, if you have a technical contact who can at least answer the question if their application itself (or if they have a tool or database dictionary) that exposes the entity names of the fields behind the UI then that could make your job with reverse engineering their system a lot easier. Surprisingly this is fairly common practice that I've seen among many Vendor systems I've encountered. In some cases it was a dedicated shortcut key while focused on the field in the UI, sometimes it was a matter of right clicking the field and it was an option in the context menu, other times it was a separate tool bundled with their software (which was also used for customizations), and I've even seen it purely in the database in a dedicated schema / database dictionary. Also you may need to be a super user of sorts to be able to access that information. Either way, the Vendor should be able to answer that question for you.

Best of luck!

Related Solutions

ER Diagram translation to tables

Enforcing option (c), using a type attribute:

CREATE TABLE Computer
( computerID INT NOT NULL
, computerType CHAR(1) NOT NULL
, PRIMARY KEY (computerType, computerID)
, CHECK (computerType IN ('D', 'L'))
) ;

CREATE TABLE Laptop
( computerID INT NOT NULL
, computerType CHAR(1) NOT NULL
, battery WHATEVER
, PRIMARY KEY (computerType, computerID)
, CHECK (computerType = 'L')
, FOREIGN KEY (computerType, computerID)
    REFERENCES Computer (computerType, computerID)
) ;

CREATE TABLE Desktop
( computerID INT NOT NULL
, computerType CHAR(1) NOT NULL
, monitor WHATEVER
, PRIMARY KEY (computerType, computerID)
, CHECK (computerType = 'D')
, FOREIGN KEY (computerType, computerID)
    REFERENCES Computer (computerType, computerID)
) ;

and using deferred foreign key constraints on nullable columns:

CREATE TABLE Computer
( computerID INT NOT NULL
, laptopID INT NULL                                    -- these 2 columns 
, desktopID INT NULL                                   -- are NULLable
, PRIMARY KEY (computerID)
, UNIQUE (laptopID)
, UNIQUE (desktopID)
, CHECK ( laptopID IS NOT NULL AND desktopID IS NULL   -- but only one of them 
       OR desktopID IS NOT NULL AND laptopID IS NULL   -- is NULL
        )
, CHECK (laptopID = computerID)                        -- and the non-NULL one is
, CHECK (desktopID = computerID)                       -- equal to computerID
) ;

CREATE TABLE Laptop
( laptopID INT NOT NULL
, battery VARCHAR(20) NOT NULL
, PRIMARY KEY (laptopID)
, FOREIGN KEY (laptopID)
    REFERENCES Computer (computerID)
) ;

CREATE TABLE Desktop
( desktopID INT NOT NULL
, monitor VARCHAR(20) NOT NULL
, PRIMARY KEY (desktopID)
, FOREIGN KEY (desktopID)
    REFERENCES Computer (computerID)
) ;

ALTER TABLE Computer
  ADD CONSTRAINT Laptop_Computer_FK
  FOREIGN KEY (laptopID)
    REFERENCES Laptop (laptopID)
    DEFERRABLE INITIALLY DEFERRED
, ADD CONSTRAINT Desktop_Computer_FK 
  FOREIGN KEY (desktopID)
    REFERENCES Desktop (desktopID)
    DEFERRABLE INITIALLY DEFERRED ;

Tested in Postgres, at SQL-Fiddle

Which items are table entities and which are fields

It sounds like the main issue here is that the attributes of certain entities (such as characters) can change over time, and you want to be able to track all the changes.

One way to do this might be to think of it as a data versioning problem.

One possible schema might look like this:

book
----
  ID
  name
  sequence_number (which book it is in the series)
  start_date (use this to track the dates that book covers)
  end_date (use this to track the dates that book covers)

character_base
--------------
  id (PK)
  first_book (FK to book.id)

character
--------
  id
  base_id (FK to character_base.id)
  version_number (in your case, this is probably optional)
  firstname
  lastname
  nickname
  hometown
  notes
  birthdate
  rank (fk to rank reference table: rank.ID)
  start_date (the begining of the time range for which this version is valid)
  end_date (the end of the time range for which this version is valid)

rank
----
  ID
  name

Now, whenever you create a new "version" of your character, you need to create a new record in character, but make sure you point the base_id to the same value as the previous value. Also remember to always populate start_date and end_date and ensure there are no gaps in the range.

To find out which version of a character is present in which books, you'd have to join character to book with a condition such that the character's version start and end date falls within a book's start and end date.

You can apply the versioning pattern for character to other entities that can be versioned, such as platoon or company if their attributes (such as name) need to be tracked over time.

Best Answer

Related Solutions

ER Diagram translation to tables

Which items are table entities and which are fields

Related Question