PostgreSQL – Designing Database Structure for Companies and Stock Owners

database-designforeign keypostgresqlprimary-key

I'm working on a personal project where I want to make a database that holds the information for every owner of every public company. Say for example "Sears INC" and the program would get the information for every owner in "Sears INC". That is the idea. But I am having a really hard time trying to structure this database. I only have limited experience and knowledge about this subject so any guidance would be greatly appreciated.

Now I was first thinking about making a table called companies, and give them all unique ID's and make a table for every company. Then inside those tables would be all the owners with their unique ID. That would link to their information. I have tried to visualize this here:

here

Now I know that creating a table for every company would be tedious but I cant think of any other way of doing this. As you all know each company might have the same owner and therefore it would make sense of using Owner_ID to identify each owner.

I have all the other data in a CVS file that I could easily import in PostgreSQL after I have structured the database.

Any help would be greatly appreciated.

TL:DR; Creating a database of stock owners that will be searchable by company name, and need help to structure the database for maximum efficacy.

A person can only have one phone and one address. Same goes for the company, because I have made a script that finds the phone based on this information. The way I differentiate between knowing if a entity is a company or person is by using a column called F_Org which will be larger than 4 digits for companies. This is all handled by the script.

Best Answer

One of the topics that I commonly touch in my database design answers is that before (a) thinking about the technical aspects of a relational database —e.g., the declaration of tables, columns, etc.— it is highly recommendable to first (b) define all the characteristics of the business context of interest with precision. This implies identifying the entity types, their properties and every significant association that exists among them.

The group of definitions about said elements is commonly referred to as as business rules, which make up the applicable conceptual model.

Also, I esteem that there are other important subjects about (1) data management and (2) relational design and manipulation that (3) may help to clarify some aspects you brought up in the question and via comments as well. I will integrate all these points as the answer goes progressing.

Conceptual level

Let us start reevaluating the relevant conceptual level of abstraction. In this regard, it is important to formulate a series of statements that describe the busines environment. So, in your specific case, keeping the business rules deliberately as simple as possible:

  • A Company is primarily identified by its Id
  • A Company is alternately identified by its Name
  • A Company is established on its FoundingDate
  • A Person is primarily identified by his or her Id
  • A Person is alternately identified by the combination of his or her FirstName, LastName, GenderCode, BirthDate and BirthPlace
  • A Person keeps exactly-one Address
  • A Person uses exactly-one PhoneNumber
  • A Company is owned by one-to-many People
  • A Person owns zero-one-or-many Companies

Illustrative IDEF1X model

Then based on the business rules formulated above, one can create a deliberately and relatively simple IDEF1Xa model like the one shown in Figure 1, in order to have a graphical device that consolidates most of the significant features in a single resource:

Figure 1 - Companies and Stock Owners Simplified IDEF1X Model

a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the early theoretical works authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.

As demonstrated, in an IDEF1X model we can start including technical considerations like the indications of properties or attributes that must be constrained via PRIMARY, ALTERNATE and FOREIGN KEY definitions (for brevity: PK, AK and FK, respectively) at the logical level.

The last two of the business rules previously discussed show that there is a many-to-many (M:N) association or relationship between the entity types Company and Person, which reveals the existence of an associative entity type that I called CompanyOwner (which might very well be denominated CompanyShareHolder or something that fitts better the terminology used in the business domain).

One of the properties of paticular importance in the present modeling excercise is the one I named StockShareQuantity (portraying a sum of Shares), because it comes about exclusively in the context of a CompanyOwner association; in this way, it does not belong to a Person nor to a Company, but to the connection that may arise between these two separate entity types.

As specified, each CompanyOwner occurrence or instance is identified by the combination of its CompanyId and OwnerId values, so these properties are highlighted as a composite PK in the entity type depiction. The CompanyOwner.CompanyId property is distinguished with a FK that points to Company.CompanyId, while CompanyOwner.OwnerId is noted with a FK that makes a reference to Person.PersonId.

Aiming to portray a slighlty more elaborate Person entity type, I decomposed Person.Name into FirstName and LastName, and I included the columns GenderCode, BirthDate and BirthPlace. Together, the combined values of all those properties are typically used to identify a Person in certain business scenarios, but you may simply be interested in keeping track of the Person.FullName, if that meets the requirements of your data usage, so you do not have to follow the same approach concerning the particular database under discussion.

Expository logical SQL-DDL structure

Subsequently, it is relatively more easy to declare a logical structure by means of the data definition language supplied by the database management system, in this case PostgreSQL, just like exemplified below:

-- You have to determine which are the most fitting 
-- data types and sizes for all the table columns 
-- depending on the business context characteristics.

-- Also, you should make accurate tests to define the
-- most convenient physical implementation settings; e.g.,
-- a good INDEXing strategy based on query tendencies.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions.

CREATE TABLE Company ( -- Stands for an independent entity type
    CompanyId       INT       NOT NULL, -- You may like to set it with the SERIAL type to retain system-assigned and system-generated surrogate values.
    Name            TEXT      NOT NULL,
    FoundingDate    DATE      NOT NULL,
    Etcetera        TEXT      NOT NULL,
    CreatedDateTime TIMESTAMP NOT NULL DEFAULT NOW(), -- You may also configure it so that the DEFAULT value is set by the CURRENT_TIMESTAMP function, if appropriate. 
    --
    CONSTRAINT Company_PK PRIMARY KEY (CompanyId),
    CONSTRAINT Company_AK UNIQUE      (Name) -- Single-colum ALTERNATE KEY.
);

CREATE TABLE Person ( -- Denotes an independent entity type.
    PersonId        INT       NOT NULL, -- You may like to set it with the SERIAL type to retain system-assigned and system-generated surrogate values.
    FirstName       TEXT      NOT NULL,
    LastName        TEXT      NOT NULL,
    GenderCode      TEXT      NOT NULL,
    BirthDate       DATE      NOT NULL,
    BirthPlace      TEXT      NOT NULL,  
    Address         TEXT      NOT NULL,   
    PhoneNumber     TEXT      NOT NULL,     
    CreatedDateTime TIMESTAMP NOT NULL DEFAULT NOW(), -- You may also configure it so that the DEFAULT value is set by the CURRENT_TIMESTAMP function, if appropriate.
    --
    CONSTRAINT Person_PK PRIMARY KEY (PersonId),
    CONSTRAINT Person_AK UNIQUE      ( -- Composite ALTERNATE KEY.
        FirstName,
        LastName,
        GenderCode,
        BirthDate,
        BirthPlace
    )
);

CREATE TABLE CompanyOwner ( -- Represents an associative entity type or M:N association. Attaching an extra column to hold system-generated and system-assigned surrogate values to this table is superfluous.
    CompanyId          INT       NOT NULL,
    OwnerId            INT       NOT NULL,
    StockShareQuantity INT       NOT NULL,
    Etcetera           TEXT      NOT NULL,  
    CreatedDateTime    TIMESTAMP NOT NULL DEFAULT NOW(), -- You may also configure it so that the DEFAULT value is set by the CURRENT_TIMESTAMP function, if appropriate. 
    --
    CONSTRAINT CompanyOwner_PK          PRIMARY KEY (CompanyId, OwnerId), -- Composite PRIMARY KEY.
    CONSTRAINT CompanyOwnerToCompany_FK FOREIGN KEY (CompanyId)
        REFERENCES Company (CompanyId),
    CONSTRAINT CompanyOwnerToPerson_FK  FOREIGN KEY (OwnerId)
        REFERENCES Person (PersonId),
    CONSTRAINT StockShareQtyIsValid_CK  CHECK       (StockShareQuantity >= 0) -- Appears to be required.
);

As expounded, in this deliberately and relatively simple logical structureb (which is very similar to the one portrayed in your diagram although with a few important differences):

  • each base table denotes an individual entity type, which prevents ambiguities;
  • each column stands for a single property of the respective entity type;
  • a specific data type is fixed for each column in order to ensure that all the values it retains belong to a particular set (that you will have to adapt to adapt to your exact necessities, choosing the most fitting types from the options provided by PostgreSQL), be it INT, TIMESTAMP, TEXT, etc.; and
  • multiple constraintsc, d are set up (declaratively) in order to guarantee that the rows retained in all the tables comply with the rules determined in the conceptual model.

b I have uploaded a db<>fiddle and a SQL Fiddle, running on PostgreSQL 9.6, that contain the DDL structure and constraints I defined along with sample data so that you can test it “in action”.

c Since (i) the structure of the data —i.e. the tables, columns and types— and (ii) the constraints —e.g., PKs, FKs and CHECKs— imposed on such a structure to ensure that it accepts only valid data are (iii) two diferent —but related— factors of the configuration of a relational database, I suggest (iv) placing the structure declarations apart from the constraint declarations —though the SQL language and the dialect supplied by PostgreSQL permit declaring “in-line” column constraints, if you wish to opt for that option—. In fact, it would be even better to move the constraint definitions outside of the CREATE TABLE … (…); statements, so this point deserves thorough evaluation.

dAn exception that I make regarding the separation of structure from constraints is that I in effect fix NOT NULL in “in-line” column declarations because it is a special kind of constraint. To keep things as brief as possible, permitting NULL marks is an approach —considered controversial on theoretical grounds— introduced by the SQL language designers to try manage the Missing Information Problem, which is a rather broad subject. NULL marks, in turn, imply dealing with another problem known as Three-Valued Logic. According to the relational theory, a table with one or more columns that enclose NULL marks (1) does not represent a mathematical relation —said marks are indicators about the absence of a value, then they are not domain values—, therefore (2) such a table is not going to “behave” as a relation when it is operated upon. Consequently, while having columns maintaining NULL markers is a possibility, I recommend that you study trustworthy material about these topics so that you can make an informed decision knowing all the implications and methods with respect to the management of Missing Data.

The Person.Address Column

In case that you are interested in manipulating certain encoded parts of the Address column, let us say PostCode, you might like to evaluate decomposing it in several columns and moving the pertinent information to a separate table that, perhaps, would be connected with the Person table via a column with a FK constraint.

Declaring a table for each Company?

You brought up the following thoughts in comments:

As I have stated the design will change because […] there will be multiple companies and people can own multiple companies and therefore show up twice. My original question was how to design this. I thought about making a table for each company which I think would be my last resort but I still have no other guidance or tips on how to do this.

No, you should not create a table for each Company, because that would be quite suboptimal.

In that respect, it can be said that every row kept in each of the previously presented base tables is an assertion about a certain conceptual-level entity that belongs to a certain type, i.e., either a Company or a Person or a CompanyOwner.

Hence, a row about a given Company is held only once in the Company table and a row about a given Person is maintained only once in the Person table. Then, since a Person can own zero, one or more Companies, each row about an exact connection that a Person has with respect to an individual Company is enclosed only once in the CompanyOwner table.

A row in the CompanyOwner table does not contain the whole information about a Person, it only contains (1) one value in the OwnerId column —constrained as a FK— that makes a reference to (2) one value retained in the Person.PersonId column —constrained as the PK—. A certain CompanyOwner.OwnerId value may be repeated multiple times, and so may happen with a CompanyOwner.CompanyId value too, but that is not problematic. As exemplified, the possibility of repeating the same combination of (CompanyId, OwnerId) values is prevented by way of the CompanyOwner composite PK definition.

Derivable information

A set of base tables is definitely not a fixed structure it when it comes to a relational database because, apart from being susceptible to extensions and adaptations, the base tables help in deriving new tables that are not configured at design time.

Let us suposse that you have populated your database tables with the sample data via the following INSERT operations:

INSERT INTO Company 
    (CompanyId, Name, FoundingDate, Etcetera)
VALUES
    (1748, 'Database Modeling Inc.', '1985-06-30', 'Foo'),
    (1750, 'Application Programming Co.', '1987-10-14', 'Bar');

INSERT INTO Person 
    (PersonId, FirstName, LastName, BirthDate, BirthPlace, GenderCode, Address, PhoneNumber)
VALUES
    (1, 'Edgar', 'Codd', '1923-08-19', 'Fortuneswell, UK', 'M', 'IBM Research Laboratory K01/282, 5600 Cottle Road, San Jose, CA, USA', '01-800-17-50-17-50'),
    (2, 'Alan', 'Turing', '1912-06-23', 'Maida Vale, UK','M', 'National Physical Laboratory, Hampton Road, Teddington, TW11 0LW, England', '01-800-17-48-17-48'),
    (3, 'Grace', 'Hopper', '1906-12-09', 'New York City, USA', 'F', 'Navy’s Office of Information Systems Planning, USA.', '01-800-17-50-17-50'),   
    (4, 'Diego', 'Velázquez', '1599-06-06', 'Seville, Spain', 'M', 'Palacio Real, Madrid, Spain', '01-800-17-50-17-50'),
    (5, 'Michelangelo', 'Buonarroti', '1475-03-06', 'Caprese, Italy', 'M', 'Sistine Chapel, Vatican City State', '01-800-17-50-17-50'); 

INSERT INTO CompanyOwner 
    (CompanyId, OwnerId, StockShareQuantity, Etcetera)
VALUES
    (1748, 1, 2500, 'U'),
    (1750, 1, 2500, 'V'),
    (1750, 2, 8000, 'W'),
    (1750, 3, 3580, 'X'),
    (1748, 4, 12899, 'Y'),
    (1750, 5, 12899, 'Z'); 

After that, if you want to, e.g., produce a table that comprises data about the Owners of all the Companies, you can declare a VIEW like the following one:

CREATE VIEW CompanyAndOwner AS
    SELECT C.CompanyId,
           C.Name AS CompanyName,
           P.PersonId,
           P.FirstName,
           P.LastName,
           P.BirthDate,
           P.BirthPlace,
           P.GenderCode,
           P.Address,
           P.PhoneNumber,
          CO.StockShareQuantity,
           P.CreatedDateTime
        FROM Person P
        JOIN CompanyOwner CO
          ON CO.OwnerId = P.PersonId
        JOIN Company C
          ON C.CompanyId = CO.CompanyId;

Then you can express successive operationsc that SELECT directly FROM that VIEW; e.g.:

SELECT *            
  FROM CompanyAndOwner 
 WHERE CompanyId = 1750;

and

SELECT CompanyName,
       FirstName AS OwnerFirstName,
       LastName  AS OwnerLastName,
       StockShareQuantity
  FROM CompanyAndOwner 
 WHERE CompanyId = 1750; 

and

SELECT * 
  FROM CompanyAndOwner 
 WHERE CompanyId = 1748;

and

SELECT CompanyName,
       FirstName AS OwnerFirstName,
       LastName  AS OwnerLastName,
       StockShareQuantity
  FROM CompanyAndOwner 
 WHERE CompanyId = 1748;

Or you can SELECT directly FROM a base table as well:

SELECT COUNT(OwnerId) AS OwnedCompaniesQuantity
   FROM CompanyOwner
  WHERE OwnerId = 1;

etc.

e All the data manipulation operations included here are comprised in the db<>fiddle and the SQL Fiddle previously linked to, so that you can analyze the result sets that they produce.

Company Address(es) and Phone Number(s)

In the series of deliberations that we engaged in via comments I asked you whether you interested in retaining Addresses and PhoneNumbers that belong to Companies, and your response was the following one:

A person can only have one phone and one address. Same goes for the company, because I have made a script that finds the phone based on this information. The way I differentiate between knowing if a entity is a company or person is by using a column called F_Org which will be larger than 4 digits for companies. This is all handled by the script.

I am not sure if that means that the Address(es) and Phone Number(s) of a Company are stored in an application program component (perhaps a file with records and fields, or something similar) but, if you are in fact handling that information, you should make use of the right tool for the job, i.e., include those aspects in the database structure and take advantage of the instruments provided by the database management system (i.e., PostgreSQL) so that you can administer the concerning data in an optimal manner (e.g., by virtue of logical-level operations that are based on the relational algebra and declarative constraints, supported at the physical level by a powerful set processing engine).

Feasible DDL structure extension representing a supertype-subtype relationship

Thus, in case that you determine that, in your business domain

  • the same Address occurrence can be kept by more than one Person and/or Company instance,

and

  • the same PhoneNumber occurrence can be used by more than one Person and/or Company instance,

you may like to analyze the diagrams I included in this answer for a question on a scenario that is very similar to the one under consideration (with, among other features, the Party supertype representing either the Organization or Person subtypes, People playing the Role of Owners of Organizations, and Addresses and PhoneNumbers linked with Organizations and People via the supertype), as they can be used as a reference for a tentative extension.

Related Question