I'm working on a personal project where I want to make a database that holds the information for every owner of every public company. Say for example "Sears INC" and the program would get the information for every owner in "Sears INC". That is the idea. But I am having a really hard time trying to structure this database. I only have limited experience and knowledge about this subject so any guidance would be greatly appreciated.
Now I was first thinking about making a table called companies, and give them all unique ID's and make a table for every company. Then inside those tables would be all the owners with their unique ID. That would link to their information. I have tried to visualize this here:
Now I know that creating a table for every company would be tedious but I cant think of any other way of doing this. As you all know each company might have the same owner and therefore it would make sense of using Owner_ID to identify each owner.
I have all the other data in a CVS file that I could easily import in PostgreSQL after I have structured the database.
Any help would be greatly appreciated.
TL:DR; Creating a database of stock owners that will be searchable by company name, and need help to structure the database for maximum efficacy.
A person can only have one phone and one address. Same goes for the company, because I have made a script that finds the phone based on this information. The way I differentiate between knowing if a entity is a company or person is by using a column called F_Org which will be larger than 4 digits for companies. This is all handled by the script.
Best Answer
One of the topics that I commonly touch in my database design answers is that before (a) thinking about the technical aspects of a relational database —e.g., the declaration of tables, columns, etc.— it is highly recommendable to first (b) define all the characteristics of the business context of interest with precision. This implies identifying the entity types, their properties and every significant association that exists among them.
The group of definitions about said elements is commonly referred to as as business rules, which make up the applicable conceptual model.
Also, I esteem that there are other important subjects about (1) data management and (2) relational design and manipulation that (3) may help to clarify some aspects you brought up in the question and via comments as well. I will integrate all these points as the answer goes progressing.
Conceptual level
Let us start reevaluating the relevant conceptual level of abstraction. In this regard, it is important to formulate a series of statements that describe the busines environment. So, in your specific case, keeping the business rules deliberately as simple as possible:
Illustrative IDEF1X model
Then based on the business rules formulated above, one can create a deliberately and relatively simple IDEF1Xa model like the one shown in Figure 1, in order to have a graphical device that consolidates most of the significant features in a single resource:
a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the early theoretical works authored by the originator of the relational model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
As demonstrated, in an IDEF1X model we can start including technical considerations like the indications of properties or attributes that must be constrained via PRIMARY, ALTERNATE and FOREIGN KEY definitions (for brevity: PK, AK and FK, respectively) at the logical level.
The last two of the business rules previously discussed show that there is a many-to-many (M:N) association or relationship between the entity types
Company
andPerson
, which reveals the existence of an associative entity type that I calledCompanyOwner
(which might very well be denominatedCompanyShareHolder
or something that fitts better the terminology used in the business domain).One of the properties of paticular importance in the present modeling excercise is the one I named
StockShareQuantity
(portraying a sum of Shares), because it comes about exclusively in the context of aCompanyOwner
association; in this way, it does not belong to aPerson
nor to aCompany
, but to the connection that may arise between these two separate entity types.As specified, each
CompanyOwner
occurrence or instance is identified by the combination of itsCompanyId
andOwnerId
values, so these properties are highlighted as a composite PK in the entity type depiction. TheCompanyOwner.CompanyId
property is distinguished with a FK that points toCompany.CompanyId
, whileCompanyOwner.OwnerId
is noted with a FK that makes a reference toPerson.PersonId
.Aiming to portray a slighlty more elaborate
Person
entity type, I decomposedPerson.Name
intoFirstName
andLastName
, and I included the columnsGenderCode
,BirthDate
andBirthPlace
. Together, the combined values of all those properties are typically used to identify aPerson
in certain business scenarios, but you may simply be interested in keeping track of thePerson.FullName
, if that meets the requirements of your data usage, so you do not have to follow the same approach concerning the particular database under discussion.Expository logical SQL-DDL structure
Subsequently, it is relatively more easy to declare a logical structure by means of the data definition language supplied by the database management system, in this case PostgreSQL, just like exemplified below:
As expounded, in this deliberately and relatively simple logical structureb (which is very similar to the one portrayed in your diagram although with a few important differences):
b I have uploaded a db<>fiddle and a SQL Fiddle, running on PostgreSQL 9.6, that contain the DDL structure and constraints I defined along with sample data so that you can test it “in action”.
c Since (i) the structure of the data —i.e. the tables, columns and types— and (ii) the constraints —e.g., PKs, FKs and CHECKs— imposed on such a structure to ensure that it accepts only valid data are (iii) two diferent —but related— factors of the configuration of a relational database, I suggest (iv) placing the structure declarations apart from the constraint declarations —though the SQL language and the dialect supplied by PostgreSQL permit declaring “in-line” column constraints, if you wish to opt for that option—. In fact, it would be even better to move the constraint definitions outside of the CREATE TABLE … (…); statements, so this point deserves thorough evaluation.
dAn exception that I make regarding the separation of structure from constraints is that I in effect fix NOT NULL in “in-line” column declarations because it is a special kind of constraint. To keep things as brief as possible, permitting NULL marks is an approach —considered controversial on theoretical grounds— introduced by the SQL language designers to try manage the Missing Information Problem, which is a rather broad subject. NULL marks, in turn, imply dealing with another problem known as Three-Valued Logic. According to the relational theory, a table with one or more columns that enclose NULL marks (1) does not represent a mathematical relation —said marks are indicators about the absence of a value, then they are not domain values—, therefore (2) such a table is not going to “behave” as a relation when it is operated upon. Consequently, while having columns maintaining NULL markers is a possibility, I recommend that you study trustworthy material about these topics so that you can make an informed decision knowing all the implications and methods with respect to the management of Missing Data.
The Person.Address Column
In case that you are interested in manipulating certain encoded parts of the
Address
column, let us sayPostCode
, you might like to evaluate decomposing it in several columns and moving the pertinent information to a separate table that, perhaps, would be connected with thePerson
table via a column with a FK constraint.Declaring a table for each Company?
You brought up the following thoughts in comments:
No, you should not create a table for each Company, because that would be quite suboptimal.
In that respect, it can be said that every row kept in each of the previously presented base tables is an assertion about a certain conceptual-level entity that belongs to a certain type, i.e., either a Company or a Person or a CompanyOwner.
Hence, a row about a given Company is held only once in the
Company
table and a row about a given Person is maintained only once in thePerson
table. Then, since a Person can own zero, one or more Companies, each row about an exact connection that a Person has with respect to an individual Company is enclosed only once in theCompanyOwner
table.A row in the
CompanyOwner
table does not contain the whole information about a Person, it only contains (1) one value in theOwnerId
column —constrained as a FK— that makes a reference to (2) one value retained in thePerson.PersonId
column —constrained as the PK—. A certainCompanyOwner.OwnerId
value may be repeated multiple times, and so may happen with aCompanyOwner.CompanyId
value too, but that is not problematic. As exemplified, the possibility of repeating the same combination of(CompanyId, OwnerId)
values is prevented by way of theCompanyOwner
composite PK definition.Derivable information
A set of base tables is definitely not a fixed structure it when it comes to a relational database because, apart from being susceptible to extensions and adaptations, the base tables help in deriving new tables that are not configured at design time.
Let us suposse that you have populated your database tables with the sample data via the following INSERT operations:
After that, if you want to, e.g., produce a table that comprises data about the Owners of all the Companies, you can declare a VIEW like the following one:
Then you can express successive operationsc that SELECT directly FROM that VIEW; e.g.:
and
and
and
Or you can SELECT directly FROM a base table as well:
etc.
e All the data manipulation operations included here are comprised in the db<>fiddle and the SQL Fiddle previously linked to, so that you can analyze the result sets that they produce.
Company Address(es) and Phone Number(s)
In the series of deliberations that we engaged in via comments I asked you whether you interested in retaining Addresses and PhoneNumbers that belong to Companies, and your response was the following one:
I am not sure if that means that the Address(es) and Phone Number(s) of a Company are stored in an application program component (perhaps a file with records and fields, or something similar) but, if you are in fact handling that information, you should make use of the right tool for the job, i.e., include those aspects in the database structure and take advantage of the instruments provided by the database management system (i.e., PostgreSQL) so that you can administer the concerning data in an optimal manner (e.g., by virtue of logical-level operations that are based on the relational algebra and declarative constraints, supported at the physical level by a powerful set processing engine).
Feasible DDL structure extension representing a supertype-subtype relationship
Thus, in case that you determine that, in your business domain
and
you may like to analyze the diagrams I included in this answer for a question on a scenario that is very similar to the one under consideration (with, among other features, the Party supertype representing either the Organization or Person subtypes, People playing the Role of Owners of Organizations, and Addresses and PhoneNumbers linked with Organizations and People via the supertype), as they can be used as a reference for a tentative extension.