Single or multiple databases

database-design

I am in the progress of designing a database (or databases) for somebody and since I am a beginner I would love some advice.

The data is described as:

Customer1 (c1.type1, c1.type2, c1.type3)
Customer2 
Customer3
Customer4 (c4.type1, c4.type2)
Customer5
Customer6
...
Customer10

There will be around 10 types of customers, perhaps less.

Not all the customers have different sub-types. The type for customer1 for example, would be different than for other customers.

All of them would have general info included, for instance address, email, and phone number, along with other info.

Every customer generates an invoice. They pay different amounts since they buy different things.

We are predicting the tables may contain many millions of records.

Should I have multiple databases for each customer type?

Should I have one database with multiple tables for each customer type?

Should I have one database with one table for all customer types?

What about invoices? Separate database? Separate table? One invoices table for all invoices?

As of this moment I am inclined to go with one database with multiple tables for each customer type. The invoices would be in a separate table.

What do you think?

Best Answer

You seem to use the word "database" when you mean "table." One database for all data is best. Most modern databases (PostgreSQL, MySQL, Oracle, DB2, SQL Server, etc.) can handle a large amount of data in one database. It sounds like you might want to access all customer and/or all invoice information together, so keep them together in one database.

Data should be split into separate tables to normalize it as needed. Fully normalized data (each piece of information stored only once) is recommended by every database textbook I have read. Read about (or at least Google) "database normalization" to understand that concept well.

One normalized approach would be to have one customer table indexed by a customer id with all fields common to every customer. The customer table would also have a field identifying the customer type: customer1, 2, ... 10. A separate table would hold customer-1-type fields, indexed by customer id. Another table would have customer-2-type fields, etc. The application using the database would be designed to look for data in the customer-1 table if the customer-type is customer1, look for data in customer-2 table if customer-type is customer2, etc.

Presumably, the relationship between customer and invoice is one to many. A invoice table could have a foreign key linking to the customer table. If invoice data varied by customer-type, then that data could be placed in separate tables: invoice-customer-type-1, invoice-customer-2, etc. When gathering invoice data, find the customer id for the invoice from the invoice table, look up the customer-type from the customer table, then get data from the appropriate invoice-customer-# table.

Also, the tables in the database should be structured how the data will be used. For data entry and updating, a normalized structure is good. For printing bills or other reporting, you might want to create temporary de-normalized tables to speed the process. Creation of a temporary table can be faster because you avoid making multiple table joins every time a bill is printed.

Get a textbook and study database design, but in the meantime, I hope this helps.