Mysql – Best practice for other Table Data in an Order

database-designMySQL

At the moment I try to create a good Database structure for storing all Data belonging to an Order/Inquiry.

I use MySQL and Python at the moment and for now it isn't an real DB, just learning and testing!

The Order contains Information from the following Tables of my DB:

  • Customer

  • Address

  • Person

  • Product –> Copied to an Order Product

My questions are:

  • With the Order Product i copy all Product Data to an new Table and "attach" it to the OrderID, that was my best guess to make the Order Products independent from changes to the product later on. Something like a new Price for the Product which should not affect old orders to be changed. Is that a good way to achieve this?

  • I think i have the same problem with Addresses, Persons and also the Customer it self. If a customer changes its address i dont want to transfer that change to old orders. What is the best way to store such data independent or is there any other way of doing this?

  • We have Idicators for our IDs to make it easier to recognize them. An ID for an Address Looks like this "AD0000001" or for an Contact Person "CP0000001" i noticed that i have to seperate the number and the Characters to do auto_increment in my DB. Is it a good practice to save the indicating Characters as an Default Value to my Table? And put them together in my Select Queries?

If i have missed some information or anyhing else, let me know! I will add it to my post!

Best Answer

It would be helpful to see your actual SQL scripts; an English description can be ambiguous. It sounds like you have all the fields from your Address, Customer, and Product tables in your Orders table, to capture a snapshot of the data as it was when the order was placed. Is that right? If so, that's a valid approach, but it will take up a lot of space. Space is cheap, but performance is not: keep your tables lean.

First, do you not separate Orders and OrderDetails? Or is that what OrderProduct is meant to do? You should only need customer and address information in the header table. Your detail table would then typically be at a level of specific line items, with quantities and prices. In some cases, a logical order might be shipped to multiple locations, or at separate times, and you'd need Address, DateShipped, DateReceived, DateReturned, etc., at the line item level.

Second, changing values for customers and addresses. It's good that you're giving this some thought. The topic you need to read up on is "slowly-changing dimensions:" today, customer #123, Mr. Smith, lives on 101 Main Street, next year she's Ms. Smith and she lives on 1st Avenue, but she's still customer #123. There are many approaches to handling this problem. Typically, rather than recording this information in the Orders table, you'd record each version of the customer as a record in the Customers table, with a persistent customer number (123) and a version number (1 for Main Street, 2 for 1st Avenue), something like this:

 CustomerSK CustomerID VersionID FirstName LastName Address DateEffective DateExpired IsCurrent
 ---------- ---------- --------- --------- -------- ------- ------------- ------------ ---------
 42942      123        1         Robert    Smith    10001   2018-12-07    2019-12-22  0
 42943      123        2         Roberta   Smith    10002   2019-12-23    NULL        1

 OrderID CustomerSK DatePlaced
 ------- ---------- ----------
 103101  42942      2018-07-19
 103102  42943      2019-12-24

You may have a calculated field which concatenates customer ID and version ID to make 123001 and 123002, or create a new auto-increment field for the table, with no relation to the customer IDs. This is known as a "synthetic key", and is labeled CustomerSK in the sample above.

Your Orders table would either have two fields, CustomerID and VersionID, or a single CustomerSK field. You use this field, or pair of fields, to join to the Customers table and get the name and address as of when the order was placed. You could join to Customers again, on C1.CustomerID = C2.CustomerID AND C2.IsCurrent = 1, to get the customer's name and address right now.

IsCurrent can be a calculated field on DateExpires IS NULL. Alternatively, you may prefer a far future value like 2100-12-31 for customer records which have not yet expired. Be sure to have a unique constraint on { CustomerID, VersionID } and on { CustomerID } WHERE IsCurrent = 1.

In a real example, you'd have multiple addresses for each customer. The same principle holds: when an address changes, record the new value, mark the old value as expired, and mark the new value as current.

Third, concatenating a text prefix to a numeric code is not going to be ideal for IDs. You should distinguish between people-friendly IDs and database IDs. The former may be alphanumeric, to help people recognize them, but data integrity and joins should be based on the latter, for best performance. It's easy enough to display a prefix before the numeric ID, if necessary, in GUIs. Consider dates: you always store them using a DATE data type, never as a number or text, but you can format them however the user likes when displaying data.

I'm not sure what distinction you're drawing between customers and people. Pedantically, it may seem strange to put prospects in a Customers table if they've never actually ordered anything, so you may prefer to call it People or Contacts, but what information would you store for a customer that you wouldn't for anyone else? LastOrderID or DateLastOrder, perhaps? Those may be sparse columns in a table which includes prospective customers, but NULLs don't take up much space.