That's a long question.
First off, my current project (I'm the database guy, there are MMO engine experts to deal with that) is a form of MMORPG based on an off-the-shelf engine. Volumes would be like Eve Online" or "World of Tanks" volumes.
Now for an orthogonal short answer:
- separate DB and Engine completely
Don't mix and match because of hardware optimisations
- hardware: DB and engine servers will be way different specs
- design your database normally
There is a whole lot more of course, but I'd suggest you're over-thinking the problem and shooting yourself in the foot. I'm simply applying the same techniques to my MMO that I used in Investment Banking because IMO most high volume systems should converge to the similar architecture
Ensuring that addresses are unique is not a trivial problem. You have to assume that people enter their addresses in slightly different ways, abbreviating some parts sometimes, or omitting optional information. Just checking for exact duplicates would probably fail pretty often.
You should consider whether trying to keep addresses unique is worth the effort and complexity.
Addresses of users and addresses of orders are pretty different concepts. A user can change their current address. An order should never be retroactively changed in that way, the address there should always be the delivery address.
You could consider making addresses immutable. Changing an address would always mean creating a new address. This means you can safely point orders to the common address table, as the addresses will never change.
If a user changes their address, you simply add a new address and link it to the user. The old address can be hidden in the UI, soft-deleted or just shown as an address history. But all orders could still reference the old address because it never changes.
- Is it practical, at scale, to scan an entire table to see if an Address exists before adding it?
That depends entirely on what you consider "at scale", it's not an issue with a few tens of thousands of rows but completely impractical with a billion rows.
- Is an index supporting a combination of eight columns reasonable (as in Option 1)?
B-Tree indexes are limited to something around 2700 characters. Indexing very large items is typically not a good idea. In Postgres 10, which isn't released yet, you could consider a hash index for this which has a fixed size as far as I understand and should have an advantage for large columns.
But I really think you're asking the wrong question here. In your application, you should know when an address is already in the DB because the user didn't enter it from scratch, but e.g. selected it from the list of addresses already associated with their account.
You don't actually need to enforce uniqueness in your DB, if I understand your problem correctly. And if you'd need to do that, you should always use a unique constraint, which automatically uses an index.
Best Answer
Business rules
We determined via comments that —regarding your business context and within the scope of your question— there are actually three different entity types of relevance, i.e.:
User
Job
Address
We also defined that there are two distinct relationships that involve said entity types, and you provided more details about such connections via your following comments:
Therefore, the significant business rules can be formulated as follows:
An Address houses zero-one-or-many Users
An Address locates zero-one-or-many Jobs
Logical model
So, I have derived a simplified IDEF1X1 logical model that depicts the aforementioned business rules in Figure 1:
In said logical model, you can see that I have represented the three discussed entity types separately, and established (a) the association between
Address
andUser
and (b) the conection betweenAddress
andJob
by means of the migration2 of theAddress
PRIMARY KEY (PK) to bothUser
andJob
, which enforces two different one-to-many (1:M) relationships, as your scenario description demands.Expository implementation
Consequently, I created an expository DDL structure based on the logical model presented above that can serve as a reference for a concrete implementation:
ACID Transactions
As in any relational database implementation, you should seriously consider using ACID TRANSACTIONS in order to protect the integrity and consistency of the data you are working with.
Making data retrieval easier with the use of a VIEW
If you want to access user and address data points directly from a single resource, it would be convenient to create a VIEW (i.e., a derived or virtual table) that displays columns that belong to the base tables named
UserProfile
andMyAddress
, e.g.:In this way, you can SELECT directly FROM the VIEW without having to write the JOIN clause and the condition every time you have to retrieve related information. Of course, you can employ a similar approach for job and address data.
Many-to-many relationship between User and Address
In case that you face the need to implement a many-to-many (M:N) relationship between the
Address
andUser
entity types —which would not be unusual in other contexts—, some small modifications to the business rules formulations would be required, as shown bellow:A User keeps zero-one-or-many Addresses
An Address is kept by zero-one-or-many Users
Thus, I drew a new logical model that portrays this situation in Figure 2:
As demonstrated, an associative entity that connects (a)
User
with (b)Address
is included, and I named itUserAddress
accordingly; its PK is made up of two different FOREIGN KEYs, and each of them makes a reference to the respective entity type.I coded an illustrative DDL structure with this alternative arrangement:
If the cardinality of the relationship among
Job
andAddress
also happens to be (or, who knows?, at some point in time becomes) M:N in your business domain, you can follow a method comparable to the one just detailed.Notes
1. Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) theoretical work authored by the originator of the Relational Model, i.e., Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown. It is worth noting that IDEF1X was formalized by way of first-order logic.
2. IDEF1X defines key migration as “The modeling process of placing the primary key of a parent or generic entity in its child or category entity as a foreign key”.