MS Access 2010 – Defining Additional Index Fields in MS Access

database-designexcelms-access-2010

I am developing an access database that eventually will grow in size. For that reason scalability has to be taken in account.

My background: no official studies about databases.

  • I have read Access Database Design & Programming (O'Reilly) and I have taken multiple courses regarding databases (I keep studying and reading)
  • So far I've been working on developing a database from scratch with tables up to 50-60k rows (but will get bigger).
  • Advanced VBA user and comfortable with C#

I would like to confirm/deny some assumptions that might jeopardize the project.

  • Let say tblA has minimal superkey uniqueID. uniqueID is a text string of 7+ char. Is It necessary(recommended) to define a new field type numeric(integer) that will be used as the FK in other tables to generate the relationship?
    Let's say, in the future will be migrated to a SQL server. Will it help?
  • Applying the prior principle.
    When connecting the database with Excel, user needs to see information attribute that helps him to identify a record but when I send it back to the database I need to pass the numeric value that connects both records.
    Let's say we need to update a record in tblInventory that has relationships with tblEmpl/tblClients/tblPrices.

How should I handle the conversion of empl_name="Carlos" to emplID=1?

Maintaining a dictionary key/item?
Generating recorset to lookup the value in tblEmpl?
Is there any SQL {INSERT INTO;UPDATE} syntax to create INNER JOIN so it automatically converts it? (I haven't found anything that applies to SQL Access)

  • By using numeric ID to lookup values in table (when the table has 3/4/+ relationships), SELECT statement results in a nested INNER JOINs that look really ugly:

(ie: want to find values on tblC where the emplName is "MyName" and clientName is "MyClient". Need nesting tables to find those attributes)

SELECT tblC.ID,tblA.Name,tblB.Client
FROM tblC INNER JOIN (tblA INNER JOIN tblC ON tblA.ID = tblC.FK_A)
ON tblB.ID = tblC.FK_B
WHERE (((tblA.Name)="MyName",(tblB.Client)="MyClient"));

If I have 4 different relationships, It gets scary.

As I self-taught, I am never quite sure if I just read the wrong post in stackoverflow or this is the way it is and I just should bear with it.

Any light upon those questions will be greatly appreciated.

Best Answer

Your question is really: What are the benefits of surrogate keys?

A surrogate key is a (generally) integer identifier for a record in a database which is used as the primary key (generally) and used as a link in foreign key relationships.

The answers are found all over the place, Wikipedia has a great summary including the advantages and disadvantages, some of which you allude to: https://en.wikipedia.org/wiki/Surrogate_key

Addressing each of your points:

"Will it help?" Yes. From a performance perspective it will make a huge difference if your database grows to a large size because computers naturally use integers and comparing them is easy for them to do. As soon as you have to compare natural/business keys, computer processing time ramps up considerably. But this is only one advantage, immutability being probably the biggest. Read the above Wikipedia article to understand this.

"How should I handle the conversion of empl_name="Carlos" to emplID=1?" You shouldn't. You have a couple of options:

  1. Hide the key field from the user, they don't need to know about it.
  2. Let the user see the field, just tell them don't touch it (preferably lock it).

Users don't need to know there is stuff going on behind the scenes. They usually don't care either. There is no need to do the conversion if you hide the column or show it. Secondly the user should not be updating details on excel sheets offline unless you have a process for automatically reintegrating these into your application. Instead, build a data entry form for them to use with a search function and tell them to make updates to the database directly.

"SELECT statements look ugly" This is because MS Access makes SELECT statements look ugly NOT because they are ugly. A properly formatted SELECT statement looks fine:

SELECT 
    tblC.ID
    ,tblA.Name
    ,tblB.Client 
FROM tblC 
INNER JOIN tblA ON tblA.ID = tblC.FK_A
INNER JOIN tblB ON tblB.ID = tblC.FK_B
WHERE tblA.Name="MyName"
AND tblB.Client="MyClient";

Unfortunately if you are stuck with Access, you are stuck with ugly looking SELECTs. Having worked with MS Access a lot, you are better off just using the query designer in Access and forgetting about SQL Statements unless you are doing them in VBA code. In VBA code you are in control of how it looks so can design it as you wish.

Your book appears to include a chapter about database theory. This is important and you should re-read the parts about data theory (Chapter 4) as these should have answered your questions. If not, look for another book that hopefully will.