Ms-access – How to allow part of a key to be null and enforce uniqueness of records with nulls

ms accessnullprimary-keyunique-constraint

I have an Access table called Engagement with four fields:

Emp_id, Year, Week, Act_id

It records when an employee is/was engaged in an activity.
Each field is Number and each makes up part of the composite primary key.
The semantics of the application was that each entry (each Emp_idYearWeekAct_id combination) must be unique.
That is, while an employee will generally have different year-weeks and different activities, an employee can sometimes engage in the same activity on different year-weeks or even engage in different activities in the same year-week. Nulls were not allowed. Everything worked.

Now I need to expand/modify the semantics to allow unknown – or more aptly, not disclosed – Year-Week values associated with any given employee-activity.
Of course, attempting to enter a row with empty Year-Week results in
"Index or primary key cannot contain a Null value".
So I need a change to the table design.

One thing I tried was to convert the primary key index into a non-primary index by turning Primary off (and leaving Unique on) in the Indexes window.
This correctly prevents duplicate records where Year-Week values are non-empty — but it allows duplicate records where Year-Week are empty.

For example, using the above non-primary, unique index, the following data are allowed:

Emp_id  Year  Week  Act_id
     7  2014    12      31  } Same activity,
     7  2015    22      31  } different dates.
     7  2015    33      32
     7  2015    40      33  } Same dates,
     7  2015    40      34  } different activities.
     7  2016     2      36
     7                  38  } Different activities,
     7                  39  } undisclosed dates.

And none of the following additions are subsequently allowed:

Emp_id  Year  Week  Act_id
     7  2014    12      31  } Both records are 
     7  2015    33      32  } duplicates of above.

So far so good (behavior matches requirements). However, both the following additions are subsequently allowed, but should not be:

Emp_id  Year  Week  Act_id
     7                  38  } Both records are
     7                  39  } duplicates of above.

Why is that?

What is a good way around this problem of, on the one hand, allowing something equivalent to null for Year and Week and, on the other hand, constraining each Emp_idYearWeekAct_id combination to be unique?

I can think of two other (untried) solutions:

  1. Choose an equivalent-to-null numeric value, such as zero or -1 and somehow explain this to users.
  2. Convert the Year-Week field types from Number to Text and simply use "" (empty string) for null.

In your experience, what is a good solution in this kind of situation?

I'm aware of What's wrong with nullable columns in composite primary keys? and NULL value in multi-column primary key, which explain certain things but don't provide a solution.

Best Answer

The strongest advice I can give is to keep the primary key immutable unless it is absolutely unfeasible for your use-case. You appear to be describing 2 different datasets.

  1. User-Actions taken during a given Period ( defined by the composite key of Year-Week )
  2. User-Actions taken during an unknown Period

My initial impression is that the unknown period must needs be a different dataset. Surely if you don't know when an action has taken place you also do not yet know enough to move it from its staging environment to the table where User-Action-Period is the unique identifier. I don't know MS Access well enough to comment on the specifics of null keying in that environment, but I strongly recommend defining your data model for yourself in such a way that you know if you need to include the "equivalent-to-null" keyed records on the same table.

From your description, it seems that the problem occurs when attempting to insert multiple instances of a User-Action where the Period occurred is not known. Once again, if you do not know enough to uniquely identify the committed time of the User-Action, it is appropriate to stage the metrics you are recording about the same User-Action elsewhere until you can appropriately identify the relationship this User-Action has to the other similar User-Actions that occurred at another key-able time Period.

I see two paths forward:

If only one User-Action with "unknown" time-key is permissible

Example: The action is "pending" or "to-be-completed". The user will always be submitting data for a future / not-completed period.

Solution: Pick your favorite method of keying the time period as "equivalent-to-null" and make the key columns not-nullable. You may continue to store these datasets in the same table

Gotcha: If the user submits an action with a committed period while an unknown period exists for the same User-Action, you may run into a situation where data is committed where the period is known but other data already committed for an unknown period ( which happens to be the same one ) is left in the "unknown period" state and lost for that User-Action-Time record and the user doesn't immediately know why.

Multiple User-Actions with "unknown" time-keys are permissible

Example: Users are able to uniquely identify the same action type with a different identifier for an unknown Period.

Solution: You must define another way to key the data ( where the time the action occurs is a metric of the other multi-keyed record ). Allow the users to submit data to this separately stored dataset and merge it into your User-Action-Time period dataset using appropriate logic. It's more work, but there's no way around it if the datasets are bona-fide separate.