I have an Access table called Engagement
with four fields:
Emp_id, Year, Week, Act_id
It records when an employee is/was engaged in an activity.
Each field is Number and each makes up part of the composite primary key.
The semantics of the application was that each entry (each Emp_id
–Year
–Week
–Act_id
combination) must be unique.
That is, while an employee will generally have different year-weeks and different activities, an employee can sometimes engage in the same activity on different year-weeks or even engage in different activities in the same year-week. Nulls were not allowed. Everything worked.
Now I need to expand/modify the semantics to allow unknown – or more aptly, not disclosed – Year-Week values associated with any given employee-activity.
Of course, attempting to enter a row with empty Year-Week results in
"Index or primary key cannot contain a Null value".
So I need a change to the table design.
One thing I tried was to convert the primary key index into a non-primary index by turning Primary off (and leaving Unique on) in the Indexes window.
This correctly prevents duplicate records where Year-Week values are non-empty — but it allows duplicate records where Year-Week are empty.
For example, using the above non-primary, unique index, the following data are allowed:
Emp_id Year Week Act_id
7 2014 12 31 } Same activity,
7 2015 22 31 } different dates.
7 2015 33 32
7 2015 40 33 } Same dates,
7 2015 40 34 } different activities.
7 2016 2 36
7 38 } Different activities,
7 39 } undisclosed dates.
And none of the following additions are subsequently allowed:
Emp_id Year Week Act_id
7 2014 12 31 } Both records are
7 2015 33 32 } duplicates of above.
So far so good (behavior matches requirements). However, both the following additions are subsequently allowed, but should not be:
Emp_id Year Week Act_id
7 38 } Both records are
7 39 } duplicates of above.
Why is that?
What is a good way around this problem of, on the one hand, allowing something equivalent to null for Year and Week and, on the other hand, constraining each Emp_id
–Year
–Week
–Act_id
combination to be unique?
I can think of two other (untried) solutions:
- Choose an equivalent-to-null numeric value, such as zero or -1 and somehow explain this to users.
- Convert the Year-Week field types from Number to Text and simply use "" (empty string) for null.
In your experience, what is a good solution in this kind of situation?
I'm aware of What's wrong with nullable columns in composite primary keys? and NULL value in multi-column primary key, which explain certain things but don't provide a solution.
Best Answer
The strongest advice I can give is to keep the primary key immutable unless it is absolutely unfeasible for your use-case. You appear to be describing 2 different datasets.
User-Actions
taken during a givenPeriod
( defined by the composite key of Year-Week )User-Actions
taken during an unknownPeriod
My initial impression is that the unknown period must needs be a different dataset. Surely if you don't know when an action has taken place you also do not yet know enough to move it from its staging environment to the table where
User-Action-Period
is the unique identifier. I don't know MS Access well enough to comment on the specifics of null keying in that environment, but I strongly recommend defining your data model for yourself in such a way that you know if you need to include the "equivalent-to-null" keyed records on the same table.From your description, it seems that the problem occurs when attempting to insert multiple instances of a
User-Action
where thePeriod
occurred is not known. Once again, if you do not know enough to uniquely identify the committed time of theUser-Action
, it is appropriate to stage the metrics you are recording about the sameUser-Action
elsewhere until you can appropriately identify the relationship thisUser-Action
has to the other similarUser-Actions
that occurred at another key-able timePeriod
.I see two paths forward:
If only one User-Action with "unknown" time-key is permissible
Example: The action is "pending" or "to-be-completed". The user will always be submitting data for a future / not-completed period.
Solution: Pick your favorite method of keying the time period as "equivalent-to-null" and make the key columns not-nullable. You may continue to store these datasets in the same table
Gotcha: If the user submits an action with a committed period while an unknown period exists for the same
User-Action
, you may run into a situation where data is committed where the period is known but other data already committed for an unknown period ( which happens to be the same one ) is left in the "unknown period" state and lost for thatUser-Action-Time
record and the user doesn't immediately know why.Multiple User-Actions with "unknown" time-keys are permissible
Example: Users are able to uniquely identify the same action type with a different identifier for an unknown Period.
Solution: You must define another way to key the data ( where the time the action occurs is a metric of the other multi-keyed record ). Allow the users to submit data to this separately stored dataset and merge it into your
User-Action-Time
period dataset using appropriate logic. It's more work, but there's no way around it if the datasets are bona-fide separate.