Data validity and accuracy according to database integrity

database-designterminology

What does it mean that data is valid and accurate according to database integrity?

Best Answer

At the theoretical level, a domain can be represented by a table of values. (At the theoretical level, because the number of values for, say, the domain of non-negative integers is infinite.) Valid values come from that domain.

In a SQL database, you establish valid values using some combination of appropriate data types and constraints. For example, you might declare "employee_id" to be an integer, and restrict the range of integers with a) a foreign key constraint to a table of 'n' integers or b) a check constraint.

Accurate means that, of all the possible valid values, a user has chosen values that correspond to the entity's state or description in the real world. In this sense, user can be either a human or a program.

For example, let's say a book in a library can have any of these dispositions: "checked out", "recently returned" (not yet in the stacks), "in stacks", "being repaired", and "lost". If I check out a book and the database stores it's status as "checked out", then the database is accurate with respect to the disposition of that copy. (At my library, checkouts and returns are done by computers, not by people.)

So data integrity requires the cooperation of the database (to allow only valid values) and users (to enter the right values from all the possible ones).

Etymology

"Collation" is probably best defined on etymology.com,

late 14c., "act of bringing together and comparing,"

It hasn't changed at all in the past 600 years. "collate" means "to bring together" and whether it's the ordering of a book, chronologically or otherwise in a database, it's all the same.

Computer Science

As it applies to Computer Science, what we're doing is separating the storage mechanism from the ordering. You may have heard of ASCII-betical. That refers to a comparison based on the binary encoding of ASCII characters. In that system, storage and collation are two in the same. If the ASCII-standardized "encoding" ever changed the "collation" (order) would change too.

POSIX started to break that with LC_COLLATE. But as we move into Unicode a consortium emerged to standardize collations as well: ICU.

SQL

In the SQL spec there are two extensions to provide collations,

F690: “Collation support:

Includes collation name, collation clause, collation definition and drop collation.
F692: Extended collation support,

Includes attaching a different default collation to a column in the schema.

Essentially these provide the ability to CREATE and DROP collations, to specify them for operators and sorts, and to define a default for columns.

For more information on what SQL has to offer check out,

PostgreSQL documentation on collations
Oracle Linguistic Sorting and Matching
MS SQL Server, Collation and Unicode Support, and Collations
Character Sets and Collations in MySQL

Examples

Rather than pasting a limited example, here is the PostgreSQL test suite it's pretty extensive. Check out at least the first link and look for 'Türkiye' COLLATE "tr-x-icu" ILIKE '%KI%' AS "false"

collate.icu.utf8.out
collate.linux.utf8.out
collate.out Probably not worth checking out, but I placed it here for completeness, boring POSIX checks.

Enforce data integrity with a recursive table

Do not know if this will be useful to you because it requires quite a few changes, but the problem is interesting, so I'll try.

These would be the major changes

Using the tree closure instead of the adjacency list for the reference hierarchy. The closure table contains paths form each parent to all of it descendants, so all possible parent-child combinations are exposed.

Note that with the tree closure, each ancestor node points to itself as a descendant, meaning that in CaseProperty recursion stops on ID = ParentID instead on ParentID is NULL

It is not clear to me is a parent allowed to be any ancestor or just the one first step up. The closure table exposes ancestor and all descendants, so Level Difference is added to the TreeClosure, which is sub-typed as AllowedCombos for LevelDifference in (0,1).

Propagating AK {PropertyID, PropertyTypeID} instead of just PropertyID
Using composite key in CaseProperty

enter image description here

Here are main constraints from the model to clarify relationships (you may need to modify syntax)

ALTER TABLE Property ADD
  CONSTRAINT PK_PR  PRIMARY KEY (PropertyID)

, CONSTRAINT AK1_PR UNIQUE (PropertyID ,PropertyTypeID)

, CONSTRAINT FK1_PR FOREIGN KEY (PropertyTypeID)
         REFERENCES PropertyType(PropertyTypeID)
;



ALTER TABLE TreeClosure ADD
  CONSTRAINT PK_TC PRIMARY KEY (AncestorID ,DescendantID ,AncestorTypeID ,DescendantTypeID)

, CONSTRAINT FK1_TC FOREIGN KEY (AncestorID ,AncestorTypeID) 
             REFERENCES Property(PropertyID ,PropertyTypeID)

, CONSTRAINT FK2_TC FOREIGN KEY (DescendantID ,DescendantTypeID)
             REFERENCES Property(PropertyID   ,PropertyTypeID)
;



ALTER TABLE CaseProperty ADD
  CONSTRAINT PK_CP PRIMARY KEY (CaseID, PropertyID, PropertyTypeID)

, CONSTRAINT FK1_CP FOREIGN KEY (CaseID)
                 REFERENCES Case(CaseID)

, CONSTRAINT FK2_CP FOREIGN KEY (PropertyID ,PropertyTypeID)
             REFERENCES Property(PropertyID ,PropertyTypeID)

, CONSTRAINT FK4_CP FOREIGN KEY (ParentCaseID ,ParentPropertyID ,ParentPropertyTypeID)
         REFERENCES CaseProperty(CaseID       ,PropertyID       ,PropertyTypeID)

, CONSTRAINT FK5_CP FOREIGN KEY (ParentPropertyID ,PropertyID   , ParentPropertyTypeID ,PropertyTypeID) 
        REFERENCES AllowedCombos(AncestorID       ,DescendantID , AncestorTypeID       ,DescendantTypeID)

;

Best Answer

Related Solutions

What does “collation” mean

Etymology

Computer Science

SQL

Examples

Enforce data integrity with a recursive table

Related Question