What does it mean that data is valid and accurate according to database integrity?
Data validity and accuracy according to database integrity
database-designterminology
Related Solutions
Etymology
"Collation" is probably best defined on etymology.com,
late 14c., "act of bringing together and comparing,"
It hasn't changed at all in the past 600 years. "collate" means "to bring together" and whether it's the ordering of a book, chronologically or otherwise in a database, it's all the same.
Computer Science
As it applies to Computer Science, what we're doing is separating the storage mechanism from the ordering. You may have heard of ASCII-betical. That refers to a comparison based on the binary encoding of ASCII characters. In that system, storage and collation are two in the same. If the ASCII-standardized "encoding" ever changed the "collation" (order) would change too.
POSIX started to break that with LC_COLLATE
. But as we move into Unicode a consortium emerged to standardize collations as well: ICU.
SQL
In the SQL spec there are two extensions to provide collations,
F690
: “Collation support:Includes
collation name
,collation clause
,collation definition
anddrop collation
.F692
: Extended collation support,Includes attaching a different default collation to a column in the schema.
Essentially these provide the ability to CREATE
and DROP
collations, to specify them for operators and sorts, and to define a default for columns.
For more information on what SQL has to offer check out,
- PostgreSQL documentation on collations
- Oracle Linguistic Sorting and Matching
- MS SQL Server, Collation and Unicode Support, and Collations
- Character Sets and Collations in MySQL
Examples
Rather than pasting a limited example, here is the PostgreSQL test suite it's pretty extensive. Check out at least the first link and look for 'Türkiye' COLLATE "tr-x-icu" ILIKE '%KI%' AS "false"
collate.icu.utf8.out
collate.linux.utf8.out
collate.out
Probably not worth checking out, but I placed it here for completeness, boring POSIX checks.
Do not know if this will be useful to you because it requires quite a few changes, but the problem is interesting, so I'll try.
These would be the major changes
- Using the tree closure instead of the adjacency list for the reference hierarchy. The closure table contains paths form each parent to all of it descendants, so all possible parent-child combinations are exposed.
Note that with the tree closure, each ancestor node points to itself as a descendant, meaning that in CaseProperty
recursion stops on ID = ParentID
instead on ParentID is NULL
It is not clear to me is a parent allowed to be any ancestor or just the one first step up. The closure table exposes ancestor and all descendants, so Level Difference
is added to the TreeClosure
, which is sub-typed as AllowedCombos
for LevelDifference in (0,1)
.
Propagating AK
{PropertyID, PropertyTypeID}
instead of justPropertyID
Using composite key in
CaseProperty
Here are main constraints from the model to clarify relationships (you may need to modify syntax)
ALTER TABLE Property ADD
CONSTRAINT PK_PR PRIMARY KEY (PropertyID)
, CONSTRAINT AK1_PR UNIQUE (PropertyID ,PropertyTypeID)
, CONSTRAINT FK1_PR FOREIGN KEY (PropertyTypeID)
REFERENCES PropertyType(PropertyTypeID)
;
ALTER TABLE TreeClosure ADD
CONSTRAINT PK_TC PRIMARY KEY (AncestorID ,DescendantID ,AncestorTypeID ,DescendantTypeID)
, CONSTRAINT FK1_TC FOREIGN KEY (AncestorID ,AncestorTypeID)
REFERENCES Property(PropertyID ,PropertyTypeID)
, CONSTRAINT FK2_TC FOREIGN KEY (DescendantID ,DescendantTypeID)
REFERENCES Property(PropertyID ,PropertyTypeID)
;
ALTER TABLE CaseProperty ADD
CONSTRAINT PK_CP PRIMARY KEY (CaseID, PropertyID, PropertyTypeID)
, CONSTRAINT FK1_CP FOREIGN KEY (CaseID)
REFERENCES Case(CaseID)
, CONSTRAINT FK2_CP FOREIGN KEY (PropertyID ,PropertyTypeID)
REFERENCES Property(PropertyID ,PropertyTypeID)
, CONSTRAINT FK4_CP FOREIGN KEY (ParentCaseID ,ParentPropertyID ,ParentPropertyTypeID)
REFERENCES CaseProperty(CaseID ,PropertyID ,PropertyTypeID)
, CONSTRAINT FK5_CP FOREIGN KEY (ParentPropertyID ,PropertyID , ParentPropertyTypeID ,PropertyTypeID)
REFERENCES AllowedCombos(AncestorID ,DescendantID , AncestorTypeID ,DescendantTypeID)
;
Related Question
- Sql-server – What process would you use to determine references on a database that has no integrity
- Database System vs DBMS – Understanding the Difference
- Data Integrity – Difference Between Integrity and Consistency
- Association Tables and Data Integrity
- Why is data integrity not possible when using denormalization
Best Answer
At the theoretical level, a domain can be represented by a table of values. (At the theoretical level, because the number of values for, say, the domain of non-negative integers is infinite.) Valid values come from that domain.
In a SQL database, you establish valid values using some combination of appropriate data types and constraints. For example, you might declare "employee_id" to be an integer, and restrict the range of integers with a) a foreign key constraint to a table of 'n' integers or b) a check constraint.
Accurate means that, of all the possible valid values, a user has chosen values that correspond to the entity's state or description in the real world. In this sense, user can be either a human or a program.
For example, let's say a book in a library can have any of these dispositions: "checked out", "recently returned" (not yet in the stacks), "in stacks", "being repaired", and "lost". If I check out a book and the database stores it's status as "checked out", then the database is accurate with respect to the disposition of that copy. (At my library, checkouts and returns are done by computers, not by people.)
So data integrity requires the cooperation of the database (to allow only valid values) and users (to enter the right values from all the possible ones).