Relational Theory – Key of Relation Resulting from Full Outer Join

join;relational-theory

What is the key of the relation resulting from a full outer join? If there is no key, how is it said that the relational algebra is closed under its operators and the result of each operator is also a relation?

Best Answer

Short Answer

Closure in relational algebra has nothing to do with the concept of a key. A relational variable holds a relation so long as that relation has the following properties:

  • Tuples are not ordered - order holds no implicit meaning
  • Tuples are distinct - there are no duplicates
  • Attributes are not ordered - order holds no implicit meaning
  • Attributes are single valued - only one of whatever type it is

The basic operators of the relational algebra are indeed closed and thus when applied on relations always yield relations. Keys are irrelevant as by definition each relation has no duplicate tuples. Any operation that would result in duplicate tuples (for example the projection of a single attribute over the relation) by definition eliminate those duplicate tuples when providing the result. As you know SQL does not properly implement this.

Long Answer

In order to really understand the short answer, and to understand the addtional nuances that "outer join" throws into the equation, we need to delve into the foundations of relational algebra in number algebra.

Number Algebra

Algebra is "the part of mathematics in which letters and other general symbols are used to represent numbers and quantities in formulae and equations" as defined by google. One algebraic property is operation, meaning there are actions that work to change numbers. For numbers these are addition, subtraction, multiplication, division, raising to a power, and taking of a root. There are many other algebraic properties as well such as the associative property which states that you can group numbers in any way without changing the result of the operations. These properties give algebra its unique power, in that with these properties we can use letters to represent numbers, manipulate them via the defined operators in equations, and know the transformation will be absolutely correct regardless of what actual number is eventually substituted for the letters.

Closure is another of the algebraic properties which states that if an operator which works on numbers always produces numbers then numbers are closed under that operation. Real numbers we know are closed under addition for example. Add any two real numbers and the result will always be a real number. Real numbers are not closed under the square root operation as the square root of -1 is not a real number. All of this comes straight out of the property glossary of the Math Forum @ Drexel.

Relational Algebra

Chris Date says that Ted Codd was a genius. Codd's unique genius was the realization that mathematical relations, combined with set theory and logic, would be very practical when applied to data management. Fabian Pascal's Practical Database Foundation Series provides an excellent explanation of why this is so, and I'll briefly review a few of his insights.

One of the primary database problem of the 1960's was one where a program had to be written for each "question" you would want to ask of the DBMS managing the data. Codd showed that by taking the concept of a relation and adapting it for databases a logical model of data input and output could be developed completely distinct from how data is physically stored. Part of that logical model was the set of operators that would apply to relations - the relational algebra. Just like number algebra allows us to exploit the algebraic properties to manipulate massively complex equations where letters stand for any potential number and guarantee correct results, the relational algebra can do the same for relations.

Why is this important? Because with algebra, operations and their results do not depend of what the relations and their attributes actually mean. Then, by adapting logic, which like algebra proscribes valid forms which can be used to evaluate an argument to be true or false regardless of what the propositions mean, a generalized query language can be implemented that enables a user of the DBMS to ask a true question of the data - one of arbitrary complexity - and the DBMS can be programmed using the rules of relational algebra to give a provably correct answer!

Now just like with number algebra, closure is the essential property that gives the algebra its power. Relations are closed under the basic operators and as such you can arbitrarily nest these operations at will, in any order, to declare practically any "question" you could ever think of. Once the DBMS is created that implements the relational algebra, unique programs no longer have to be written to find the answer to each question. The DBMS user simply declares the question using the relational algebra.

Outer Join

Thus far I have been saying basic operators. That is because the theory does not account for missing data. Each relation by definition contains tuples with a value for every attribute. But to adapt the mathematics to the practical realities of missing data some solution has to be developed for when an attribute value is unknown. SQL implemented the NULL and 3VL. Codd later proposed marks and 4VL. The join operator had to be extended to preserve un-matched tuples, and SQL implemented the extension by producing NULLs instead of values for the attributes of un-matched tuples. Date shows that by doing this outer join is:

a kind of shotgun marriage: It forces tables into a kind of union — yes, I do mean union, not join — even when the tables in question fail to conform to the usual requirements for union (see Chapter 6). It does this, in effect, by padding one or both of the tables with nulls before doing the union, thereby making them conform to those usual requirements after all. But there’s no reason why that padding shouldn’t be done with proper values instead of nulls. (Date, C. J. (2011-12-16). SQL and Relational Theory: How to Write Accurate SQL Code (Kindle Locations 2601-2602). O'Reilly Media. Kindle Edition.)

Conclusion

So are relations closed under the operation of outer join (full, left, or right)? I guess it depends on how that operator is implemented. I would say yes if you do consider a relation to still be a relation even with markers for the missing values of the attributes of un-matched tuples. Or yes if you don't consider a relation to still be a relation even with markers for missing values but can substitute default values for the missing attributes values. But if you don't consider a relation to still be a relation even with markers for missing values, and your DBMS implements SQL NULLs, and you can't or don't substitute values for the SQL supplied NULLs in the outer join, then no, relations are not closed under that outer join. Two relations went in, but a relation did not come out.