Relational Theory – Practical Reasons for Learning Relational Algebra

Learning how to formulate query-like expressions in relational algebra is a traditional part of many, perhaps most, "Introduction to Databases" courses.

This is usually justified by the assertion that relational algebra is the mathematical foundation of relational databases in general and SQL in particular, with the implication that it is therefore important to know it.

However, it seems to me that formulating expressions in relational algebra is basically the same as formulating queries in SQL, and that much the same thought processes underlie both tasks. In particular, I can't really see that knowing relational algebra makes it easier to write SQL queries, or vice versa. This makes me wonder whether the teaching of relational algebra is just some sort of historical hangover, or whether there are actually specific benefits to knowing it.

So my question is "are there specific practical benefits to knowing relational algebra, of sufficient importance to make it worthwhile teaching?"

As Database Administrators, do you feel that relational algebra is or was important to your career trajectory?

A sort of sub-question is whether the time spent learning relational algebra would be better spent learning more SQL.

Best Answer

When Codd defined the relational model, he defined a set of operators that could be applied to relations. In specifying a relational algebra, much like the specification of an integer algebra, we are able to use symbols in place of relations to solve queries. Like the integer algebra operators (+, -, *, /), these operators obey algebraic properties. As a result, we can rely on certain laws that always apply to a relation, any relation, undergoing that operation. For example, in integer algebra we know that addition and multiplication are associative, in that we can change the grouping of operands and not change the result:

a + ( b + c ) = ( a + b ) + c

Similarly, in relational algebra we know that natural join is associative (and also commutative), and thus know that A join B join C can be executed in any order. These properties and laws give you the power to rewrite query formulations and be guaranteed to get the same results. The book Applied Mathematics for Database Professionals provides significant detail on the various rewrite rules you can use to formulate the same query precisely in different ways. In a perfect world, every formulation producing the same result would have the same performance. However, a modern optimizer, while an amazing piece of software, isn't perfect. Thus, if you have formulated a query one way and are getting poor performance, you have the skills to formulate it a different way and know that it has the same semantics.

Another practical advantage is in the specification of database constraints. First, understanding the relational algebra enables you to determine the simplest way to formulate the constraint. Second, by formulating the constraint in formal logic, you can immediately clarify any ambiguity in the intent of the business subject matter experts who stated the business rule in loose English, and so avoid bugs.
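To make the join claim concrete, here is a minimal sketch in Python of a natural join over relations modelled as sets of tuples. The relation names (emp, dept, city) and their contents are invented for illustration, and the helper functions are my own, not taken from either book mentioned above. It checks that regrouping and reordering the joins leaves the result unchanged:

```python
from itertools import permutations


def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


def natural_join(r, s):
    """Combine every pair of tuples that agree on all attribute names they share."""
    result = set()
    for t in r:
        for u in s:
            td, ud = dict(t), dict(u)
            shared = td.keys() & ud.keys()
            if all(td[a] == ud[a] for a in shared):
                result.add(frozenset({**td, **ud}.items()))
    return frozenset(result)


# Hypothetical sample data, made up for this sketch.
emp = relation({"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D2"})
dept = relation({"dept": "D1", "city": "Oslo"}, {"dept": "D2", "city": "Rome"})
city = relation({"city": "Oslo", "country": "NO"}, {"city": "Rome", "country": "IT"})

# Associativity: (emp join dept) join city == emp join (dept join city)
left = natural_join(natural_join(emp, dept), city)
right = natural_join(emp, natural_join(dept, city))

# Together with commutativity, every ordering of the three operands agrees.
all_orders = {
    natural_join(natural_join(a, b), c) for a, b, c in permutations([emp, dept, city])
}

print(left == right)    # True
print(len(all_orders))  # 1
```

Note that when two relations share no attributes (emp and city here), the natural join degenerates to a Cartesian product. The final result is still the same, but the intermediate result is larger, which is one reason the order an optimizer picks can matter for performance even though it cannot matter for correctness.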

It was Leonardo da Vinci who said:

He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast.

In the same way, a data practitioner who doesn't understand the fundamentals of relational theory cannot be in as complete command of the technology as one who does. Some great references on relational algebra are SIRA_PRISE's Introduction to the Relational Algebra page and C. J. Date's SQL and Relational Theory. Date's book shows how practical an understanding of relational algebra is for writing much more accurate SQL queries. SQL has many quirks and pitfalls, and having a sound grasp of how it differs from the original relational algebra operators really helps you recognize where the pitfalls are and avoid them.
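As one illustration of such a pitfall (my own choice of example, not one the references above are quoted on here): relational projection yields a set, so duplicates cannot occur, whereas a plain SQL SELECT returns a bag of rows and keeps them. A small sketch using Python's built-in sqlite3 module, with a made-up emp table:

```python
import sqlite3

# In-memory database with a hypothetical table, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", "D1"), ("Bob", "D1"), ("Cy", "D2")],
)

# Plain SELECT: SQL keeps one output row per input row, duplicates included.
bag = [row[0] for row in conn.execute("SELECT dept FROM emp ORDER BY dept")]

# SELECT DISTINCT: matches relational projection, which yields a set.
as_set = [row[0] for row in conn.execute("SELECT DISTINCT dept FROM emp ORDER BY dept")]

print(bag)     # ['D1', 'D1', 'D2']
print(as_set)  # ['D1', 'D2']
conn.close()
```

Someone who thinks of the query as a projection in the algebra will expect the second result and will know to ask for DISTINCT, or to rule out duplicates with keys, when the first is not what they meant.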