Database Design – Understanding Data Abstraction Levels

database-design

The three-level ANSI SPARC Database Architecture suggests three data abstraction levels, namely, external, conceptual, and internal levels.

If I understand correctly, the external level represents the users' views, the conceptual level is the conceptual diagram (ER model), and the internal level is the relational model.

My problem is that I found in the literature that besides those, there is a fourth level (the lowest one), which is the physical level.

I want to understand what is it concretely? Is it at this point that we define the type of index, access paths, and things related to physical access to data?

Best Answer

Although the definitions of the schemas or levels of representation of the ANSI/SPARC architecture did not reach the desired precision, they are only three, and they are named external, conceptual and internal, respectively.

As far as I know, there is no fourth level, because the physical structures employed by a database management system (DBMS for brevity) compose, in fact, the internal schema, as I will explain below.

Note: Even if imprecise, the ANSI/SPARC architecture is definitely a very useful reference when building a system, so it looks like the literature you found contains inaccurate information or perhaps it makes a reference to the (also very useful) notion where the layers of abstraction of a database are called conceptual, logical and physical, which has different implications.

Brief descriptions of the three ANSI/SPARC levels

I deem opportune to describe each level (ordered from the highest to the lowest degree of abstraction) so as to expose its scope:

  • The external level is the way in which the information is presented to end-users of a computerized information system; e.g., the specific organization of the pieces of information as displayed or requested in the screens or windows (via, e.g, lists, grids, labels, text boxes, check boxes, drop-down menus, etc.) of an application program.

  • The conceptual level involves the representation of the structure of the information as (a) perceived by the business experts and (b) modeled by the database analysts, commonly in terms of entity types, properties and associations, so it is comprised of ideas or conceptualizations, it is not “concrete” yet and its representation requires constructs that are in lower levels of abstraction. The conceptual level is informally referred to as the business rules of a certain organization regarding its informational requirements. As an side, schemas of this kind are usually —though not necessarily— expressed via graphical devices like, e.g., entity-relationship diagrams to facilitate communication between the interested parties.

  • The internal level has to do with how the DBMS of relevance stores the information in a concrete form; e.g., arrays, pages, extents, files, records, fields, pointers, etc. This is the lower level of abstraction since, to put it some way, the DBMS “saves” the information in “real” structures, so this may also be conceived as the physical level of the architecture but naming it in this way does not imply that there is a fourth level. It is worth to mention that there can be different sublevels of abstraction within the internal level itself, but that factor depends on the particular mechanisms used by the DBMS in question.

Relational databases created on SQL DBMSs with respect to the ANSI/SPARC architecture

When a practitioner uses a SQL platform (Microsoft SQL Server, IBM DB2, Sybase ASE, PostgreSQL, etc.) to build a database using the mechanisms proposed by the relational model by Dr. E. F. Codd:

  • The base tables, columns, types (and domains when available) and constraints must reflect the characteristics of the conceptual schema (entity types, properties, associations, cardinalities) so that the data (in the shape of rows) remains consistent with it. The declarations made by means of a data sublanguage (e.g., SQL DDL) of the base tables, columns, types and constraints are instruments that formalize the conceptual schema, so they are considered logical constructs and, as such, they do not yet “tocuch” the physical level, they are still free from DBMS-specific concrete storage structures. Normalization, being a logical procedure, applies here, not at the conceptual schema.

  • The views (i.e., derived tables that “combine” columns from one or more base tables or from other derived tables as well, typically by virtue of SQL DML operations involving JOINs) maintain a close correspondence to the external level, because they gather and provide the information requested or sent by the screens or windows of one or more application programs, just like the end-users see it.

  • The indexes, pages, extents, files1, records2, fields3, partitions, allocation maps, and many other arrangements match the internal schema, since they are the “physical” structures utilized by a DBMS to store the information.

The SQL DBMS functions as a mediator (1) between the internal and the logical representations of the conceptual schemas, and (2) between the internal and external representations through connections via views.

Notes


1, 2, 3 Note how important it is to distinguish between (a) files, records and fields and (b) tables, rows and columns —or relations, attributes and tuples—, as they are elements that belong to distinct levels of abstraction and, hence, are susceptible of very different operations and treatments, having therefore very dissimilar impacts.