What Does Materialize Mean in Database Terminology?

terminology

While learning about "data layout", I came across the term "materialize", for which I could not find a clear definition or explanation. What does "materialize" mean, and what does materializing do?

This question is not related to materialized views, and is not specific to any particular DBMS.

Context

The source that I am learning from: YouTube video from Prof. Dr. Jens Dittrich: Mapping Relations to Devices.

At 9:24 in the video, the term "materialize" comes up. He says it is the process of getting data from physical pages to storage devices.

An additional question about that point in the video: physical pages should also be able to live in main memory. So if we have a main-memory database, where the data layout exists only in main memory, is there still a materialization step?

Best Answer

My video is about the different mappings that have to be made in order to map relations all the way down to hardware. In practice, these different mappings and linearization steps are often confused and mixed up. This is unfortunate as the different, often hard-coded, decisions taken for certain mapping steps then may hinder query performance later on. Sometimes a simple change in one of these mappings may lead to a completely new product line (e.g. "column stores", PAX/Parquet).
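As a minimal sketch of one such linearization step, the snippet below maps the same small relation to a one-dimensional sequence in two ways: row-wise and column-wise. The relation, its values, and the function names are made up for illustration and are not taken from the video.

```python
# Hypothetical relation with two attributes (id, name); the data is made up.
rows = [(1, "a"), (2, "b"), (3, "c")]


def linearize_row_wise(relation):
    """Row layout: all attributes of one tuple are placed next to each other."""
    return [value for row in relation for value in row]


def linearize_column_wise(relation):
    """Column layout: all values of one attribute are placed next to each other."""
    return [value for column in zip(*relation) for value in column]


row_layout = linearize_row_wise(rows)
column_layout = linearize_column_wise(rows)
```

Both sequences hold the same values; only the order in which they are laid out differs, which is the kind of decision that separates a row store from a column store.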

In database research the term "materialization" denotes any form of data storage, i.e. any operation that actually sets some bytes on any storage layer eventually. Examples include a deep copy, memory allocation (not to be confused with malloc()), replication, materialized views (rather than dynamic views), intra-pipeline materializations, but also any form of (partial) copies along the storage hierarchy.
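The difference between a materialized result and a dynamic one can be sketched in a few lines. This is only an in-memory illustration under made-up names (`base`, `dynamic_view`, `materialized`), not how any particular DBMS implements views.

```python
import copy

# Hypothetical base table; names and values are made up for illustration.
base = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]


def dynamic_view():
    """Dynamic view: nothing is stored, rows are recomputed from base on demand."""
    return (row for row in base if row["qty"] > 6)


# Materialization: the result is deep-copied, so its bytes now exist on their own.
materialized = copy.deepcopy(list(dynamic_view()))

# A later update to the base table is visible through the dynamic view only.
base[0]["qty"] = 10
```

After the update, iterating `dynamic_view()` yields both rows, while `materialized` still contains only the row that qualified when the copy was made.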

In the video, I introduce (and simplify) the different mapping steps. A simplified view of the world in a database is that everything eventually gets stored on physical pages. "Physical page" is a fixed term in database research. But make sure you understand that it is merely an abstraction: it is a storage unit in a DBMS. We can safely ignore what happens with those physical pages (for the moment) when discussing certain concepts (like query processing). That is what I do at 9:26 in the video, as this is not a course on hardware: I say the data from physical pages gets materialized to storage devices. Again: the latter is a much longer story, e.g. factor in ACID, in particular the "D", recovery, concurrency control, ...

But note that physical pages are not the same as physical memory. Rather, a physical page is mapped either to a main-memory page (which is almost always a virtual page provided by virtual memory) or to some other device, e.g. pages on a hard disk or SSD. Most devices are virtualized inside as well, e.g. some SSDs use RAID 5 internally.

Of course with virtual memory, snapshotting, and different forms of storage indirection the term is sometimes a bit hard to understand. Sometimes you believe that you materialize, but...

For instance, assume you fork a child process in Unix. It looks like the child process has physical copies of the data, right? No, it hasn't. Physical copies only come into existence through copy-on-write, i.e. once a page is actually written to. So sometimes the boundary between materializing and not materializing gets blurred.
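The fork case can be tried out with a small sketch, assuming a POSIX system where `os.fork` is available. It only shows the logical behaviour: the child's write is not visible in the parent. The lazy physical copying is done by the kernel and cannot be observed from this code, and CPython's own bookkeeping may touch pages as well.

```python
import os

# Data that exists before the fork; the value is made up for illustration.
data = bytearray(b"original")

read_fd, write_fd = os.pipe()
pid = os.fork()

if pid == 0:
    # Child process: writing to the data is what triggers a private copy.
    os.close(read_fd)
    data[:] = b"modified"
    os.write(write_fd, bytes(data))
    os._exit(0)

# Parent process: read what the child saw, then compare with our own data.
os.close(write_fd)
child_view = os.read(read_fd, 64)
os.close(read_fd)
os.waitpid(pid, 0)
parent_view = bytes(data)
```

Here `child_view` holds the modified bytes while `parent_view` is unchanged. Until the child writes, both processes can share the same physical pages.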

Hope that helps.