ETL vs ELT – Understanding the Difference Between Tables and Views

business-intelligencedata-warehouseetl

TL;DR; Is the difference between ETL and ELT just whether or not the transformed data is a concretion vs abstraction before being loaded into the warehouse?

So forgive me the broadness of the subject title, as it glosses over details, but I'm trying to understand how ETL and ELT differ in a definitive way. They seem very similar, to the point that you could describe either as ELTL. In both cases you start with raw data and end up with concrete transformed data in tables. It seems to be what happens in between is the difference, which can take many forms depending on the number/complexity of transformations from start to end state.

From what I know of ETL, the steps are as follows:

  1. Extract the data from source systems and load into raw tables
  2. Transform the data from raw tables into staging tables
  3. Load the data from the staging tables into your warehouse

In some definitions of ELT I've seen, the steps seem to be:

  1. Extract the data from source systems and load into raw tables
  2. Transform the data notionally from the raw tables by creating views that represent the staging tables
  3. Load the data from the staging views into your warehouse

But then I look at other articles about ELT, and it seems to be:

  1. Extract the data from source systems and load into raw tables
  2. Transform the data notionally from the raw tables by creating views that represent the staging tables
  3. Congratulations, no more hard duplication of data from raw to transformed state, everything is views! Users just fire off their query and views nested n-levels deep instantly* return data!

*Results may vary, depending on whether you have millions of dollars to throw at cloud processing and/or large-scale quantum computing becomes available

Best Answer

No.

What you describe are all variants of ELT.

The difference between ETL and ELT is in where you do the "T". The "traditional" ETL flow would implement the "T" (data transformation) outside the DBMS, using a specialized tool like DataStage, Informatica, Talend, etc. The data transformed to the target model would then be simply loaded into the destination DBMS tables.

The ELT flow differs in that it is the destination database engine that performs the "T"; you load raw data into staging tables in the target database, then use the DBMS tools to "massage" them into the final form and copy to the destination tables. It is an appropriate alternative when the source and target data type impedance is relatively low and the DBMS in question has the tools and capacity to do the job.