Extract Transform and Load is the preparation of foreign data to be inserted into your database or data warehouse
Looking at the basics of the ETL, noted Data Warehouse Designer Bill Inmon notes:
Once upon a time in the not so distant past, there was no ETL (extract, transform and load) software. If you wanted to build a data warehouse, you had to write code in order to get data from one source to the appropriate target. There was lots of code – lots of repetitive code.
After you wrote your code, you had to maintain it. Every time a legacy system changed, you did manual maintenance to your code. Every time a target definition changed, you had to do manual maintenance to your code. Every time an end user wanted something new, you had to do maintenance to your code.
From here, a plethora of ETL products proliferated, as Inmon describes in his brief history of ETL products. They were popular because they were software tools designed to extract data from changing systems, transform it according to specific rules, and load it into data warehouses. This software process meant that humans were involved only in the critical element of the loop: untangling loaded data errors. By automating to the maximum possible extent, the ETL process provided companies a seamless way of not only loading their current databases into a data warehouse, but the ability to load future data sets of the same databases in, so that the data warehouse can continuously provide future results.
To answer your question specifically, different databases provide different extracts. Transformations are applied to normalize the data. Normalization is both in the database-specific sense, changing the patterns of the data to match the receiving data warehouse, but also in the human sense, insuring that the same data in different systems appears the same to the incoming system.
Data sources can be anything you can code a transform for, as the purpose of the transform is to apply rules to the incoming data such that it fits your data model. Joining different data sets should only be performed if necessary. Rely on your recipient database instead to synchronize results.
I have found building a relationship table between the old (natural) and new (integer) keys works well in conversion processes such as this. The integer key is built using the standard auto-increment method for the database. Maintain unique keys on the data which forms the natural key in the new tables.
Develop a repeatable data synchronization method that can be applied as often as required. Depending on your needs this can be daily, weekly, monthly, or on-demand. Make the synchronization modular so it can be updated as required. In some development processes you may end up feeding from the new system back to the old system.
As you develop and deploy the system, track which system is the system of record for which data. Move control to the new system as possible. You may want or need processes to identify and deal with changes which occur on the system which is not the system of record.
Manage change overs from the old system to the new. If possible, disable updates in the old system as the system of record is changed to the new system. This will require a feed from the new system to the old.
Best Answer
lots of Discussions about ETL vs ELT out there.
The main difference between ETL vs ELT is where the Processing happens ETL processing of data happens in the ETL tool (usually record-at-a-time and in memory) ELT processing of data happens in the database engine
Data is same and end results of data can be achieved in both methods.
it very much depends on you and your environment If you have a strong Database engine and good hardware and you can do heavy processing on it, ELT is good for you, If you have a busy datawarehouse engine and you need to free it from processing go for ETL.
notice that having an ETL tool gives you both the options, like ETL(T), you can do the Transformation in the ETL tool and you can do transformation in the Database engine as well
but ELT you only have the option of transformation in the database engine, but you should know that Databases are better at set based operations than record-at-a-time ETL tools.
similar question asked on SO but supporting ETL and also a nice Article comparing ETL vs ELT, but favoring ELT