ETL Best Practices – Data Transfer Between OLTP Systems

best-practices, db2, etl

First off, I want to note that I am a developer and not a DBA, so I am trying to approach this from the standpoint of what is the best thing to do, rather than just assuming I can do everything efficiently "in code" without any issues.

OK, now to the challenge. I am going to be updating data in two different DBMSs. One is DB2 for i on our iSeries systems; the other is DB2 on AIX. I know there are some differences between those versions of DB2, though I am not sure whether I will hit them. We have some data that could be considered "lookup data" (i.e., data that might populate a drop-down box; in other words, a user or program looks the data up and uses it, rather than it being "input" data from a business user). This lookup data is maintained by a particular department in our company. We are writing a new system to replace our legacy system. The legacy system resides on System i; the new system is a Java EE app on AIX. Until the new system completely replaces the legacy one, we want to push this lookup data from its legacy format in our legacy tables to its newer format in new tables for the new system (this will also help us with data cleansing).

I am wondering what the best way is to extract this data, transform it, and load it. We have purchased an ETL tool (IBM InfoSphere DataStage), but we may not use it yet, because no one is trained on the tool and we have never used an ETL tool before. We have always been a shop with developers who wore multiple "hats", and only recently have we been trying to make the transition to having DBAs and doing things the proper way.

So I am responsible for coming up with a design, and since this is only applicable until we move away from our legacy system, the feeling I am getting from management is that they want a custom-coded program (or programs) to take care of this ETL (at least until we get better at DataStage).

As a developer, I would normally have tried to read the data sorted a certain way, then cycle through all my records one by one, transform them, and then either write each record one at a time or "write off" the updates to a file that can do a mass insert at the end.
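
Roughly, what I have in mind maps onto JDBC batching. Here is a minimal sketch of that approach; the table names (LEGACYLIB.LOOKUP, NEWSCHEMA.LOOKUP), column names, hosts, and credentials are all made-up placeholders, not our actual systems:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LookupDataEtl {

    public static void main(String[] args) throws Exception {
        // Placeholder connection details. The JT400 driver (jt400.jar) is the
        // usual way to reach DB2 for i; the IBM Data Server driver
        // (db2jcc4.jar) reaches DB2 on AIX.
        try (Connection source = DriverManager.getConnection(
                 "jdbc:as400://legacy-host", "user", "password");
             Connection target = DriverManager.getConnection(
                 "jdbc:db2://aix-host:50000/NEWDB", "user", "password")) {

            target.setAutoCommit(false); // commit in chunks, not per row

            try (Statement extract = source.createStatement();
                 ResultSet rs = extract.executeQuery(
                     "SELECT CODE, DESCR FROM LEGACYLIB.LOOKUP ORDER BY CODE");
                 PreparedStatement load = target.prepareStatement(
                     "INSERT INTO NEWSCHEMA.LOOKUP (code, description) VALUES (?, ?)")) {

                int pending = 0;
                while (rs.next()) {
                    // Transform step goes here; trimming fixed-width legacy
                    // fields is a stand-in (assumes non-null columns).
                    load.setString(1, rs.getString("CODE").trim());
                    load.setString(2, rs.getString("DESCR").trim());
                    load.addBatch();

                    if (++pending % 1000 == 0) { // flush every 1,000 rows
                        load.executeBatch();
                        target.commit();
                    }
                }
                load.executeBatch(); // flush the remainder
                target.commit();
            }
        }
    }
}
```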

Being a developer, I have no idea if this is really the right or smart way to do it. Any thoughts? The table I am extracting from will hold between 40 and 60 thousand records, so nothing huge, but that is still a lot to process one record at a time within a program when I know SQL and other DB tools and strategies can handle it better.
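
For example, if our DBAs ever set up the AIX instance as a DB2 federated server with a nickname pointing at the legacy table, I imagine the whole copy could collapse into one set-based statement, something like the following (again, the names are made up, and this assumes federation is actually configured and the transformation is simple enough to express in SQL):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetBasedCopy {

    public static void main(String[] args) throws Exception {
        // Assumes a nickname LEGACY_LOOKUP already exists on the AIX side,
        // created via DB2 federation against the System i table.
        try (Connection target = DriverManager.getConnection(
                 "jdbc:db2://aix-host:50000/NEWDB", "user", "password");
             Statement stmt = target.createStatement()) {
            int rows = stmt.executeUpdate(
                "INSERT INTO NEWSCHEMA.LOOKUP (code, description) "
              + "SELECT TRIM(CODE), TRIM(DESCR) FROM LEGACY_LOOKUP");
            System.out.println("Copied " + rows + " rows");
        }
    }
}
```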

Any thoughts or suggestions? I appreciate any that you can offer.

I also started a similar topic that I labelled "What are the best practices for ETL of data from one OLTP system to another Part 2?"

Best Answer

Use the tools that you are most comfortable with. Your situation seems simple and will last only for a short time. If you have to keep this up and running for a long time, you might end up using more advanced tooling, where you define rules for the conversions and synchronizations to be performed for you. That is easier to maintain and to document, though the learning curve might be a little steep. On the other hand, you could use this case to train yourself: there is no urgent deadline, and when it does get urgent, you can probably build the home-cooked version within a day.