Learn and use an ETL tool
You'll save yourself a lot of time and pain if you use ETL tools like Pentaho Kettle or Talend Studio to automate the data transfer and merging. They'll happily connect to the Pg instance and the Oracle instance and do the required copying and merging using quite flexible strategies. They can consume CSV dumps too.
Use CSV dumps
If you need to use dumps, try to dump individual tables using COPY ... FORMAT CSV rather than attempting to wrangle pg_dump into doing Oracle-compatible partial dumps. This will save you a great deal of time when importing them into other DBMSes, since pretty much everything understands CSV. Keep the schemas consistent for each instance so you don't need to handle different schemas for different instances.
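For instance, a minimal sketch of a per-table export (the table and file paths here are hypothetical):

-- Server-side export (needs superuser or pg_write_server_files):
COPY customers TO '/var/lib/postgresql/export/customers.csv' WITH (FORMAT csv, HEADER true);

-- Client-side equivalent from psql, written wherever psql is running:
\copy customers TO 'customers.csv' CSV HEADER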
Use disjoint set or composite keys
Where unique keys are used, make sure you assign disjoint key ranges to different instances (eg the key range is instance_id * 10^7 to ((instance_id + 1) * 10^7) - 1) or use composite keys of (instance_id, generated_key). If you use disjoint ranges, add a CHECK constraint to enforce it; similarly, if using composite keys, add a CHECK constraint that forces the instance_id part to a constant.
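As a rough sketch, assuming this is instance 3 with a 10^7-wide key range (the table and column names are hypothetical):

-- Disjoint-range variant: this instance may only hold keys in [3*10^7, 4*10^7).
CREATE TABLE widget (
    widget_id bigint PRIMARY KEY,
    name      text NOT NULL,
    CONSTRAINT widget_id_in_instance_range
        CHECK (widget_id >= 30000000 AND widget_id < 40000000)
);

-- Composite-key variant: instance_id is pinned to a constant on this instance.
CREATE TABLE widget_composite (
    instance_id   integer NOT NULL CHECK (instance_id = 3),
    generated_key bigint  NOT NULL,
    name          text    NOT NULL,
    PRIMARY KEY (instance_id, generated_key)
);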
Lowest common denominator data types
Avoid using PostgreSQL arrays, record types, and hstore. They'll all dump as string literals in CSV dumps and you'll have to parse them in order to do something useful with them. For example:
regress=> SELECT ROW( ARRAY[1,2,3], ARRAY['a','b','withquotehere:''inline', 'doublequotehere:"'], ARRAY[ ROW(1,'a'), ROW(2,'b') ] );
row
--------------------------------------------------------------------------------------------
("{1,2,3}","{a,b,withquotehere:'inline,""doublequotehere:\\""""}","{""(1,a)"",""(2,b)""}")
(1 row)
You really, really don't want to have to parse that. It's a pretty extreme case: an anonymous record value containing three different arrays, one of integers, one of strings including quotes, and one of anonymous two-field row tuples. But seriously ... use plain relational modelling and CSV export; the convenience of composite types, hstore, and arrays isn't worth the compatibility pain if you know you'll be doing lots of cross-DB work.
It's actually possible to use them in your schema; you'll just need to COPY the output of queries that expand them into appropriate relational structures, instead of dumping the tables directly. Since you're going to have to do that, you might as well use regular relational modelling in the first place and save yourself some pain.
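For example, a sketch of exporting an array column in relational form rather than as an array literal (table, column, and path names are hypothetical):

-- One output row per array element, instead of one '{...}' literal per post.
COPY (
    SELECT p.post_id, t.tag
    FROM post p
    CROSS JOIN LATERAL unnest(p.tags) AS t(tag)
) TO '/var/lib/postgresql/export/post_tags.csv' WITH (FORMAT csv, HEADER true);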
Enums are OK; you can just treat them as strings when loading them. They're just a convenient and performant shortcut for a side table and a FK constraint anyway, allowing you to avoid joins and sorts.
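A minimal sketch of what that looks like (the type and table are hypothetical):

CREATE TYPE order_status AS ENUM ('new', 'paid', 'shipped');
CREATE TABLE orders (
    order_id bigint PRIMARY KEY,
    status   order_status NOT NULL
);
-- In a CSV dump the status column comes out as plain text ('new', 'paid', ...),
-- so the target side can load it into a varchar column without any parsing.
COPY orders TO '/var/lib/postgresql/export/orders.csv' WITH (FORMAT csv, HEADER true);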
In modern PostgreSQL, bytea is output as hex strings. In older versions, it's output as octal escape sequences. Either way, you can load it into a varchar column in Oracle and then convert it if you can't get Oracle to natively understand the representation. I haven't dealt with Oracle binary types, so I can't help much more there.
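A minimal sketch of checking the output format and exporting an explicit hex string regardless of the server setting (table and column names are hypothetical):

-- bytea_output = 'hex' (the default since 9.0) emits strings like \xdeadbeef;
-- pre-9.0 servers emit octal escape sequences instead.
SHOW bytea_output;

-- Export an unambiguous hex string that can be loaded into a varchar column:
COPY (SELECT blob_id, encode(payload, 'hex') AS payload_hex FROM blob_store)
    TO '/var/lib/postgresql/export/blob_store.csv' WITH (FORMAT csv, HEADER true);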
Sounds like the data you're exporting isn't utf-8 encoded, or is being chopped up / mangled in transit.
This: Ã9ÃýÃ0Ã looks like what happens when you decode utf-8 as iso-8859-15, cp1252, or related 1-byte encodings. But it's not valid utf-8 when demangled. Perhaps you've cut it part way through a string, rather than copying from the beginning of the value?
0xcb 0xcf is indeed nonsense as utf-8: 0xcb would start a two-byte sequence, but 0xcf isn't a valid continuation byte.
Trying other encodings, it's:
ËÏ in latin-1. Unlikely. Same in iso-8859 3, 4, 9, 10, 14, 15, and in the default Windows encoding for Western Europe / US, cp1252.
ËĎ in iso-8859-2
ЫЯ in iso-8859-5
ثد in iso-8859-6
ΛΟ in iso-8859-7
หฯ in iso-8859-11
ĖĻ in iso-8859-13
Any of those look like likely candidates, given what you know about the original data?
You need to either export in a known encoding, or determine the encoding of the input reliably.
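If you have a PostgreSQL session handy, one crude way to try candidate encodings against the raw bytes is convert_from; a minimal sketch:

SELECT convert_from('\xcbcf'::bytea, 'LATIN1');     -- ËÏ
SELECT convert_from('\xcbcf'::bytea, 'ISO_8859_5'); -- ЫЯ
SELECT convert_from('\xcbcf'::bytea, 'UTF8');       -- ERROR: invalid byte sequence for encoding "UTF8"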
It's also possible that the data was supposed to be utf-8, but something has mangled it by cutting utf-8 strings up byte-wise.
If you can post a complete line of exported CSV it might help.
Best Answer
We also found existing tools tricky to work with, so we ended up hacking together a solution using the sqlalchemy library in Python. Our general approach was to mirror the source database, hackily convert data types where necessary, create the target database, and copy across the data in chunks.
For an example, see: https://github.com/MIT-LCP/oracle2postgres/blob/master/migration.ipynb. Using this approach, we migrated a fairly large database (~100 schemas, with >1000 tables, some tables with >200 million rows) from Oracle to Postgres.
For more detail, see: https://github.com/MIT-LCP/oracle2postgres. The package can be installed with pip install oracle2postgres. If you hit problems when reusing the code, feel free to raise an issue and we'll do our best to help.