Learn and use an ETL tool
You'll save yourself a lot of time and pain if you use ETL tools like Pentaho Kettle or Talend Studio to automate the data transfer and merging. They'll happily connect to the Pg instance and the Oracle instance and do the required copying and merging using quite flexible strategies. They can consume CSV dumps too.
Use CSV dumps
If you need to use dumps, try to dump individual tables using COPY ... FORMAT CSV rather than attempting to wrangle pg_dump into doing Oracle-compatible partial dumps. This will save you a great deal of time when importing them into other DBMSes, since pretty much everything understands CSV. Keep the schemas consistent for each instance so you don't need to handle different schemas for different instances.
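For example, a per-table dump might look something like this (the table and file names are placeholders; use psql's \copy instead if you can't write to the server's filesystem):

-- Dump one table to CSV with a header row; path and table name are examples only
COPY my_table TO '/tmp/my_table.csv' WITH (FORMAT csv, HEADER true);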
Use disjoint set or composite keys
Where unique keys are used, make sure you assign disjoint key ranges to different instances (e.g. the key range is instance_id * 10^7 to ((instance_id + 1) * 10^7) - 1) or use composite keys of (instance_id, generated_key). If you use disjoint ranges, add a CHECK constraint to enforce it; similarly, if using composite keys, add a CHECK constraint to force the instance_id part to a constant.
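As a rough sketch (table and column names are made up; here instance 3 owns the range 3 * 10^7 to 4 * 10^7 - 1):

-- Disjoint-range variant: this instance may only generate keys in its own range
CREATE TABLE widget (
    widget_id bigint PRIMARY KEY,
    CONSTRAINT widget_id_in_instance_range
        CHECK (widget_id >= 3 * 10000000 AND widget_id < 4 * 10000000)
);

-- Composite-key variant: pin instance_id to a constant on this instance
CREATE TABLE widget_composite (
    instance_id   integer NOT NULL,
    generated_key bigint  NOT NULL,
    PRIMARY KEY (instance_id, generated_key),
    CONSTRAINT instance_id_is_local CHECK (instance_id = 3)
);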
Lowest common denominator data types
Avoid using PostgreSQL arrays, record types, and hstore. They'll all dump as string literals in CSV dumps and you'll have to parse them in order to do something useful with them. For example:
regress=> SELECT ROW( ARRAY[1,2,3], ARRAY['a','b','withquotehere:''inline', 'doublequotehere:"'], ARRAY[ ROW(1,'a'), ROW(2,'b') ] );
row
--------------------------------------------------------------------------------------------
("{1,2,3}","{a,b,withquotehere:'inline,""doublequotehere:\\""""}","{""(1,a)"",""(2,b)""}")
(1 row)
You really, really don't want to have to parse that. It's a pretty extreme case: an anonymous record value containing three different arrays, one an array of integers, one an array of strings including quotes, and one an array of anonymous two-field row tuples. But seriously, use plain relational modelling and CSV export; the convenience of composite types, hstore, and arrays isn't worth the compatibility pain if you know you'll be doing lots of cross-DB work.
It's actually possible to use them in your schema; you'll just need to COPY from a set of queries that expand them into appropriate relational structures. Since you're going to have to do that, you might as well use regular relational modelling in the first place and save yourself some pain.
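For instance, an array column can be expanded to one row per element before export; something along these lines should work (parent and tags are hypothetical names):

-- Expand the array into ordinary rows and dump the result instead of the raw table
COPY (
    SELECT p.id, unnest(p.tags) AS tag
    FROM   parent p
) TO '/tmp/parent_tags.csv' WITH (FORMAT csv, HEADER true);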
Enums are OK; you can just treat them as strings when loading them. They're just a convenient and performant shortcut for a side table and a FK constraint anyway, allowing you to avoid joins and sorts.
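For example (type and table names are made up), an enum column dumps as its plain text label in CSV, so the receiving side can simply load it into a varchar:

-- Enum values appear as plain text ('new', 'open', 'closed') in CSV output
CREATE TYPE ticket_status AS ENUM ('new', 'open', 'closed');

CREATE TABLE ticket (
    id     bigint PRIMARY KEY,
    status ticket_status NOT NULL DEFAULT 'new'
);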
In modern PostgreSQL, bytea is output as hex strings; in older versions it's output as octal escape sequences. Either way, you can load it into a varchar column in Oracle and then convert it if you can't get Oracle to natively understand the representation. I haven't dealt with Oracle binary types, so I can't help much more there.
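You can check which representation your server produces; on 9.0 and later the default looks like this:

-- 'hex' on modern versions, 'escape' on old ones
SHOW bytea_output;

-- Emitted as \xdeadbeef in CSV output when bytea_output is 'hex'
SELECT '\xdeadbeef'::bytea;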
It looks like what you actually want is a summary of the total bytes used in each folder, right? I believe triggers in Oracle are not the best choice of technology to provide that functionality. This could be better modelled using a view or, even better, a materialized view.
The problem with triggers is that they serialize the DML against the table and don't scale well. Imagine a situation where you are updating and inserting lots of rows: for each one of them you'll get an extra update against the folder table.
If you model your problem with a view instead you would not incur the overhead of the extra DML on the folder table. A simple query like this should do the trick:
SELECT fd.name       AS folder_name,
       SUM(f.bytes)  AS folder_size
FROM   folders fd, files f
WHERE  fd.id = f.folderId
GROUP  BY fd.name;
You may then make this query persistent by creating a view (the sum will run every time you query the view) or a materialized view with fast refresh on commit (in this case the materialized view stores the calculation as a pre-computed table, useful if the table is huge and the sum is costly).
The materialized view "fast refresh on commit" feature delegates to Oracle the task of recalculating the aggregation every time the underlying table(s) change.
The Data Warehousing Guide has an in-depth explanation of this feature:
https://docs.oracle.com/database/121/DWHSG/refresh.htm#DWHSG015
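As a rough sketch of what that might look like (grouping by folderId only to keep it to a single base table; a fast-refreshable aggregate materialized view needs a materialized view log on the base table plus the extra COUNT columns, and the exact requirements vary by Oracle version):

-- Materialized view log required for fast refresh of an aggregate MV
CREATE MATERIALIZED VIEW LOG ON files
    WITH ROWID (folderId, bytes)
    INCLUDING NEW VALUES;

-- Oracle keeps this aggregate up to date on every commit
CREATE MATERIALIZED VIEW folder_size_mv
    REFRESH FAST ON COMMIT
AS
SELECT f.folderId,
       SUM(f.bytes)   AS folder_size,
       COUNT(f.bytes) AS cnt_bytes,
       COUNT(*)       AS cnt
FROM   files f
GROUP  BY f.folderId;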
The following steps will need some tweaking, as you are working with multiple machines - your client machine, a file server (Windows?), a database server, etc. Thus, the paths and the file locations are just examples (which have been tested, though). Steps A-B-C are performed only once.
Prerequisites / setup:
[A] Operating system: create a folder for storing the dir listings (in the form of text files, see footnote 1).
[B] Oracle: you need the permissions CREATE TABLE and CREATE ANY DIRECTORY (i.e. read/write).
[C] In Oracle, create a directory and an external table.
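Something along these lines should work for step [C]; the directory name, the path, and the table layout are just examples, and the whole listing line is kept in a single column here:

-- The path must exist on the database server and point at the folder from step [A]
CREATE OR REPLACE DIRECTORY dir_listing_dir AS 'C:\dir_listings';

-- One row per line of dir_contents.txt; the '|' terminator is arbitrary, it just
-- needs to be a character that never appears in the listing
CREATE TABLE dir_contents_ext (
    line VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY dir_listing_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY '|'
        MISSING FIELD VALUES ARE NULL
        (
            line CHAR(4000)
        )
    )
    LOCATION ('dir_contents.txt')
)
REJECT LIMIT UNLIMITED;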
Get the file list, and use the external table:
[1] Open a command line window. Navigate to the network drive (X:). Execute the following command:
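The exact command isn't reproduced here; on Windows, a recursive listing redirected into a text file looks something like this (the flags and the output path are only an illustration - adjust them to the folder you created in step [A]):

rem example only: /s recurses into subfolders, /b prints bare full paths
dir /s /b > C:\dir_listings\dir_contents.txt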
Load the dir_contents.txt file into a text editor, make sure that it contains what you need. If - as you wrote - Oracle runs on a different machine, you may have to upload it (via sftp).
[2] Use the external table (see footnote2).
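Querying it is then just plain SQL; the file is re-read every time the external table is queried. For example, assuming the table from step [C]:

-- Pick out the lines you are interested in from the listing
SELECT line
FROM   dir_contents_ext
WHERE  UPPER(line) LIKE '%.TXT';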
footnote1: This must be accessible when you are working with Oracle. If you collect the data file(s) on your client machine first, you must "upload" the files (e.g. via sftp) into the directory that you specified when executing CREATE DIRECTORY ...
footnote2: The sample file contains the output of the dir listing command from step [1] (notice the paths and the command redirection). If you are working with Linux, use "find / -type f > dir_contents.txt" instead.