Parallel INSERT APPEND with NOLOGGING would be the way to do this; then, as with all NOLOGGING operations, take a backup immediately on completion. Mark indexes unusable first, disable constraints, alter the table to NOLOGGING, perform the operation, then re-enable constraints etc.

APPEND causes Oracle to always grab free space above the current high water mark, so it's not efficient at reusing space in the segment, but it avoids fiddling with the freelist and the UNDO overhead. If you have to start again for any reason, TRUNCATE, don't DELETE.
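A rough outline of that sequence might look like this (object names and the degree of parallelism are just placeholders):

alter table target_table nologging;
alter index target_table_ix unusable;
alter table target_table disable constraint target_table_fk;

alter session enable parallel dml;

-- direct-path, parallel load above the high water mark
insert /*+ append parallel(t, 8) */ into target_table t
select /*+ parallel(s, 8) */ * from source_table s;

commit;

-- put everything back, then take the backup straight away
alter index target_table_ix rebuild nologging parallel 8;
alter table target_table enable constraint target_table_fk;
alter table target_table logging;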
As to the incremental commit, it will depend on how your data is segmented: can you easily move, say, a month's worth at a time (e.g. is the partitioning scheme the same in source and target)? Remember that if you need to satisfy some predicate, that will obviously slow you down. Test to make sure the operation isn't going to fail logically (e.g. incompatible datatypes in source and target), then allocate sufficient resources and just go for it in one transaction. Good luck!
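If the partitioning does line up, one way to commit in increments is to load one partition per transaction, along these lines (partition and table names are again placeholders):

-- repeat for each partition, committing after each one
insert /*+ append parallel(t, 8) */ into target_table partition (p_2011_01) t
select /*+ parallel(s, 8) */ * from source_table partition (p_2011_01) s;

commit;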
You could do that in Oracle 11gR2 with expdp and the REMAP_DATA option.

- Create a function in a package that takes a blob as argument, and returns null (or an empty blob perhaps).
- Call expdp as usual, adding:

REMAP_DATA=SCHEMA.TAB.BLOB_COLUMN:SCHEMA.PACKAGE.YOUR_FUNCTION

Short example (schema: mat):
create or replace package remap as
  function null_lob(col blob) return blob;
end remap;
/

create or replace package body remap as
  function null_lob(col blob) return blob is
  begin
    return null;
  end null_lob;
end remap;
/
create table foo(a number, b blob);
-- insert a line with a 10M blob, commit
select a, length(b) from foo;
         A  LENGTH(B)
---------- ----------
         1   10485760
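The 10 MB test row from the comment above can be created with something along these lines (just one of many ways to build a throwaway blob of that size):

declare
  l_blob  blob;
  l_chunk raw(16384) := utl_raw.cast_to_raw(rpad('x', 16384, 'x'));
begin
  insert into foo values (1, empty_blob()) returning b into l_blob;
  -- append 640 * 16384 bytes = 10485760 bytes (10 MB) to the new blob
  for i in 1 .. 640 loop
    dbms_lob.writeappend(l_blob, utl_raw.length(l_chunk), l_chunk);
  end loop;
  commit;
end;
/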
Plain export:
$ expdp mat/*** DIRECTORY=EXP_DIR TABLES=MAT.FOO
...
Total estimation using BLOCKS method: 11.06 MB
...
. . exported "MAT"."FOO" 10.00 MB 1 rows
...
Dump file is ~11M.
With the data remap:
$ expdp mat/*** DIRECTORY=EXP_DIR TABLES=MAT.FOO \
REMAP_DATA=MAT.FOO.B:MAT.remap.null_lob
...
Total estimation using BLOCKS method: 11.06 MB
...
. . exported "MAT"."FOO" 5.398 KB 1 rows
...
Dump file is ~100k; importing that dump does indeed put a null in the column.
Another option would be to use an external table to do the load/unload, with the ORACLE_DATAPUMP driver (I think this was introduced in 10g, see external table concepts).
For the unload, use something like:
create table foo_unload
organization external (
type oracle_datapump
default directory exp_dir
location ('foo.dmp')
)
as select a, empty_blob() b from foo;
(You can drop the external table right after creating it; that doesn't - by default anyway - delete the dump file.)
Then move the foo.dmp file to your target server & directory, and create the external table there:
create table foo_load (a number, b blob)
organization external (
type oracle_datapump
default directory exp_dir
location ('foo.dmp')
);
And finally load your target table (CTAS or truncate ...; insert /*+ append */ ... for instance):
create table foo as select * from foo_load;
Drop the external loader table once that's done.
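If the target table already exists, the truncate ...; insert /*+ append */ ... variant mentioned above would be along these lines (reusing the names from the example):

truncate table foo;

-- direct-path load from the external table
insert /*+ append */ into foo
select * from foo_load;

commit;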
Best Answer
Filtering options in Data Pump still leave something to be desired. Anyway, you could try parameters along the lines of the following:
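For example, in a parameter file (the schema names and the subquery here are assumptions; the idea is to keep only the metadata for SCHEMA1 and SCHEMA2 while exporting everything from SCHEMA3):

SCHEMAS=SCHEMA1,SCHEMA2,SCHEMA3
EXCLUDE=TABLE_DATA:"IN (SELECT table_name FROM dba_tables WHERE owner IN ('SCHEMA1', 'SCHEMA2') UNION SELECT partition_name FROM dba_tab_partitions WHERE table_owner IN ('SCHEMA1', 'SCHEMA2'))"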
This excludes TABLE_DATA for all tables and partitions whose names match the result of the query provided in the exclude part. And that is why this may not be appropriate for you. If you have a table called T1 both in SCHEMA1 and SCHEMA3, the data of SCHEMA3.T1 will be excluded, because the above query returns T1 from SCHEMA1, and Data Pump will ignore TABLE_DATA of all tables called T1. If you have a partitioned table in SCHEMA2, with a partition called P1, the data of the table SCHEMA3.P1 will be excluded. The above may work if your table and partition names are distinct across all referenced schemas.

Or you could dynamically generate the expdp command, or use DBMS_DATAPUMP to filter out all the table data from SCHEMA1 and SCHEMA2.
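A DBMS_DATAPUMP sketch of that second idea (directory, dump file name and the per-table loop are assumptions; the INCLUDE_ROWS data filter set to 0 is what suppresses the row data):

declare
  l_handle number;
  l_state  varchar2(30);
begin
  l_handle := dbms_datapump.open(operation => 'EXPORT', job_mode => 'SCHEMA');
  dbms_datapump.add_file(l_handle, 'schemas_meta_and_s3_data.dmp', 'EXP_DIR');
  dbms_datapump.metadata_filter(l_handle, 'SCHEMA_EXPR',
                                'IN (''SCHEMA1'', ''SCHEMA2'', ''SCHEMA3'')');

  -- suppress row data for every table owned by SCHEMA1 and SCHEMA2
  for t in (select owner, table_name
            from dba_tables
            where owner in ('SCHEMA1', 'SCHEMA2'))
  loop
    dbms_datapump.data_filter(l_handle, 'INCLUDE_ROWS', 0, t.table_name, t.owner);
  end loop;

  dbms_datapump.start_job(l_handle);
  dbms_datapump.wait_for_job(l_handle, l_state);
end;
/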
But I would rather just run 2 separate exports: one for SCHEMA1,SCHEMA2 with CONTENT=METADATA_ONLY, and a default export (CONTENT=ALL) for SCHEMA3.
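Something along these lines (directory and dump file names are just placeholders):

$ expdp system/*** DIRECTORY=EXP_DIR DUMPFILE=s1_s2_meta.dmp \
    SCHEMAS=SCHEMA1,SCHEMA2 CONTENT=METADATA_ONLY

$ expdp system/*** DIRECTORY=EXP_DIR DUMPFILE=s3_full.dmp \
    SCHEMAS=SCHEMA3 CONTENT=ALL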