Here is a general way of getting a table with bytea
data from postgres into a similar Oracle table with a blob
.
It should be trivial to adapt my test to your real tables, and you should probably increase the size of the 'chunks' that I used to keep the output readable, from 20 to something more like the 2000 limit for string literals (4000/2 as each byte is 2 bytes hex).
1) postgres
testbed:
begin;
set role dba;
create role stack;
grant stack to dba;
create schema authorization stack;
set role stack;
--
create function random_bytea(p_length in integer)
returns bytea language plpgsql set search_path to 'stack' as $$
declare
o bytea := '';
begin
for i in 1..p_length loop
o := o||decode(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0'), 'hex');
end loop;
return o;
end;$$;
--
create function bytea_to_oracle_sql(p_id in integer, p_data in bytea)
returns text language plpgsql set search_path to 'stack' as $$
declare
o text := 'insert into foo(id, bar) values('||p_id||', empty_blob());';
begin
for i in 0..(length(p_data)-1)/20 loop
o := o||chr(10)||'update foo set bar=append_to_blob(bar,'''
||encode(substr(p_data,i*20+1,20),'hex')||''') where id='||p_id||';';
end loop;
return o;
end;$$;
--
create table foo(id serial, bar bytea);
insert test data and check hashes:
insert into foo(bar) values(random_bytea(90));
insert into foo(bar) values(random_bytea(90));
select id, md5(bar) from foo;
/*
id | md5
----+----------------------------------
1 | 32a24c8827047eff517e8a40bf3ad8b4
2 | 25ac852000889ab782ad6437accf1546
*/
generate Oracle script:
\o '/tmp/oracle.sql'
select bytea_to_oracle_sql(id, bar) from foo;
\o
contents of /tmp/oracle.sql:
insert into foo(id, bar) values(1, empty_blob());
update foo set bar=append_to_blob(bar,'ddbe858f7905551862507ddaa3b5410cba8013d4') where id=1;
update foo set bar=append_to_blob(bar,'089d6e66f16b10ba1204a4efc22a7f3c2fd45492') where id=1;
update foo set bar=append_to_blob(bar,'24d26dc887afd4412fe8163786859e77f0af3202') where id=1;
update foo set bar=append_to_blob(bar,'b3d7f2750172b2314606c36bd8313360e008a20f') where id=1;
update foo set bar=append_to_blob(bar,'f0b847763de6ee2e9521') where id=1;
insert into foo(id, bar) values(2, empty_blob());
update foo set bar=append_to_blob(bar,'3148f823befa95712bdc77ef4750207bb0018352') where id=2;
update foo set bar=append_to_blob(bar,'10730b58ea483ed876d4faa81df3ccdbed624d18') where id=2;
update foo set bar=append_to_blob(bar,'3ec40886142901c52a85173bf92393e36bd2bce2') where id=2;
update foo set bar=append_to_blob(bar,'a6b68ac3a9569f97b8ecaff7b1b87dc6e17f8b0b') where id=2;
update foo set bar=append_to_blob(bar,'04a347fdc6dae132ad9d') where id=2;
remove testbed:
rollback;
2) Oracle
testbed:
create table foo(id integer, bar blob);
--
create or replace
function append_to_blob(p_data in blob, p_append_hex in varchar) return blob is
o blob;
begin
dbms_lob.createtemporary(o, TRUE);
dbms_lob.append(o, p_data);
dbms_lob.writeappend(o, length(p_append_hex)/2, hextoraw(p_append_hex));
return o;
end;
/
run script:
insert into foo(id, bar) values(1, empty_blob());
update foo set bar=append_to_blob(bar,'ddbe858f7905551862507ddaa3b5410cba8013d4') where id=1
...
check hashes:
select id, dbms_crypto.hash(bar,2) from foo;
/*
ID DBMS_CRYPTO.HASH(BAR,2)
--- --------------------------------
1 32A24C8827047EFF517E8A40BF3AD8B4
2 25AC852000889AB782AD6437ACCF1546
*/
credit to Vincent Malgrat for his concat_blob function I adapted. See here for the random bytea function by itself
If the only difference is local vs remote, then the problem has to be your round trip times. The actual internal query execution is not aware of the type of connection used. However I also wonder if you are also running your dev system on a different database. In that case, caching could be very different and could also make a significant difference.
I see a few ways to solve this problem. The first is just to have a dev copy of the database for the site on your dev system. This data could then be sanitized so that if your dev system is compromised or lost (supposing it is a laptop for example) the data exposure is minimal. Then you can test performance on a system which can do caching specific to your dev work and will not have the round trip.
The second approach is to build a much faster tunnel, and to possibly throw a lot more RAM at the db server.
Best Answer
Why would you think so?
bytea
is coerced to eitherhex
(default) orescape
format when assigned to atext
column. Non-ASCII characters are encoded automatically. Should work "smoothly" at all times - except that you don't want to allow it.True, but it is contained within the transaction if you create and drop the cast within: DDL commands are fully transactional in Postgres, so only your session (within that current transaction) will ever get to see the cast.
Solution with custom cast
Currently (all versions incl. Postgres 13) the cast from
bytea
totext
has no explicit entry in the system catalogpg_cast
. It is provided by basic input/output functions of the respective types. This behavior can be overruled with an explicit entry, created withCREATE CAST
.You need to be the owner of the involved types, so this basically means you have to be superuser install it.
Create this casting function once per database:
To allow creating / dropping the special cast to unprivileged roles, add wrapper functions. Do this as superuser (or as dedicated daemon role):
Now you can do what you asked for:
db<>fiddle here -- second half does not execute due to missing privileges.
Extended test case
Test table:
No exception raised:
Also no exception:
Exception raised:
Client side problem?
Your comment seems to reveal a rabbit hole:
The solution cannot work at all once you incorrectly assume data type
text
on the client side. The cast is only invoked if you hand in typedbytea
values. I.e.: use a function or prepared statement with explicit data type or append an explicit cast to string literals handed to anINSERT
command like demonstrated above:'\000'::bytea
.Once you pass untyped literals, Postgres has no way to know that it should really be
bytea
. And how can you (incorrectly) preparebytea
strings fortext
input and then still (correctly?) add an explicit cast tobytea
?