Postgresql – Postgres: SELECTing bytea data partially with offset and length

byteapostgresql

I have a table containing a bytea column, with an average entry size of 5MB. Is it possible to query from random points in the bytea data (given an offset and length of data required) instead of SELECTing the whole byte string into the memory?

Best Answer

Of course, just use the substring function:

select substring(bytea_contents from offset for length) FROM table;

or get_byte(bytea_contents,offset) to get a single byte as an integer.

See Binary String Functions and Operators in the doc for all functions.

Related Solutions

PostgreSQL – How to Insert File Data into a Bytea Column

as superuser:

create or replace function bytea_import(p_path text, p_result out bytea) 
                   language plpgsql as $$
declare
  l_oid oid;
begin
  select lo_import(p_path) into l_oid;
  select lo_get(l_oid) INTO p_result;
  perform lo_unlink(l_oid);
end;$$;

lo_get was introduced in 9.4 so for older versions you would need:

create or replace function bytea_import(p_path text, p_result out bytea) 
                   language plpgsql as $$
declare
  l_oid oid;
  r record;
begin
  p_result := '';
  select lo_import(p_path) into l_oid;
  for r in ( select data 
             from pg_largeobject 
             where loid = l_oid 
             order by pageno ) loop
    p_result = p_result || r.data;
  end loop;
  perform lo_unlink(l_oid);
end;$$;

then:

insert into my_table(bytea_data) select bytea_import('/my/file.name');

PostgreSQL Performance – Optimize for Many INSERTS and bytea Updates

For insert performance, see speeding up insert performance in PostgreSQL and bulk insert in PostgreSQL.

~~You're wasting your time with JDBC batching for insert. PgJDBC doesn't do anything useful with insert batches, it just runs each statement.~~ <-- This is no longer true in newer PgJDBC versions, which can now batch prepared statements to reduce round-trip times considerably. But it's still better to:

Use COPY instead; see PgJDBC batch copy and the CopyManager. As for number of concurrent loaders: Aim for a couple per disk, if the operations are disk I/O bound. Eight is probably the most you'll want.

For your "production mode" I suggest loading a sample of data, setting up the queries you expect to run, and using explain analyze to investigate performance. For testing purposes only, use the enable_ params to explore different plan selections. Set the query planner cost parameters (random_page_cost, seq_page_cost, effective_cache_size, etc) appropriately for your system, and make sure shared_buffers is set appropriately. Continue to monitor as you add a simulated production workload, using the auto_explain module, log_min_duration_statement setting, the pg_stat_statements extension, etc.

For details, see the PostgreSQL user manual. I suggest popping back here when you have a more concrete problem with explain analyze query execution details, etc.

Best Answer

Related Solutions

PostgreSQL – How to Insert File Data into a Bytea Column

PostgreSQL Performance – Optimize for Many INSERTS and bytea Updates

Related Question