PostgreSQL Random Bytea – How to Generate a Random Bytea

byteapostgresqlrandomtesting

I would like to be able to generate random bytea fields of arbitrary length (<1Gb) for populating test data.

What is the best way of doing this?

Best Answer

Enhancing Jack Douglas's answer to avoid the need for PL/PgSQL looping and bytea concatenation, you can use:

CREATE OR REPLACE FUNCTION random_bytea(bytea_length integer)
RETURNS bytea AS $body$
    SELECT decode(string_agg(lpad(to_hex(width_bucket(random(), 0, 1, 256)-1),2,'0') ,''), 'hex')
    FROM generate_series(1, $1);
$body$
LANGUAGE 'sql'
VOLATILE
SET search_path = 'pg_catalog';

It's a simple SQL function that's cheaper to call than PL/PgSQL.

The difference in performance due to the changed aggregation method is immense for larger bytea values. Though the original function is actually up to 3x faster for sizes < 50 bytes, this one scales much better for larger values.

Or use a C extension function:

I've implemented a random bytea generator as a simple C extension function. It's in my scrapcode repository on GitHub. See the README there.

It nukes the performance of the above SQL version:

regress=# \a
regress=# \o /dev/null
regress=# \timing on
regress=# select random_bytea(2000000);
Time: 895.972 ms
regress=# drop function random_bytea(integer);
regress=# create extension random_bytea;
regress=# select random_bytea(2000000);
Time: 24.126 ms

Related Solutions

PostgreSQL – How to Insert File Data into a Bytea Column

as superuser:

create or replace function bytea_import(p_path text, p_result out bytea) 
                   language plpgsql as $$
declare
  l_oid oid;
begin
  select lo_import(p_path) into l_oid;
  select lo_get(l_oid) INTO p_result;
  perform lo_unlink(l_oid);
end;$$;

lo_get was introduced in 9.4 so for older versions you would need:

create or replace function bytea_import(p_path text, p_result out bytea) 
                   language plpgsql as $$
declare
  l_oid oid;
  r record;
begin
  p_result := '';
  select lo_import(p_path) into l_oid;
  for r in ( select data 
             from pg_largeobject 
             where loid = l_oid 
             order by pageno ) loop
    p_result = p_result || r.data;
  end loop;
  perform lo_unlink(l_oid);
end;$$;

then:

insert into my_table(bytea_data) select bytea_import('/my/file.name');

Mysql – Perform group by with random group expression (randomly partitioning rows) MYSQL

Your post piqued my curiosity and I look at the RAND() function - which you can't perform GROUP BYs with. I found this and if your table has a PRIMARY KEY, there is a MySQL pseudo-column (_rowid), much like is found in Oracle, Firebird (and maybe others) then you can issue queries such as this one (see structure and data for comments table (sample) below).

select MOD(_rowid, 3) AS my_field, COUNT(ticket_id) 
FROM comments 
GROUP BY my_field;

+----------+------------------+
| my_field | count(ticket_id) |
+----------+------------------+
|        0 |                6 |
|        1 |                7 |
|        2 |                7 |
+----------+------------------+

Now, if your PK is some sort of AUTO INCREMENT and "random" with respect to your data, you could perhaps use the _rowid pseudo-column (with MOD) to generate a sufficient degree of randomness for your requirements, and perform your aggregate queries?

---- structure and data in comments table---

CREATE TABLE `comments` 
(
  `comment_id` int(11) NOT NULL,
  `ticket_id` int(11) NOT NULL,
  PRIMARY KEY (`comment_id`)
);


mysql> SELECT * FROM comments;
+------------+-----------+
| comment_id | ticket_id |
+------------+-----------+
|          1 |         1 |
|          2 |         1 |
|          3 |         1 |
|          4 |         1 |
|          5 |         2 |
|          6 |         2 |
|          7 |         2 |
|          8 |         2 |
|          9 |         3 |
|         10 |         3 |
|         11 |         3 |
|         12 |         3 |
|         13 |         4 |
|         14 |         4 |
|         15 |         4 |
|         16 |         4 |
|         17 |         5 |
|         18 |         5 |
|         19 |         5 |
|         20 |         5 |
+------------+-----------+

Best Answer

Related Solutions

PostgreSQL – How to Insert File Data into a Bytea Column

Mysql – Perform group by with random group expression (randomly partitioning rows) MYSQL

Related Question