Postgresql – Index for split string. PostgreSQL 10

arrayindexpostgresqlstring-splitting

I have the following string in db:

  • A/B/C/
  • A/B/D/
  • E/F/C/

It's the uri column.

The table is the following:

create table meta_info (
    id bigserial primary key,
    uri varchar,
    created timestamp with time zone,
    version varchar,
    binary_length bigserial
    )

In most cases I find the directory and want to find children.

E.g.,

SELECT tt.stem[2+1] as current, tt.stem[2+2]
as child from (SELECT regexp_split_to_array(uri, '/')
                      as stem FROM meta_info
                      where uri like 'A/B%' )
as tt group by tt.stem[2+1], tt.stem[2+2]

… , where "2" is the size of path to find (A/B% – size 2 , A/B/C% – size 3 , A/B/D/E% – size 4)

E.g.,

SELECT tt.stem[4+1] as current, tt.stem[4+2]
as child from (SELECT regexp_split_to_array(uri, '/')
                      as stem FROM meta_info
                      where uri like 'A/B/C/D%' )
as tt group by tt.stem[4+1], tt.stem[4+2]

At the moment I have 400000 rows. It takes too much time to run that sql query.
Can I create an index to improve the performance?

The following index didn't help.

CREATE INDEX idx ON meta_info (regexp_split_to_array(uri, '/'));

The query still is quite slow.
Especially is slow the following sql:

EXPLAIN ANALYZE
    SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch
    FROM meta_info

I need to run the mentioned sql when I need to get the unique first levels (folders).

……………………..
what about text_pattern_ops, yep, I know it. I am already using it. But thanks for split_part(uri, '/',1). Wow, it works really faster. I have replaced all regexp_split_to_array with split_part. Thanks a lot!

Best Answer

If you will be constantly parsing the data out on the fly, you should probably just stored it parsed in the first place.

If you do want to store it as flat strings there are some things you can do to speed it up.

where uri like 'A/B/C/D%'

This part can be sped up creating an index

create index on meta_info (uri text_pattern_ops);

How much of a speed up this is depends on how selective 'A/B/C/D%' is.

EXPLAIN ANALYZE SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch FROM meta_info

This can be sped up 2 to 3 times if you avoid the unnecessary regexp machinery and the array machinery.

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT split_part(uri, '/',1) as branch
FROM meta_info;