I have the following string in db:
- A/B/C/
- A/B/D/
- E/F/C/
It's the uri column.
The table is the following:
create table meta_info (
id bigserial primary key,
uri varchar,
created timestamp with time zone,
version varchar,
binary_length bigserial
)
In most cases I find the directory and want to find children.
E.g.,
SELECT tt.stem[2+1] as current, tt.stem[2+2]
as child from (SELECT regexp_split_to_array(uri, '/')
as stem FROM meta_info
where uri like 'A/B%' )
as tt group by tt.stem[2+1], tt.stem[2+2]
… , where "2" is the size of path to find (A/B% – size 2 , A/B/C% – size 3 , A/B/D/E% – size 4)
E.g.,
SELECT tt.stem[4+1] as current, tt.stem[4+2]
as child from (SELECT regexp_split_to_array(uri, '/')
as stem FROM meta_info
where uri like 'A/B/C/D%' )
as tt group by tt.stem[4+1], tt.stem[4+2]
At the moment I have 400000 rows. It takes too much time to run that sql query.
Can I create an index to improve the performance?
The following index didn't help.
CREATE INDEX idx ON meta_info (regexp_split_to_array(uri, '/'));
The query still is quite slow.
Especially is slow the following sql:
EXPLAIN ANALYZE
SELECT DISTINCT((regexp_split_to_array(uri, '/'))[1]) as branch
FROM meta_info
I need to run the mentioned sql when I need to get the unique first levels (folders).
……………………..
what about text_pattern_ops, yep, I know it. I am already using it. But thanks for split_part(uri, '/',1). Wow, it works really faster. I have replaced all regexp_split_to_array with split_part. Thanks a lot!
Best Answer
If you will be constantly parsing the data out on the fly, you should probably just stored it parsed in the first place.
If you do want to store it as flat strings there are some things you can do to speed it up.
This part can be sped up creating an index
How much of a speed up this is depends on how selective 'A/B/C/D%' is.
This can be sped up 2 to 3 times if you avoid the unnecessary regexp machinery and the array machinery.