PostgreSQL – Find Substrings Between Two String Fragments


I am trying to create a view in PostGIS on Postgres 9.0, and I want the view to contain a substring delimited by two string positions. My code is below.

CREATE OR REPLACE VIEW vw_actions AS 
 SELECT ls.f_table_schema, ls.f_table_name, 
 (SELECT substr(ls.attribute_actions_text, 
 strpos(ls.attribute_actions_text, 'name="')+6, 
 strpos(ls.attribute_actions_text, '"/>') - 
 strpos(ls.attribute_actions_text, 'name="'))) AS actions
   FROM layer_styles ls;

The problem is that it rejects negative numbers when I use strpos. I can move forward 6 characters to drop the 'name="' from the returned substring, but I cannot remove the '"/>'.

It returns the following:

View SHED Database"/>

whereas I want it to return:

View SHED Database

Any suggestions would be greatly appreciated.

ADDITION: I have found out that if I were using 9.1 I could have used strposrev, and I think the following code would have worked:

CREATE OR REPLACE VIEW vw_actions AS 
 SELECT ls.f_table_schema, ls.f_table_name, 
 (SELECT substr(ls.attribute_actions_text::text, 
 strpos(ls.attribute_actions_text::text, 'name="'::text)+6, 
 strposrev(ls.attribute_actions_text::text, '"/>'::text)+3 - 
 strpos(ls.attribute_actions_text::text, 'name="'::text))) AS actions
   FROM layer_styles ls;

Best Answer

Use substring() with a regular expression instead:

substring(ls.attribute_actions_text FROM 'name="(.*?)"/>')

The dot (.) matches any character, *? is the non-greedy quantifier for a sequence of 0 or more matches and the parentheses (()) mark the substring to be returned.

Like your code, this selects the first string matching the pattern and does not look further.
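
To see why the non-greedy quantifier matters, here is a small sketch that compares it with the greedy form on a string containing two matches. The sample string and the column aliases are made up for illustration:

```sql
-- Compare non-greedy (.*?) with greedy (.*) on a string holding two candidates.
SELECT substring(s FROM 'name="(.*?)"/>') AS non_greedy  -- stops at the first "/>
     , substring(s FROM 'name="(.*)"/>')  AS greedy      -- runs on to the last "/>
FROM  (VALUES ('a name="one"/> b name="two"/> c')) t(s);
```

The non-greedy pattern should return one, while the greedy pattern should return everything from one up to the final "/>.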

Also, you don't need to make your expression a subquery, that adds nothing but overhead:

CREATE OR REPLACE VIEW vw_actions AS 
SELECT ls.f_table_schema
     , ls.f_table_name
     , substring(ls.attribute_actions_text FROM 'name="(.*?)"/>') AS actions
FROM   layer_styles ls;
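
For comparison, the original strpos() arithmetic can also be repaired: the start position is moved forward by 6, so the length has to be reduced by the same 6. This is only a sketch, and it assumes that the first '"/>' in the string comes after the first 'name="'. If it does not, the length goes negative and substr() raises an error, which is one more reason to prefer the regular expression.

```sql
-- Length = position of the closing '"/>' minus the position just past 'name="'.
SELECT substr(s
            , strpos(s, 'name="') + 6
            , strpos(s, '"/>') - strpos(s, 'name="') - 6) AS actions
FROM  (VALUES ('bar name="View SHED Database"/> foo')) t(s);
```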

Quick test case (which you should have provided):

SELECT *, substring(ls.attribute_actions_text FROM 'name="(.*?)"/>')
FROM  (
   VALUES
     ('bar name="View SHED Database"/> foo')
   , ('bar name="View SHED Database"/> fooname="View SHED Database"/>xx')
   , ('name="buntch a bull"/> fooname="View SHED Database"/>xx')
   , ('xxname="bla foo grr"/>')
   , ('')
   , (NULL)
   ) ls(attribute_actions_text);

Related Solutions

PostgreSQL – Concatenating SETOF Type or SETOF Record

The approach you're using is unnecessarily complex - and very inefficient. Instead of the first function use:

create or replace function compute_pair_id_value(id bigint, value integer)
    returns setof pair_id_value
as $$
SELECT $1, generate_series(0,$2);
$$
language sql;

or better, get rid of it entirely and write the whole operation like this:

-- Sample data creation:
CREATE TABLE my_obj(id bigint, obj_value integer);
insert into my_obj(id,obj_value) VALUES (1712437,2),(17000,5);

-- and the query:
SELECT id, generate_series(0,obj_value) FROM my_obj;

Resulting in:

regress=> SELECT id, generate_series(0,obj_value) FROM my_obj;
   id    | generate_series 
---------+-----------------
 1712437 |               0
 1712437 |               1
 1712437 |               2
   17000 |               0
   17000 |               1
   17000 |               2
   17000 |               3
   17000 |               4
   17000 |               5
(9 rows)

This exploits PostgreSQL's behaviour with set-returning functions called in the SELECT list. Once PostgreSQL 9.3 comes out it can be replaced with a standards-compliant LATERAL query.
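
As a sketch of what that LATERAL form could look like on 9.3 or later, using the my_obj sample table from above (the aliases o, g and n are arbitrary):

```sql
-- Requires PostgreSQL 9.3+. The set-returning function moves into the FROM
-- clause and may refer to columns of the preceding table thanks to LATERAL.
SELECT o.id, g.n
FROM   my_obj AS o
CROSS  JOIN LATERAL generate_series(0, o.obj_value) AS g(n);
```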

Since it turns out your question was a simplified version of the real problem, let's tackle that. I'll work with the simplified compute_pair_id_value above to avoid the hassle of plpython3. Here's how to do what you want:

SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;

Result:

regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

but again, be warned that compute_pair_id_value will be called more than once. This is a limitation of PostgreSQL's query executor that can be avoided in 9.3 with LATERAL support, but as far as I know you're stuck with it in 9.2 and below. Observe:

create or replace function compute_pair_id_value(id bigint, value integer)
    returns setof pair_id_value
as $$
BEGIN
  RAISE NOTICE 'compute_pair_id_value(%,%)',id,value;
  RETURN QUERY SELECT $1, generate_series(0,$2);
END;
$$
language plpgsql;

output:

regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(17000,5)
NOTICE:  compute_pair_id_value(17000,5)
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

See how compute_pair_id_value is called once per output column?

There is a workaround: add another layer of subquery to unpack the composite-type result. See:

regress=> SELECT (val).* FROM (SELECT compute_pair_id_value(id,obj_value) FROM my_obj) x(val);
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(17000,5)
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

You can use the same technique in your code if you really must LOOP over the results (it's slow to do that, so avoid it if you can).
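
If you are on 9.3 or later, the LATERAL form mentioned earlier should avoid the repeated calls without the extra subquery. A minimal sketch, assuming my_obj and compute_pair_id_value exist as defined above and that pair_id_value is a composite type with columns (id, value):

```sql
-- Requires PostgreSQL 9.3+. The function is referenced once per input row
-- in the FROM clause, and p.* expands its composite result into columns.
SELECT p.*
FROM   my_obj AS o
CROSS  JOIN LATERAL compute_pair_id_value(o.id, o.obj_value) AS p;
```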

PostgreSQL – Syntax error when trying to import database from two PostgreSQL databases

You seem to have omitted the version of the upgraded server, but I strongly suspect it's still on 8.1, since the ALTER SEQUENCE ... OWNED BY clause was added in 8.2 (compare the ALTER SEQUENCE docs for 8.2 with those for 8.1).
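
A quick way to check is to ask the server itself and then try the statement in question. The sequence and table names below are hypothetical, for illustration only:

```sql
-- Report the version of the server you are actually connected to.
SELECT version();

-- Hypothetical objects, not taken from the dump in question.
CREATE SEQUENCE demo_id_seq;
CREATE TABLE demo (id integer NOT NULL DEFAULT nextval('demo_id_seq'));

-- Accepted on 8.2 and later; fails with a syntax error on 8.1.
ALTER SEQUENCE demo_id_seq OWNED BY demo.id;
```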

You really need to understand that 8.4 and 8.1 aren't slightly different versions, they're massively different. It's like saying that Windows Vista and Windows 7 are "slightly different" versions because they're Windows versions 6.0 and 6.1 respectively. Or that Mac OS X 10.4 and 10.7 are "slightly different"... when they're more incompatible than not.

Yes, I know the version numbering in PostgreSQL is stupid; it's a historical oddity we seem to be stuck with despite periodic attempts to change it. See the version policy document linked in the prior answer for more information about what the versioning means, and read the release notes for a better understanding.
