Postgresql – Need to group by column while ignoring some text format

group bypostgresql

I have the following data.

Sport
--------------
Baseball
Football
Baseball [a]
Football [a] [b]
Basketball

How can I group this to ignore the data in the brackets?

Best Answer

The expression

regexp_replace(the_column, ' +.*$', '')

will remove everything starting with the first space character up until the end of the string. So you just need to group by that expression:

with test_data (sport) as
(
  values 
    ('Baseball'), 
    ('Football'), 
    ('Baseball [a]'), 
    ('Football [a] [b]'), 
    ('Basketball')
)
select regexp_replace(sport, ' .*$', ''), count(*)
from test_data
group by regexp_replace(sport, ' .*$', '')

This will return:

regexp_replace | count
---------------+------
Baseball       |     2
Basketball     |     1
Football       |     2

SQLFiddle: http://sqlfiddle.com/#!15/d41d8/3379

Edit

To deal with legitimate multi-word sports there are two options:

  1. remove everything after the first [: regexp_replace(sport, ' \[.*$', '')
  2. remove all characters that occur between [..] and then trim the result: trim(regexp_replace(sport, '\[.*\]', '', 'g'))