This may be too late for the original poster, but for completeness, the way to achieve case-insensitive behaviour from PostgreSQL is to set a nondeterministic collation. This requires Postgres 12 or later.
Details are described in the docs here. Reproducing the relevant portion for completeness:
A collation is either deterministic or nondeterministic. A deterministic collation uses deterministic comparisons, which means that it considers strings to be equal only if they consist of the same byte sequence. Nondeterministic comparison may determine strings to be equal even if they consist of different bytes. Typical situations include case-insensitive comparison, accent-insensitive comparison, as well as comparison of strings in different Unicode normal forms. It is up to the collation provider to actually implement such insensitive comparisons; the deterministic flag only determines whether ties are to be broken using bytewise comparison. See also Unicode Technical Standard 10 for more information on the terminology.
To create a nondeterministic collation, specify the property deterministic = false to CREATE COLLATION, for example:
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
This example would use the standard Unicode collation in a nondeterministic way. In particular, this would allow strings in different normal forms to be compared correctly. More interesting examples make use of the ICU customization facilities explained above. For example:
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
All standard and predefined collations are deterministic, all user-defined collations are deterministic by default. While nondeterministic collations give a more “correct” behavior, especially when considering the full power of Unicode and its many special cases, they also have some drawbacks. Foremost, their use leads to a performance penalty. Also, certain operations are not possible with nondeterministic collations, such as pattern matching operations. Therefore, they should be used only in cases where they are specifically wanted.
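For illustration, a minimal sketch using the case_insensitive collation defined above (the table and data are made up for this example):

CREATE TABLE users (name text COLLATE case_insensitive);
INSERT INTO users VALUES ('Alice'), ('BOB');

-- Matches 'Alice' despite the different casing:
SELECT * FROM users WHERE name = 'ALICE';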
You have a lot of indexes, and my first impulse would be to check whether all of them are actually used and remove the unused ones.
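A sketch of one way to find candidates, using the statistics view pg_stat_user_indexes (counters accumulate since the last statistics reset, so judge with care, and keep indexes that enforce PK / UNIQUE constraints even if unscanned):

SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM   pg_stat_user_indexes
WHERE  idx_scan = 0   -- never used since the last stats reset
ORDER  BY schemaname, relname;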
The optimal index for your query is not there yet, though: a multicolumn index on (scat_group_code, revenue). Since this is a materialized view, it's reasonable to assume a read-only situation, which is perfect for index-only scans.
Also, since the table (many columns, some potentially big text columns) is much wider than the index (two columns, fewer rows for the partial variant), an index-only scan is particularly efficient here. Size matters.
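The plain two-column variant would look like this:

CREATE INDEX ON v_premises_filter (scat_group_code, revenue);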
If this particular query is very common / important, I would go one step further and make it a partial, multicolumn index:
CREATE INDEX ON v_premises_filter (scat_group_code, revenue)
WHERE area IS NOT NULL
AND revenue IS NOT NULL;
I added the 2nd condition revenue IS NOT NULL because min() and max() ignore NULL values. I tested with Postgres 9.6. You may have to add the logically redundant predicate AND revenue IS NOT NULL to your query in older versions to make the query planner understand the index is applicable.
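The query itself isn't shown here; a sketch of the shape it would take, with a made-up constant:

SELECT min(revenue), max(revenue)
FROM   v_premises_filter
WHERE  scat_group_code = 'FOO'  -- made-up value
AND    area IS NOT NULL
AND    revenue IS NOT NULL;     -- logically redundant, but helps older planners match the partial index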
The index your query used in your tests (v_premises_filter_revenue_idx) is on (revenue), which is pretty inefficient for a predicate on scat_group_code. Adding that column as the first index item makes a big difference, because Postgres can now cheaply grab only the share with a matching scat_group_code. Where your 2nd query had to filter 1631527 rows (Rows Removed by Filter: 1631527), it only has to filter far fewer rows with area IS NULL OR revenue IS NULL for the simple index, and none at all for the partial index.
Also, since irrelevant rows are excluded from the index, it is substantially smaller, which contributes to performance as well.
You should see two index-only scans and a much faster query in either case now.
Why the fluctuation?
Your table is big (1909175 rows) and data distribution is obviously irregular. The default statistics target of 100 (default_statistics_target) is probably too small to gather enough details. Postgres decides to use a bad query plan for your 2nd query, based on insufficient information. It might help to increase the STATISTICS setting for scat_group_code, revenue and area.
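A sketch of how to do that (1000 is an illustrative value; ANALYZE afterwards collects the new statistics):

ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN scat_group_code SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN revenue SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter ALTER COLUMN area SET STATISTICS 1000;
ANALYZE v_premises_filter;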
But you won't need this any more with the suggested new index, because an index-only scan on it is always the best plan.
Best Answer
You can use NOT LIKE to include all the non-taxi-related data -- check that the phase doesn't begin with TAXI_. Then, for each size, add a comparison that couples the airport size to the taxi route with OR.
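A minimal sketch of that shape; the table and column names (flights, phase, airport_size) and the size/route pairings are made up, since the original schema isn't shown:

SELECT *
FROM   flights
WHERE  phase NOT LIKE 'TAXI!_%' ESCAPE '!'  -- all non-taxi rows; '_' is a LIKE wildcard, so escape it
OR    (airport_size = 'LARGE'  AND phase = 'TAXI_LONG')
OR    (airport_size = 'MEDIUM' AND phase = 'TAXI_MEDIUM')
OR    (airport_size = 'SMALL'  AND phase = 'TAXI_SHORT');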