PostgreSQL – Select Arbitrary Element

postgresql, select

I have a table with columns path1, path2, and sha1. For any given combination of path2 and sha1, there can be multiple values of path1. I just want one of those paths; I don't really care which one.

I'm thinking I can do a group by for path2 and sha1. Now I just need to select one of the values of path1. I suppose I could select the minimum value of path1 but that would be doing extra work that isn't really needed.

Google tells me that Microsoft has "FIRST" but I don't see that in the postgres pages. Plus… I'd like to stick with normal SQL if possible.

Best Answer

There are a bunch of ways you can do this. One of them is DISTINCT ON, as @Ypercube has suggested. Note that DISTINCT ON is a PostgreSQL extension rather than standard SQL:

SELECT DISTINCT ON (path2, sha1) path2, sha1, path1
FROM table_name
ORDER BY path2, sha1;

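You can also use an ordered-set aggregate, which should generally be slower. Here percentile_disc(0) returns the first value in the ordering, i.e. the smallest path1 in each group: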

SELECT percentile_disc(0) WITHIN GROUP (ORDER BY path1) AS path1, path2, sha1
FROM table_name
GROUP BY path2, sha1;
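For comparison, the plain GROUP BY idea mentioned in the question can be written in standard SQL. The following is a minimal sketch that reuses the table_name, path1, path2, and sha1 names from above. The row_number() variant is an additional alternative that is not part of the original answer, and the aliases ranked and rn are made up for illustration.

```sql
-- Standard SQL: keep the smallest path1 for each (path2, sha1) pair.
SELECT path2, sha1, min(path1) AS path1
FROM table_name
GROUP BY path2, sha1;

-- Window-function variant: number the rows within each (path2, sha1)
-- group and keep only the first one. "ranked" and "rn" are
-- illustrative names, not from the original post.
SELECT path2, sha1, path1
FROM (
    SELECT path2, sha1, path1,
           row_number() OVER (PARTITION BY path2, sha1
                              ORDER BY path1) AS rn
    FROM table_name
) AS ranked
WHERE rn = 1;
```

Both forms pick the minimum path1 rather than a truly arbitrary one, so they do the extra comparison work the question hoped to avoid, but they should run on most SQL databases, not only PostgreSQL.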

Related Solutions

Sql-server – SELECT multiple sensor values in one query

First things first, I notice that your 'what I do now' query:

SELECT TOP (1)
    ca.SensorValue,
    ca.Date
FROM sys.partitions AS p
CROSS APPLY
(
    SELECT TOP (1)
        v.Date, 
        v.SensorValue
    FROM SensorValues AS v
    WHERE 
        $PARTITION.SensorValues_Date_PF(v.Date) = p.[partition_number]
        AND v.DeviceId = @fDeviceId
        AND v.SensorId = @fSensorId
        AND v.Date <= @fDate
    ORDER BY 
        v.Date DESC
) AS ca
WHERE 
    p.[partition_number] <= $PARTITION.SensorValues_Date_PF(@fDate)
    AND p.[object_id] = OBJECT_ID(N'dbo.SensorValues', N'U')
    AND p.index_id = 1
ORDER BY
    p.[partition_number] DESC, 
    ca.Date DESC;

...produces an execution plan like this:

[Image: Original Plan]

This execution plan has an estimated total cost of 0.02 units. Over 50% of this estimated cost is the final Sort, running in Top-N mode. Now, estimates are just that, but sorts can be expensive in general, so let's remove the Sort without changing the semantics:

SELECT TOP (1)
    ca.SensorId,
    ca.SensorValue,
    ca.Date
FROM
(
    -- Partition numbers
    SELECT DISTINCT
        partition_number = prv.boundary_id
    FROM
        sys.partition_functions AS pf
    JOIN sys.partition_range_values AS prv ON
        prv.function_id = pf.function_id
    WHERE
        pf.name = N'SensorValues_Date_PF'
        AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
) AS p
CROSS APPLY
(
    SELECT TOP (1)
        v.Date,
        v.SensorValue,
        v.SensorId
    FROM dbo.SensorValues AS v
    WHERE
        $PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
        AND v.DeviceId = @fDeviceId
        AND v.SensorId = @fSensorId
        AND v.Date <= @fDate
    ORDER BY
        v.Date DESC
) AS ca
ORDER BY
    p.partition_number DESC,
    ca.Date DESC;

Now the execution plan has no blocking operators, and no sorts in particular. The estimated cost of the new query plan below is 0.01 units and the total cost is distributed evenly over the data access methods:

[Image: Improved Query Plan]
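To see which partition numbers the derived table p can produce, it may help to inspect the partition function directly. This is a rough sketch: SensorValues_Date_PF comes from the queries above, but the datetime type and the sample value of @fDate are assumptions, since the original does not show the variable's declaration.

```sql
-- Assumed declaration; the real type of the partitioning column
-- is not shown in the original post.
DECLARE @fDate datetime = '2015-01-01';

-- Partition number that @fDate maps to.
SELECT $PARTITION.SensorValues_Date_PF(@fDate) AS partition_number;

-- Boundary values defined for the partition function.
SELECT
    prv.boundary_id,
    prv.value
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv ON
    prv.function_id = pf.function_id
WHERE
    pf.name = N'SensorValues_Date_PF'
ORDER BY
    prv.boundary_id;
```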

With the improvement in place, all we need to produce a result for each Sensor ID is to make a list of Sensor IDs and APPLY the previous code to each one:

SELECT
    PerSensor.SensorId,
    PerSensor.SensorValue,
    PerSensor.Date
FROM 
(
    -- Sensor ID list
    VALUES 
        (@fSensorId1),
        (@fSensorId2),
        (@fSensorId3)
) AS Sensor (Id)
CROSS APPLY
(
    -- Optimized code applied to each sensor
    SELECT TOP (1)
        ca.SensorId,
        ca.SensorValue,
        ca.Date
    FROM
    (
        -- Partition numbers
        SELECT DISTINCT
            partition_number = prv.boundary_id
        FROM
            sys.partition_functions AS pf
        JOIN sys.partition_range_values AS prv ON
            prv.function_id = pf.function_id
        WHERE
            pf.name = N'SensorValues_Date_PF'
            AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
    ) AS p
    CROSS APPLY
    (
        SELECT TOP (1)
            v.Date,
            v.SensorValue,
            v.SensorId
        FROM dbo.SensorValues AS v
        WHERE
            $PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
            AND v.DeviceId = @fDeviceId
            AND v.SensorId = Sensor.Id -- replaces @fSensorId
            AND v.Date <= @fDate
        ORDER BY
            v.Date DESC
    ) AS ca
    ORDER BY
        p.partition_number DESC,
        ca.Date DESC
) AS PerSensor;

The query plan is:

[Image: Final Query Plan]

The estimated query plan cost for three Sensor IDs is 0.011, roughly half that of the original single-sensor plan.

Postgresql – Track your own custom metadata on Postgres tables and columns

Usually the most portable way of doing this is to have your own metadata table, something like:

create table meta(
  table_name text not null,
  column_name text not null,
  attribute_name text not null,
  attribute_value text not null,
  primary key (table_name, column_name, attribute_name)
);

This approach works with any database.
Access to the metadata is done with standard SQL.
Migration and backup are very easy.
The attribute_value column can be any type you want: bytea, text, json, jsonb, and so on.
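As a rough illustration of how such a table might be used, the sketch below stores and reads attributes for one column. The table name orders, the column name total, and the attribute names are hypothetical examples, not from the original answer. The last query assumes the tables live in the public schema.

```sql
-- Record two hypothetical attributes for the column orders.total.
INSERT INTO meta (table_name, column_name, attribute_name, attribute_value)
VALUES
    ('orders', 'total', 'description', 'Order total including tax'),
    ('orders', 'total', 'unit', 'EUR');

-- Read back every attribute recorded for that column.
SELECT attribute_name, attribute_value
FROM meta
WHERE table_name = 'orders'
  AND column_name = 'total'
ORDER BY attribute_name;

-- List columns in the public schema that have no metadata yet.
SELECT c.table_name, c.column_name
FROM information_schema.columns AS c
LEFT JOIN meta AS m
       ON m.table_name = c.table_name::text
      AND m.column_name = c.column_name::text
WHERE c.table_schema = 'public'
  AND m.table_name IS NULL
ORDER BY c.table_name, c.column_name;
```

Because the primary key covers (table_name, column_name, attribute_name), inserting the same attribute twice for a column fails with a unique violation rather than silently creating a duplicate.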
