SQL Query – How to Select Distinct for One Column and Any in Another Column

query

I need to query an SQL database to find all distinct values of one column and I need an arbitrary value from another column. For example, consider the following table with two columns, key and value:

key     value
===     =====
one     test
one     another
one     value
two     goes
two     here
two     also
three   example

I wish to get back one sample row, chosen arbitrarily, from each distinct key, perhaps getting these three rows:

key     value
===     =====
one     test
two     goes
three   example

How can I formulate such a query in SQL?

Best Answer

The easiest query to write is for MySQL (with not strict ANSI settings). It uses the non-standard construction:

SELECT key, value
FROM tableX
GROUP BY key ;

In recent version (5.7 and 8.0+) where the strict settings and ONLY_FULL_GROUP_BY are the default, you can use the ANY_VALUE() function, added in 5.7:

SELECT key, ANY_VALUE(value) AS value
FROM tableX
GROUP BY key ;

For other DBMSs, that have window functions (like Postgres, SQL-Server, Oracle, DB2), you can use them like this. The advantage is that you can select other columns in the result as well (besides the key and value) :

SELECT key, value
FROM tableX
    ( SELECT key, value,
             ROW_NUMBER() OVER (PARTITION BY key 
                                ORDER BY whatever)     --- ORDER BY NULL
               AS rn                                   --- for example
      FROM tableX
    ) tmp 
WHERE rn = 1 ;

For older versions of the above and for any other DBMS, a general way that works almost everywhere. One disadvantage is that you cannot select other columns with this approach. Another is that aggregate functions like MIN() and MAX() do not work with some datatypes in some DBMSs (like bit, text, blobs):

SELECT key, MIN(value) AS value
FROM tableX
GROUP BY key ;

PostgreSQL has a special non-standard DISTINCT ON operator that can also be used. The optional ORDER BY is for selecting which row from every group should be selected:

SELECT DISTINCT ON (key) key, value
FROM tableX
-- ORDER BY key, <some_other_expressions> ;

Related Solutions

Sql-server – Comparing DISTINCT, GROUP BY and ROW_NUMBER() in SQL Server 2008 with data warehouse workloads

In my experience, an aggregate (DISTINCT or GROUP BY) can be quicker then a ROW_NUMBER() approach. Saying that, ROW_NUMBER is better with SQL Server 2008 than SQL Server 2005.

However, you'll have to try for your situation.
Compare query plans, and use Profiler and SET to capture IO, CPU, Duration etc

For a lot of background, see these SO questions:

Why are logical reads for windowed aggregate functions so high? (follow the links from Martin Smith)
can I get count() and rows from one sql query in sql server? (note the comments in the answers from me and Chris Bednarski)

Finally, do you need the ROW_NUMBER approach? It looks like you're fixing a problem caused by de-normalisation.

And some notes:

shouldn't YearID be in the GROUP BY or PARTITION BY?
Won't DISTINCT give different output?
Are these columns indexed?

Sql-server – Update values from one table to another where key values are not unique

This solution seems to work against your sample data.

--Demo setup
Declare @TableA table
    (FirstName varchar(6), LastName varchar(6), Merchant varchar(16), Amount decimal(11,2), TransDate varchar(10), Reference int)
;

INSERT INTO @TableA
    (FirstName, LastName, Merchant, Amount, TransDate, Reference)
VALUES
    ('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 12345),
    ('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 77777),
    ('{Null}', '{Null}', 'BLANK-ROSTER NUM', 16.07, '2018-12-24', 99999),
    ('{Null}', '{Null}', 'BLANK-ROSTER NUM', 16.07, '2018-12-24', 44444),
    ('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 66666)
;

Declare @TableB table
    (FirstName varchar(6), LastName varchar(6), Merchant varchar(16), Amount decimal(11,2), TransDate varchar(10))
;
INSERT INTO @TableB
    (FirstName, LastName, Merchant, Amount, TransDate)
VALUES
    ('SMITH', 'JOHN', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
    ('JONES', 'JAMES', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
    ('MILLER', 'STEVE', 'BLANK-ROSTER NUM', 16.07, '2018-12-24'),
    ('WILSON', 'JACK', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
    ('DAVIS', 'BEN', 'BLANK-ROSTER NUM', 16.07, '2018-12-24')
;

--The solution
;

--Common table expressions to put a row number on each row
--partition by merchant, amount and transdata
WITH TableARowNumbers
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            PARTITION BY Merchant
            ,Amount
            ,TransDate ORDER BY Merchant
                ,Amount
                ,TransDate
            ) AS rn
    FROM @TableA
    )
    ,TableBRowNumbers
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            PARTITION BY Merchant
            ,Amount
            ,TransDate ORDER BY Merchant
                ,Amount
                ,TransDate
            ) AS rn
    FROM @TableB
    )

--Update TableA joining on Merchant, Amount, TransDate and rn
UPDATE ta
SET ta.FirstName = tb.FirstName
    ,ta.LastName = tb.LastName
FROM TableARowNumbers ta
JOIN TableBRowNumbers tb
    ON tb.Merchant = ta.Merchant
        AND tb.Amount = ta.Amount
        AND tb.TransDate = ta.TransDate
        AND tb.rn = ta.rn

SELECT *
FROM @TableA

| FirstName | LastName | Merchant         | Amount | TransDate  | Reference |
|-----------|----------|------------------|--------|------------|-----------|
| SMITH     | JOHN     | BLANK-ROSTER NUM | 34.30  | 2018-01-01 | 12345     |
| JONES     | JAMES    | BLANK-ROSTER NUM | 34.30  | 2018-01-01 | 77777     |
| MILLER    | STEVE    | BLANK-ROSTER NUM | 16.07  | 2018-12-24 | 99999     |
| DAVIS     | BEN      | BLANK-ROSTER NUM | 16.07  | 2018-12-24 | 44444     |
| WILSON    | JACK     | BLANK-ROSTER NUM | 34.30  | 2018-01-01 | 66666     |

Best Answer

Related Solutions

Sql-server – Comparing DISTINCT, GROUP BY and ROW_NUMBER() in SQL Server 2008 with data warehouse workloads

Sql-server – Update values from one table to another where key values are not unique

Related Question