First, if you SELECT
only from the teams
table and there is a FOREIGN KEY
constraint that REFERENCES players
, you don't need the join to players
at all.
Now, there are various ways to write this type of queries, there is even a tag at SO and DBA.SE, greatest-n-per-group. We need the greatest 1 in this case. The most simple code is to use the DISTINCT ON
construct (which is a Postgres, non-standard addition to SQL).
If you need columns from players
, you can simply join and add the columns in the select list.:
SELECT DISTINCT ON (t.player_id)
t.player_id,
t.team_name,
t.start_date,
t.end_date
FROM teams AS t
-- JOIN players AS p
-- ON p.id = t.player_id
ORDER BY
t.player_id, -- this needs to match the DISTINCT ON ()
.... -- pick what order you want, so the 1st is chosen for each player
; -- no LIMIT
Another way that works in most other DBMS is to use window functions. The criteria for which row to be chosen goes inside the OVER
clause:
SELECT
g.player_id,
g.team_name,
g.start_date,
g.end_date
FROM
( SELECT
t.player_id,
t.team_name,
t.start_date,
t.end_date,
ROW_NUMBER() OVER (PARTITION BY t.player_id
ORDER BY ...) AS rn
FROM teams AS t
-- JOIN players AS p
-- ON p.id = t.player_id
) AS g
WHERE
g.rn = 1 ;
If you are not in some ancient version of Postgres, there is also the LATERAL
syntax, that is often the most efficient. Especially when there is a "driving" table with the distinct values we want to base the grouping (the players
here) and if these values are relatively small in number and have many possible options (i.e. in this case, the biggest the teams
table is, in comparison to the players
, the better for this query).
Also notice how this query resembles very much your initial idea. It does, literally, exactly what you want: allow the players
columns to be referenced inside the lt
subquery:
SELECT
lt.player_id,
lt.team_name,
lt.start_date,
lt.end_date
FROM players AS p
JOIN LATERAL
( SELECT t.*
FROM teams AS t
WHERE p.id = t.player_id
ORDER BY .... -- pick the order
LIMIT 1
) AS lt ON TRUE ;
This might work. Basically it's a ROW_NUMBER function that you will have to identify a key for (you Mentioned InvoiceNumber). Once you do, it will return an ordered value where all your "duplicates" will be 2+. Simply adding a where clause where ROWNUM =1 should get you the first record (ordered by the CreatedDate).
SELECT main.* FROM
(
SELECT
Upper(WFI.COMPANYID) as Company,
WFI.USERID as UID,
INF.NAME as Approver,
CONVERT(varchar(10),duedatetime,4) as Due,
right(left(Document,20),10) as InvoiceDate,
SUBSTRING(DOCUMENT,22,charindex(' ',RIGHT(DOCUMENT,len(DOCUMENT)-21))) as InvoiceNumber,
Right(document,len(Document)-(20+charindex(' ',RIGHT(DOCUMENT,len(DOCUMENT)-21)))) as Vendor,
wfi.CREATEDDATETIME as Created,
--Added RowNumber Function Below
ROW_NUMBER() OVER (PARTITION BY **InsertYourKeyToAUniqueRecordHere** ORDER BY wfi.CREATEDDATETIME DESC) AS ROWNUM
FROM
[TEST].[tst].[WORKFLOWWORKITEMTABLE] WFI
INNER JOIN [TEST].[tst].[Workflowtrackingstatustable] WFS ON WFI.CORRELATIONID=WFS.CORRELATIONID
INNER JOIN [TEST].[tst].[HCMWORKER] HCM on WFI.USERID=HCM.PERSONNELNUMBER
INNER JOIN [TEST].[tst].[DIRPERSONNAME] DPN ON DPN.PERSON=HCM.PERSON
INNER JOIN [TEST].[tst].[LEDGERJOURNALTABLE] LJT ON WFI.REFRECID = LJT.RECID
INNER JOIN [TEST].[tst].[USERINFO] INF ON WFI.USERID = INF.ID
WHERE
DATASOURCENAME Like 'Ledgerjourna%'
AND Datediff(day,Duedatetime,getdate())>3
AND WFS.DOCUMENTTYPE='Special'
)main
WHERE main.ROWNUM =1 --Add this clause to only return the first record
ORDER BY main.Company asc
Feel free to comment out the WHERE main.ROWNUM=1
clause so that you can see your Rownum in action.
If you don't want your ROWNUM
column to show in your final result set, then just replace the first SELECT *
with the actual columns you want to select (using their aliases).
Best Answer
give your example data this should do it.
If you need to pick the
f_time
according to some other column it gets messy.In Postgresql you can use the
distinct on
SQL extension.i think there's also a way using window functions