Sql-server – Find “second-greatest” string in each “group”

sql server

I'm doing some data analysis and want an easy way to examine all the members of each "group" in a GROUP BY query.

For example, three agents may be involved in an order. I want to quickly examine the three agents that were 'grouped' in this order, for various reasons.

Usually, I would use group_concat for this (an easy way to see all the grouped strings). However, replicating that with a GROUP BY appears difficult and unwieldy in SQL Server.

Right now, as a quick-and-dirty approach, I would use max(username) and min(username) to find two of them (and 90% of orders probably have two or fewer people). Is there a way to do mid(username), 2nd-highest(username), or percentile(50th, username)? That would be a great, quick way to find this relevant data. For some reason, the previous answers I've seen describing group_concat on SQL Server do not sound straightforward to me.

Sample data for instance:

employee  purchase_id
bill        1
bob         1
chrissy     1
mike        2
bill        2
bob         3

Currently I have this:

purchase_id, employee_count, complicated metric
1            3               blahblah
2            2               dsflsajf
3            1               98%

I would like to see at a glance:

purchase_id, employees, complicated metric
1            (bill,bob,chrissy)   blahblah

However, group_concat (or simulating it in SQL Server) seems very confusing to use with a GROUP BY statement. So instead, how about this:

select max(employee), min(employee)

purchase_id,  max(employee), min(employee)
1              chrissy         bill

In the example you can see that bob is omitted, since max/min only find the two endpoints. If there were some kind of function to pull the second-highest value, or the 50th-percentile value, on strings, that would be helpful.

Best Answer

Something like this would split the results into multiple columns, but you would need to know in advance the maximum number of employees per purchase_id:

With Ordered_cte As ( 
    Select employee,
        purchase_id,
        RowNo = Row_Number() Over (Partition By purchase_id Order By employee)
      From tbl_purchase_employee)
Select purchase_id,
    Employee1 = Max(iif(RowNo = 1, employee, Null)),
    Employee2 = Max(iif(RowNo = 2, employee, Null)),
    Employee3 = Max(iif(RowNo = 3, employee, Null))
  From Ordered_cte
  Group By purchase_id;
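
If the goal is the comma-separated list from the question rather than separate columns, a minimal sketch using STRING_AGG may be simpler. This assumes SQL Server 2017 or later, where STRING_AGG is available; the table variable @purchase_employee is a hypothetical stand-in for tbl_purchase_employee, filled with the sample rows from the question.

```sql
-- Sample rows copied from the question; @purchase_employee is a
-- hypothetical stand-in for tbl_purchase_employee.
declare @purchase_employee table (employee varchar(20), purchase_id int);

insert into @purchase_employee (employee, purchase_id)
values ('bill', 1), ('bob', 1), ('chrissy', 1),
       ('mike', 2), ('bill', 2),
       ('bob', 3);

-- One row per purchase, with the employees concatenated in alphabetical order.
-- Assumes SQL Server 2017+ (STRING_AGG with WITHIN GROUP).
select purchase_id,
       employee_count = count(*),
       employees = string_agg(employee, ',') within group (order by employee)
from @purchase_employee
group by purchase_id;
```

For purchase_id 1 this should give bill,bob,chrissy in the employees column. On older versions, the Row_Number approach above remains an option.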

Related Solutions

Sql-server – Remove string after second specific character occurs from left

You can use the third parameter of charindex() that is used to specify where in the string the search will start.

declare @S varchar(20) = '45465@6464@654';
select left(@S, charindex('@', @S, charindex('@', @S)+1)-1);

Result

45465@6464
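
One caveat: when the string contains fewer than two '@' characters, the inner charindex() returns 0, so left() receives a length of -1 and raises an error. A hedged sketch of a guarded variant follows; returning the string unchanged in that case is an assumption about the desired behavior, not part of the original answer.

```sql
-- Only one '@' here, so there is no second occurrence to cut at.
declare @S varchar(20) = '45465@6464';

select case
           -- position of the second '@', or 0 if there is none
           when charindex('@', @S, charindex('@', @S) + 1) > 0
               then left(@S, charindex('@', @S, charindex('@', @S) + 1) - 1)
           -- assumption: leave the string as-is when no second '@' exists
           else @S
       end;
```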

Sql-server – Find the winner of each stage

There are dozens of different ways to do this in SQL. Let's start with a simple correlated subquery (never mind the fancy name; once you have seen and written a few of them, they are very easy to understand):

select                                -- show
    g.name, g.stage, g.score          -- all data
from                                  -- from
    game as g                         -- the table
where                                 -- where
    not exists                        -- there isn't
        ( select *                    -- any other 
          from game as g2             -- from the same table
          where g2.stage = g.stage    -- and the same stage
            and g2.score > g.score    -- with bigger score
        ) ;

Another simple way would be to first find the biggest score for each stage using GROUP BY (in a subquery, either a derived table or a CTE) and then JOIN back to the original table:

-- using derived table
select      
    g.name, g.stage, g.score 
from      
    game as g 
  join
    ( select stage, max(score) as score
      from game
      group by stage
    ) as m
  on  m.stage = g.stage
  and m.score = g.score ;

-- using CTE
with stage_max as
    ( select stage, max(score) as score
      from game
      group by stage
    ) 
select      
    g.name, g.stage, g.score 
from      
    game as g 
  join
    stage_max as m
  on  m.stage = g.stage
  and m.score = g.score ;

A more modern way would be to use window functions (available in your SQL Server versions), i.e. the RANK() function, so first you get the "rank" of everyone per stage and then select only the ones with rank=1. This can also be done with either a derived table or a CTE:

-- window functions, using derived table
select      
    w.name, w.stage, w.score 
from      
    ( select name, stage, score,
             rnk = rank() over (partition by stage
                                order by score desc)
      from game
    ) as w
where
    w.rnk = 1 ;

-- window functions, using CTE
with ranking as
    ( select name, stage, score,
             rnk = rank() over (partition by stage
                                order by score desc)
      from game
    ) 
select      
    w.name, w.stage, w.score 
from      
    ranking as w 
where
    w.rnk = 1 ;
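
Note that RANK() assigns 1 to every row tied for the top score, so all of the queries above return more than one winner for a stage with a tie. A compact variation on the same idea, sketched here with hypothetical sample data (the real game table is not shown), is to order by the rank and keep the ties with TOP (1) WITH TIES:

```sql
-- Hypothetical sample data; names and scores are made up for illustration.
declare @game table (name varchar(20), stage int, score int);

insert into @game (name, stage, score)
values ('ann', 1, 10), ('ben', 1, 10), ('cat', 1, 7),
       ('dan', 2, 5);

-- Every top scorer of a stage gets rank 1, and WITH TIES keeps
-- all rows that share the first ORDER BY value.
select top (1) with ties
    g.name, g.stage, g.score
from @game as g
order by rank() over (partition by g.stage
                      order by g.score desc);
```

With this data, both ann and ben come back for stage 1, along with dan for stage 2. If exactly one row per stage is required, ROW_NUMBER() with an extra tie-breaking column in the ORDER BY would be one way to get it.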
