Sql-server – How to find the nth highest salary in SQL

join;MySQLsql server

I have found a query to find the nth highest salary from Employee table , but i don't understand the logic of (N-1)?

EmpID         Salary
1             90000
2             80000
3             54000
4             37000
5             12000
6             69000
7             50000

SELECT * FROM Employee E1
WHERE (N-1) = (
                SELECT COUNT(DISTINCT(E2.Salary))
                FROM Employee E2
                WHERE E2.Salary > E1.Salary
              )

If N= 4, then how does the query work? I'm a complete beginner in SQL, please help!

Best Answer

What's happening here is the subquery is looking at each individual salary and essentially ranking them, then comparing those salaries to the outer salary (a separate query against the same table). So in this case if you said N = 4 it is saying:

WHERE 3 = (number of salaries > outer salary)

So looking at the data you have, let's rank them in order, and compare.

EmpID   Salary   How many *distinct* salaries are greater than this one?
-----   ------   -------------------------------------------------------
5       12000    6
4       37000    5
7       50000    4
3       54000    3
6       69000    2
2       80000    1
1       90000    0

So when n = 4, the row that will be returned is EmpID 3 (54000).

A much more intuitive way to write this query, in my opinion, is to use windowing functions like RANK(), ROW_NUMBER() or DENSE_RANK() (depending on whether or not you want ties). Let's take a look at how these different functions work against your data (and I've added an 8th row to represent a tie for 4th place):

DECLARE @salary TABLE(EmpID INT, Salary INT);

INSERT @salary VALUES
(1,90000),(2,80000),(3,54000),(4,37000),
(5,12000),(6,69000),(7,50000),(8,54000);

;WITH x AS
(
  SELECT EmpID, Salary, 
    r  = RANK()       OVER (ORDER BY Salary),
    dr = DENSE_RANK() OVER (ORDER BY Salary),
    rn = ROW_NUMBER() OVER (ORDER BY Salary)
  FROM @salary
)
SELECT EmpID, Salary, r, dr, rn FROM x;

Results:

EmpID  Salary   r   dr  rn
-----  ------   --  --  --
5      12000    1   1   1
4      37000    2   2   2
7      50000    3   3   3
8      54000    4   4   4
3      54000    4   4   5
6      69000    6   5   6
2      80000    7   6   7
1      90000    8   7   8

I don't think you'd want to use RANK() for this specific problem, because of the way it works there is no 5th place, for example. So now it comes down to whether you want to include multiple rows in the case of a tie, and if not, if you want an arbitrary row or a specific row based on some criteria. So adjusting the statement slightly:

-- if you want ties:
;WITH x AS
(
  SELECT EmpID, Salary, 
    dr = DENSE_RANK() OVER (ORDER BY Salary)
  FROM @salary
)
SELECT EmpID, Salary FROM x WHERE dr = 4;

-- results:
-- 3   54000
-- 8   54000

-- to take the *lowest* EmpID:
;WITH x AS
(
  SELECT EmpID, Salary, 
    rn = ROW_NUMBER() OVER (ORDER BY Salary, EmpID)
  FROM @salary
)
SELECT EmpID, Salary FROM x WHERE rn = 4;

-- results:
-- 3   54000

-- to take the *highest* EmpID:
;WITH x AS
(
  SELECT EmpID, Salary, 
    rn = ROW_NUMBER() OVER (ORDER BY Salary, EmpID DESC)
  FROM @salary
)
SELECT EmpID, Salary FROM x WHERE rn = 4;

-- results:
-- 8   54000