Sql-server – How group with min(date) and select an another column in the same table

greatest-n-per-groupgroup bysql-server-2000

I have a query who return something like it :

Name   Gender Job         date of hire
John   M      mechanic    2012-05-08
John   M      electrician 2010-01-01
Vicky  F      scientific  2012-11-11
Bob    M      NULL        NULL

I need to have the name, gender and jobname of first job of each person. But i don't know how.
My query look like this :

select name,gender,jobname,hiredate
from person p
left join job j on p.personid = j.personid

I'm on Microsoft SQL Server 2000

I need this as result :

Name   Gender Job         
John   M      electrician
Vicky  F      scientific 
Bob    M      NULL       

Best Answer

I infer that your data looks like this:

Person Table

╔══════════╦═══════╦════════╗
║ PersonID ║ Name  ║ Gender ║
╠══════════╬═══════╬════════╣
║        1 ║ John  ║ M      ║
║        2 ║ Vicky ║ F      ║
║        3 ║ Bob   ║ M      ║
╚══════════╩═══════╩════════╝

Job Table

╔══════════╦═════════════╦════════════╗
║ PersonID ║   JobName   ║  HireDate  ║
╠══════════╬═════════════╬════════════╣
║        1 ║ Electrician ║ 2010-01-01 ║
║        1 ║ Mechanic    ║ 2012-05-08 ║
║        2 ║ Scientific  ║ 2012-11-11 ║
╚══════════╩═════════════╩════════════╝

The first task is to find the first job (by hire date) for each person. One neat way to do that is by using a correlated subquery:

SELECT j.*
FROM dbo.Job AS j
WHERE 
    j.HireDate =
    (
        SELECT MIN(j2.HireDate)
        FROM dbo.Job AS j2
        WHERE j2.PersonID = j.PersonID
    );

Notice the correlation WHERE j2.PersonID = j.PersonID between the inner and outer queries there. The output of that query is:

╔══════════╦═════════════╦════════════╗
║ PersonID ║   JobName   ║  HireDate  ║
╠══════════╬═════════════╬════════════╣
║        1 ║ Electrician ║ 2010-01-01 ║
║        2 ║ Scientific  ║ 2012-11-11 ║
╚══════════╩═════════════╩════════════╝

The execution plan (given a clustered PRIMARY KEY on PersonID, HireDate) is:

First job plan

The interesting thing about that plan is the Job table is only scanned once, despite there being two references to it in the original query. The plan uses an optimization that I call Segment Top. Essentially the execution engine takes advantage of the index order to detect the start of a new group (segment) and take just the first row from each group (top).

Now that we have that result, all we need do is join it back to the Person table:

SELECT
    p.PersonName,
    p.Gender, 
    j.JobName
FROM dbo.Person AS p
LEFT JOIN
(
    -- Previous query
    SELECT j.*
    FROM dbo.Job AS j
    WHERE 
        j.HireDate =
        (
            SELECT MIN(j2.HireDate)
            FROM dbo.Job AS j2
            WHERE j2.PersonID = j.PersonID
        )
) AS j ON 
    j.PersonID = p.PersonID
OPTION (MERGE JOIN);

The execution plan is:

Final plan

The OPTION (MERGE JOIN) is not required; I just added it to show the plan you are likely to get when the tables contain a larger number of rows than in this small example.


Table definitions and sample data:

CREATE TABLE dbo.Person
(
    PersonID    integer NOT NULL,
    PersonName  varchar(30) NOT NULL,
    Gender      char(1) NOT NULL,

    PRIMARY KEY (PersonID)
);

CREATE TABLE dbo.Job
(
    PersonID    integer NOT NULL,
    JobName     varchar(30) NOT NULL,
    HireDate    datetime NOT NULL,

    PRIMARY KEY (PersonID, HireDate)
);

INSERT dbo.Person
    (PersonID, PersonName, Gender)
SELECT 1, 'John', 'M' UNION ALL
SELECT 2, 'Vicky', 'F' UNION ALL
SELECT 3, 'Bob', 'M';

INSERT dbo.Job
    (PersonID, JobName, HireDate)
SELECT 1, 'Mechanic', '20120508' UNION ALL
SELECT 1, 'Electrician', '20100101' UNION ALL
SELECT 2, 'Scientific', '20121111';