Sql-server – How to remove earliest/latest row by two columns (userId, StatusCheck)

sql-server-2008t-sql

I have crosschex database that is used to store logged in/out data. Sometimes it happens that user clock in/out few times. I would like to remove that duplicates leaving the earliest when log in and latest when log out.
Here is what I have:

USE crosschex;
SELECT UI.Name, INOUT.UserId, INOUT.CheckTime, S.StatusText
FROM dbo.Checkinout AS INOUT
INNER JOIN dbo.Userinfo AS UI ON
INOUT.Userid = UI.Userid
INNER JOIN dbo.Status AS S ON INOUT.CheckType = S.Statusid
WHERE 
    UI.Groupid = 1 AND INOUT.CheckTime >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())-3, 0) AND INOUT.CheckTime < DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())-3, 1)

Here is the result:

"Zeab Simon"    "6" "2018-05-04 07:58:06.000"   "In"
"Zeab Simon"    "6" "2018-05-04 07:58:15.000"   "In"
"Zeab Simon"    "6" "2018-05-04 07:58:38.000"   "In"
"Kate B"    "4" "2018-05-04 14:15:45.000"   "In"
"Kate B"    "4" "2018-05-04 14:15:46.000"   "In"
"Zeab Simon"    "6" "2018-05-04 16:34:46.000"   "Out"
"Zeab Simon"    "6" "2018-05-04 16:34:48.000"   "Out"
"Zeab Simon"    "6" "2018-05-04 16:34:49.000"   "Out"
"Zeab Simon"    "6" "2018-05-04 16:34:50.000"   "Out"
"Zeab Simon"    "6" "2018-05-04 16:34:51.000"   "Out"
"Patty" "5" "2018-05-04 20:31:36.000"   "Out"

What I would like is to remove the earliest and lattest for each userId and StatusText In/Oou to get that:

"Zeab Simon"    "6" "2018-05-04 07:58:06.000"   "In"
"Kate B"    "4" "2018-05-04 14:15:45.000"   "In"
"Zeab Simon"    "6" "2018-05-04 16:34:51.000"   "Out"
"Patty" "5" "2018-05-04 20:31:36.000"   "Out"

How can I do it?

Best Answer

To answer this question, I did the following.

CREATE TABLE shift (worker_name VARCHAR(25), worker_id INTEGER, shift_time TIMESTAMP, in_out VARCHAR(3));

INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 07:58:06.000', 'In');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 07:58:15.000', 'In');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 07:58:38.000', 'In');
INSERT INTO shift VALUES ('Kate B',     4, '2018-05-04 14:15:45.000', 'In');
INSERT INTO shift VALUES ('Kate B',     4, '2018-05-04 14:15:46.000', 'In');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 16:34:46.000', 'Out');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 16:34:48.000', 'Out');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 16:34:49.000', 'Out');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 16:34:50.000', 'Out');
INSERT INTO shift VALUES ('Zeab Simon', 6, '2018-05-04 16:34:51.000', 'Out');
INSERT INTO shift VALUES ('Patty',      5, '2018-05-04 20:31:36.000', 'Out');

I then ran the following query:

SELECT worker_name, MIN(shift_time), in_out 
FROM shift 
WHERE in_out = 'In' 
GROUP BY worker_name, in_out
UNION
SELECT worker_name, MAX(shift_time), in_out 
FROM shift 
WHERE in_out = 'Out' 
GROUP BY worker_name, in_out
ORDER BY worker_name  

Which gives:

worker_name,         shift_time,   in_out
Kate B       2018-05-04 14:15:45       In
Patty        2018-05-04 20:31:36      Out
Zeab Simon   2018-05-04 07:58:06       In
Zeab Simon   2018-05-04 16:34:51      Out   

A cursory inspection reveals that these are the correct times - earliest clock_in and latest clock_out. A db-fiddle is available here. To get the best answers to your questions, you might consider giving a fiddle in the question itself - help us to help you!

Following the OP's clarifications in comments the need to delete extraneous records from the same table, I used the result above in a CTE (Common Table Expression - AKA the WITH clause) to delete the other records from the main shift table as follows.

WITH cte AS
(
  SELECT worker_name, worker_id,   
  MIN(shift_time) as shift_time, in_out   
  FROM shift 
  WHERE in_out = 'In'  
  GROUP BY worker_name, worker_id, in_out
  UNION
  SELECT worker_name, worker_id, 
  MAX(shift_time) as shift_time, in_out 
  FROM shift WHERE in_out = 'Out' GROUP 
  BY worker_name, worker_id, in_out
)
-- select * from cte  -- testing
DELETE FROM shift
WHERE shift_time NOT IN 
(
  SELECT shift_time FROM cte    
);

And then a simple SELECT in the fiddle shows that the correct result has been achieved.

Another solution was kindly provided by @ypercube here - worth looking at for learning purposes. It is actually (no surprise here) more correct. My SQL will not work properly if there are exactly matching times for a MAX(or MIN) and a non significant value!