You should be able to use conditional aggregation to get the username with both a value in col2
as well as null
.
I'd suggest using a HAVING clause with the conditions. The query would be similar to:
select username
from yourtable
group by username
having sum(case when col2 is not null then 1 else 0 end) = 1
and sum(case when col2 is null then 1 else 0 end) = 1
See SQL Fiddle with Demo. This query groups your data by each username and then uses conditional logic to check if col2
meets both conditions you want - where col2
is not null and col2
is null.
You can then use this in a subquery, etc to get the username
and col2
values:
select
t.username,
t.col2
from yourtable t
inner join
(
select username
from yourtable
group by username
having sum(case when col2 is not null then 1 else 0 end) = 1
and sum(case when col2 is null then 1 else 0 end) = 1
) d
on t.username = d.username
See SQL Fiddle with Demo.
If you have more than one col2
row with both null
and another value, then you just need to alter the HAVING
clause slightly:
select
t.username,
t.col2
from yourtable t
inner join
(
select username
from yourtable
group by username
having sum(case when col2 is not null then 1 else 0 end) >= 1
and sum(case when col2 is null then 1 else 0 end) >= 1
) d
on t.username = d.username;
See SQL Fiddle with Demo
This is a potential solution, just based on your sample data only, if your data sample has more complexity, such as in ONE day, you have multiple records of different extract_datetime for both prod and staging tables, we may need to find other ways (but in such scenario, your business rules may need to be modified to remove confusions, currently your business rules and your sample data are excellently matched, i.e. no confusion or grey area)
use tempdb
--drop table dbo.Prod, dbo.Staging, #tmp;
--go
-- create sample tables
create table dbo.Prod (datefield datetime, field1 varchar(20), extract_datetime datetime);
create table dbo.Staging (datefield datetime, field1 varchar(20), extract_datetime datetime);
create table #tmp (datefield datetime) -- holds the deleted [DateField] value form dbo.prod
go
-- populate sample tables
insert into dbo.Prod (datefield, field1, extract_datetime) values
('20160701', 'P11', '2016-08-02 00:10:00')
,('20160701', 'P12', '2016-08-02 00:10:00')
,('20160701', 'P12', '2016-08-02 00:10:00')
,('20160701', 'P14', '2016-08-02 00:10:00')
,('20160702', 'P21', '2016-08-05 00:10:00')
,('20160702', 'P22', '2016-08-05 00:10:00')
,('20160703', 'P31', '2016-08-05 00:10:00')
,('20160703', 'P32', '2016-08-05 00:10:00')
,('20160703', 'P33', '2016-08-05 00:10:00')
go
insert into dbo.Staging (datefield, field1, extract_datetime) values
('20160701', 'S11', '2016-08-25 00:10:00')
,('20160702', 'S21', '2016-08-25 00:10:00')
,('20160702', 'S22', '2016-08-25 00:10:00')
,('20160702', 'S23', '2016-08-25 00:10:00')
,('20160702', 'S24', '2016-08-25 00:10:00')
,('20160703', 'S31', '2016-07-05 00:10:00')
,('20160703', 'S32', '2016-07-05 00:10:00')
,('20160703', 'S33', '2016-07-05 00:10:00')
,('20160703', 'S34', '2016-07-05 00:10:00')
,('20160703', 'S35', '2016-07-05 00:10:00')
,('20160707', 'S41', '2016-07-05 00:10:00') -- new record as per new requiremnt, 2016/08/15
go
-- delete records from Prod table as per business rules, but log the deleted [DateField] into #tmp for next step use
merge dbo.Prod as p
using dbo.Staging as s
on (p.DateField = s.DateField)
when matched and p.Extract_dateTime < s.Extract_datetime
then
delete
output deleted.datefield into #tmp;
-- re-populate the prod table with records from staging table
insert into dbo.prod (datefield, field1, extract_datetime)
select datefield, field1, extract_datetime
from dbo.Staging
where Datefield in (select distinct datefield from #tmp)
or Datefield not in (select datefield from dbo.Prod) -- added as per new requirement, 2016/08/15
-- check final result
select * from dbo.prod
order by Datefield
Best Answer
Use an
AGGREGATE
function likeMIN
orMAX
.If you only want one record per ID, you have to choose which date you want. The first (MIN) or the latest (MAX).
https://docs.microsoft.com/en-us/sql/t-sql/functions/aggregate-functions-transact-sql?view=sql-server-2017
^^Above information is not what was requested.
Edited per clarification in comments: Just use the HAVING to identify which ID values have more than one date, then apply that to a new SELECT.