SQL Server – How to Exclude All Matching Rows

duplicationsql server

I have a table that has a unique ID, name, email, and some other fields. I want to match this table against another table on email, but sometimes emails are shared. The unique ID is not shared among tables, but I will input the ID into the other table when I have identified them. Name is not unique.

I'd like to build two rounds of joins. The first round I would like to match on email and append the ID, and the second round would match on ID and update fields as necessary.

Is there a way to query into the table and grab only those that have unique emails, while ignoring those that share emails? I could then append the ID and re-query those that don't have IDs and join on the name. Would that even work?

Would this be a good use of a partition? I don't understand those well.

Best Answer

Is there a way to query into the table and grab only those that have unique emails, while ignoring those that share emails?

I can think of a couple ways to do this:

1) If I restate this, I might say "Get all of the email addresses that have only 1 entry in the table, and exclude email addresses having more than one entry in the table. Use GROUP BY...HAVING to identify the non-duplicated email addresses, grouping by the address, and literally looking for a COUNT of 1 (or > 1):

--UNIQUE EMAILS
SELECT EmailAddress 
FROM TableName
GROUP BY EmailAddress
HAVING COUNT(*) = 1;

---NON-UNIQUE EMAILS
SELECT EmailAddress 
FROM TableName
GROUP BY EmailAddress
HAVING COUNT(*) > 1;

This would literally give you the list of email addresses, which you would then need to do stuff with.

2) If I restate your question another way, "Get all of the emails where there doesn't exist another row with the same email address. Use 'EXISTS' and 'NOT EXISTS' and a correlated subquery to identify rows with or without duplicates.

--UNIQUE EMAILS
SELECT t1.EmailId, t1.EmailAddress
FROM TableName t1
WHERE NOT EXISTS (SELECT 1 FROM TableName t2 
                 WHERE t1.EMailId <> t2.EmailId AND t1.EmailAddress = t2.EmailAddress);

--NON-UNIQUE EMAILS
SELECT t1.EmailId, t1.EmailAddress
FROM TableName t1
WHERE EXISTS (SELECT 1 FROM TableName t2 
                 WHERE t1.EMailId <> t2.EmailId AND t1.EmailAddress = t2.EmailAddress);

Personally, I'd go with the second approach, as I think the logic is easier to follow, and I suspect it would perform better (though, that's just a guess).

Related Solutions

Sql-server – Select one name per person where spelling variations exist

You can "group by" Emp_ID and use an aggregate function like MIN() or MAX() to get one of the names:

INSERT INTO TargetTable 
  (Emp_ID, Name) 
SELECT Emp_ID, MIN(Name) 
FROM SourceTable 
GROUP BY Emp_ID ;

And note that there is no inherent order in a table (actually you can define a clustered index for a table and this affects how the rows are stored on the disk but that is no guarantee for the order of retrieval).

You can get the data from the target table with a query afterwards and if you want them ordered, you can (and should) define the order you like (and a different one if you like, every time you query it):

SELECT Emp_ID, Name 
FROM TargetTable 
ORDER BY Name ;

Sql-server – Unique column values between tables

If I understand what you are asking for: A given person can have multiple roles in a team but can only be on one team I would do it like this:

enter image description here

By putting the TeamId in the People table you enforce the fact that a Person can only be on one team. You then have a many:many relationship using a cross join table between People and Roles. This allows a person to have multiple roles. By putting the email in the PeopleRole table you enforce that a person has to have a separate email for each role. You would also want to put a unique index on the PeopleRole.Email column to enforce that all email address have to be different.

The only thing I believe I'm missing is enforcing that a team can only have one leader and one contact. You might be able to do that by adding TeamId to the PeopleRole and then creating a filtered unique index on TeamId & RoleId WHERE the RoleIds are those that can only have one per team. I haven't tried it though.

Best Answer

Related Solutions

Sql-server – Select one name per person where spelling variations exist

Sql-server – Unique column values between tables

Related Question