Sql-server – Incorrect result of query using JOIN of 4 tables

relational-divisionsql server

If someone could help me with the query below it would be great! This query is quite challenging, and I tried to do so on my own, but couldn't display the correct result…

I work with DB of a system that is alike "Facebook".

The query should display the email & name of users who wrote a comment for every post, that posted by each of their friends, in the last year.

Meaning, I would like to display ONLY the people who commented on ALL the posts of ALL their friends.


This is my trial, but I received partial results.
I cannot understand what I have done wrong.

select distinct U.Mail, U.FirstName + ' ' + U.LastName as FullName 
from Users U
inner join FriendsList FL on U.Mail = FL.Mail1
inner join Post P on FL.Mail2 = P.UserMail
left outer join Comment C on P.ID = C.IDPost and P.UserMail <> C.Mail
where datediff(year, P.DatePosted, getdate()) <= 1
group by U.Mail, U.FirstName, U.LastName
having count(distinct P.ID) = count(distinct C.IDPost)

Link to BAK file (including test data):
https://file.io/SagvM3cx

Tables diagram:
enter image description here


EXAMPLES:

Let's assume that I & Adam & Ben are friends.

⭐️ First case:

  • Adam wrote 1 post, and I commented.
  • Ben wrote 1 post, and I did not comment.

Then, my name should not display in the query. ?

  • Adam wrote 1 post, and I commented.
  • Ben wrote 1 post, and I commented.

Then, my name should display in the query.?

⭐️ Second case:

  • Adam wrote 2 posts, and I commented on the first one but did not comment the second.
  • Ben wrote 1 post, and I commented.

Then, my name should not display in the query.?

  • Adam wrote 2 posts, and I commented both.
  • Ben wrote 1 post, and I commented.

Then, my name should display in the query.?

⭐️ Third case:

  • Adam wrote 2 posts, and I commented both.
  • Ben wrote 1 post, and I commented.
  • Mark (Not my friend) wrote 1 post, and I did not comment.

Then, my name should display in the query.?


EDIT: I added 8 queries – 4 create table and 4 insert.
According to this example, the query above should display only Kelly

create table Users
(
Mail nvarchar (20) primary key check(Mail like '_%@_%._%' and (Mail like '%[0-9]%' Or Mail like '%[a-z]%'Or Mail like '%[A-Z]%')),
Password nvarchar (8) check (Password like '%[0-9]%' and Password like '%[az]%' and len(password) <= 8) not null,
FirstName nvarchar (20) not null,
LastName nvarchar (20) not null,
BirthDate date check (datediff(year,BirthDate,getdate())>=18) not null,
JoinDate date check (JoinDate<=getdate()) not null,
Gender nchar(1) check(Gender = 'F' or Gender = 'M' or Gender = 'O'),
NickName nvarchar(20),
Photo nvarchar(20),
Phone bigint check (Phone like '%[0-9]%' and len(Phone) <= 10) not null
)

INSERT INTO Users
VALUES
('Kelly@gmail.com','k1000000','Kelly','Ka','1992-05-15','2016-09-04','F','Kelly','Kelly.jpg','546296100'),
('Lilly@gmail.com','l1101111','Lilly','La','1999-04-03','2012-04-04','F','Lilly','Lilly.jpg','542448300'),
('Moshe@gmail.com','m120121','Moshe','Ma','1995-06-03','2011-04-02','M','Moshe','MosheMa.jpg','542840111'),
('Nelly@gmail.com','n130131','Nelly','Na','1994-03-07','2020-04-13','F','Nelly','NellyNa.jpg','541234567');
('Owen@gmail.com','o140141','Owen','Oa','1992-02-02','2020-05-13','M','Owen','OwenOa.jpg','541234567');

create table FriendsList
(
Mail1 nvarchar (20) references Users(Mail) not null,
Mail2 nvarchar (20) references Users(Mail) not null,
DateAdding date check (DateAdding<=getDate()) not null,
primary key (Mail1,Mail2)
)

INSERT INTO FriendsList
VALUES
('Kelly@gmail.com','Nelly@gmail.com','2018-04-18'),
('Lilly@gmail.com','Moshe@gmail.com','2020-04-22'),
('Moshe@gmail.com','Lilly@gmail.com','2020-04-22'),
('Moshe@gmail.com','Nelly@gmail.com','2020-04-22'),
('Nelly@gmail.com','Kelly@gmail.com','2018-04-18');
('Nelly@gmail.com','Moshe@gmail.com','2020-04-22'),

create table Post
(
ID int identity(1,1) primary key,
Photo nvarchar(20),
Text nvarchar(200),
Location nvarchar(50),
Video int,
DatePosted date check (datediff(month,DatePosted,getdate())<=3) not null,
UserMail nvarchar (20) references Users(Mail) on delete cascade on update
cascade not null
)

INSERT INTO Post
VALUES
('','my name is nellu','','','2020-04-18','Nelly@gmail.com'),
('','hii','','','2020-05-19','Lilly@gmail.com');

create table Comment
(
IDPost int references Post(ID) on delete cascade on update cascade not null,
SerialNumComment int check(SerialNumComment > 0) not null,
DateAndTimeComment Datetime check (DateAndTimeComment<=getDate()) not null,
Text nvarchar(200) not null,
Mail nvarchar (20) references Users(Mail) not null,
primary key (IDPost,SerialNumComment)
)

INSERT INTO Comment
VALUES
('1','1','2020-04-18','blabla','Kelly@gmail.com'),
('2','1','2020-05-05','bhfk','Moshe@gmail.com');

Best Answer

The problem in your original query is that

left outer join Comment C on P.ID = C.IDPost and P.UserMail <> C.Mail

Should be

left outer join Comment C on P.ID = C.IDPost and  U.Mail = C.Mail

You are trying to get comments posted by the original user

Personally I find the double NOT EXISTS approach easier to follow (though the semantics of how to treat Users with no eligible posts to comment on differs).

You are looking for Users where there is no recent post from one of their friends that does not have a comment from that User.

This implements that logic

SELECT U.Mail,
       U.FirstName + ' ' + U.LastName AS FullName
FROM   Users U
WHERE  NOT EXISTS (
                  --A post by one of their friends in the last year
                  SELECT *
                   FROM   FriendsList FL
                          INNER JOIN Post P
                                  ON FL.Mail2 = P.UserMail
                   WHERE  U.Mail = FL.Mail1
                          AND P.DatePosted >= DATEADD(YEAR, -1, GETDATE())
                          AND NOT EXISTS (
                                         --A comment by the user on that post
                                         SELECT *
                                          FROM   Comment C
                                          WHERE  P.ID = C.IDPost
                                                 AND U.Mail = C.Mail))