MySQL JOIN query produces wrong result

join;MySQL

I have two tables complaints and complaints_reply in my MySQl database. Users can add complaints which are stored in complaints the complaints reply are stored in complaints_reply table. I am trying to JOIN both these table contents on a specific condition. Before I mention what I am trying to get and the problem I faced, I will explain the structure of these two tables first.


NB: The person who adds complaints is complaint owner & person who adds a complaint reply is complaint replier. Complaint owner can also add replies. So he can either be the complaint owner or the complaint replier. The two tables have a one-to-many relationship. A complaint can have more than one complaint reply. member_id in complaint table represents complaint owner & mem_id in complaints_reply represent complaint replier


DESIRED OUTPUT:

Join the two tables and fetch values and show the complaint and complaint’s reply as a single result set. But the condition is kinda tricky. The last added complaint reply from the complaints_reply table should be fetched for the complaint in complaints table in such a way that the complaint owner should not be the complaint replier. I use posted_date & posted_time from complaints_reply table to fetch the last added complaint reply for a complaint & that complaint replier has to be shown in the result set.

So, from the sample data the tables contain now, the output that I should get is:

+------+---------+----------+-------------+-------------------+
| id   | title   |member_id |last_replier |last_posted_dt     |
+------+---------+----------+-------------+-------------------+
|    1 | x       | 1000     |2002         | 2015-05-2610:11:17|
|    2 | y       | 1001     |1000         | 2015-05-2710:06:16|
+------+---------+----------+-------------+-------------------+

But what I got is:

+------+---------+----------+-------------+-------------------+
| id   | title   |member_id |last_replier |last_posted_dt     |
+------+---------+----------+-------------+-------------------+
|    1 | x       | 1000     |1001         | 2015-05-2610:11:17|
|    2 | y       | 1001     |2000         | 2015-05-2710:06:16|
+------+---------+----------+-------------+-------------------+

The date is correct, but the returned complaint replier last_replier is wrong.

This is my query.

SELECT com.id,
       com.title,
       com.member_id,
       last_comp_reply.last_replier,
       last_comp_reply.last_posted_dt
FROM complaints com
LEFT JOIN
  (SELECT c.id AS complaint_id,
          c.member_id AS parent_mem_id,
          cr.mem_id AS last_replier,
          max(cr.posted_dt) AS last_posted_dt
   FROM
     (SELECT cr.complaint_id,cr.mem_id,c.id,c.member_id,(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
      FROM complaints_reply cr,
           complaints c
      WHERE cr.complaint_id=c.id
        AND cr.mem_id!=c.member_id
      GROUP BY cr.complaint_id,
               cr.mem_id,
               posted_dt)cr,
        complaints c
   WHERE cr.complaint_id=c.id
   GROUP BY cr.complaint_id,
            c.id,
            c.member_id) AS last_comp_reply ON com.id=last_comp_reply.complaint_id

Table structure for table complaints

CREATE TABLE IF NOT EXISTS `complaints` (
  `id` int(11) NOT NULL,
  `title` varchar(500) NOT NULL,
  `member_id` int(11) NOT NULL,
  `posted_date` date NOT NULL,
  `posted_time` time NOT NULL 
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

Indexes for table complaints

ALTER TABLE `complaints`
 ADD PRIMARY KEY (`id`);

AUTO_INCREMENT for table complaints

ALTER TABLE `complaints`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=3;

Dumping data for table complaints

INSERT INTO `complaints` (`id`, `title`, `member_id`, `posted_date`, `posted_time`) VALUES
(1, 'x', 1000, '2015-05-05', '02:06:15'),
(2, 'y', 1001, '2015-05-14', '02:08:10');

Table structure for table complaints_reply

CREATE TABLE IF NOT EXISTS `complaints_reply` (
`id` int(11) NOT NULL,
  `complaint_id` int(11) NOT NULL,
  `comments` text NOT NULL,
  `mem_id` int(11) NOT NULL,
  `posted_date` date NOT NULL,
  `posted_time` time NOT NULL
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;

Indexes for table complaints_reply

ALTER TABLE `complaints_reply`
 ADD PRIMARY KEY (`id`);

AUTO_INCREMENT for table complaints_reply

ALTER TABLE `complaints_reply`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=10;

Dumping data for table complaints_reply

INSERT INTO `complaints_reply` (`id`, `complaint_id`, `comments`, `mem_id`, `posted_date`, `posted_time`) VALUES
(1, 1, 'reply1', 2000, '2015-05-08', '02:07:08'),
(2, 1, 'reply2', 2001, '2015-05-06', '06:05:08'),
(3, 1, 'reply3', 1000, '2015-05-14', '02:12:13'),
(4, 2, 'hola', 1000, '2015-05-27', '10:06:16'),
(5, 2, 'hello', 2000, '2015-05-04', '03:09:09'),
(6, 2, 'gracias', 1001, '2015-05-31', '06:12:18'),
(7, 1, 'reply4', 1001, '2015-01-04', '04:08:12'),
(8, 2, 'puta', 1001, '2015-06-13', '06:12:18'),
(9, 1, 'reply5', 1000, '2015-06-01', '04:08:12'),
(10, 1, 'reply next', 2002, '2015-05-26', '10:11:17');
P.S.

To give an idea about what my query is all about, I'll explain the sub query that is used to combine the tables & give result based on the condition: complaint owner should not be the complaint replier is:

SELECT cr.complaint_id,
       cr.mem_id,
       c.id,
       c.member_id,
       (CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
     complaints c
WHERE cr.complaint_id=c.id
  AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
         cr.mem_id,
         posted_dt

And the result for this is:

+--------------+---------+----------+-------------+-------------------+
| complaint_id | mem_id  | id       |member_id    |     posted_dt     |
+--------------+---------+-------   +-------------+-------------------+
|    1         | 1001    | 1        |1000         | 2015-01-0404:08:12|
|    1         | 2000    | 1        |1000         | 2015-05-0802:07:08|
|    1         | 2001    | 1        |1000         | 2015-05-0606:05:08|
|    1         | 2002    | 1        |1000         | 2015-05-2610:11:17|
|    2         | 1000    | 2        |1001         | 2015-05-2710:06:16|
|    2         | 2000    | 2        |1001         | 2015-05-0403:09:09|
+--------------+---------+----------+-------------+-------------------+

member_id here represents complaint owner and mem_id represents complaint replier

The inner query gives the result based on the condition, then everything after this goes haywire. I don't know where I made mistake. The complaint replies added by complaint owner is not fetched in this table. So far so good. Is there any alternative way to get the result from here?

Best Answer

The problem is the subquery, where you combine in the select list both an aggregated (max(cr.posted_dt)) and non-aggregated expressions/columns from the tables.

MySQL allows you to run this kind of inconsistent queries - with default settings - which basically means it depends on the developer to write consistent queries and not fire themselves on the foot. You can change the sql mode to ONLY_FULL_GROUP_BY and see what happens if you try to run your query.

Now to solve the issue, it seems you want a [greatest-n-per-group] type of query. There are several ways to do this, all quite complicated in MySQL (because it lacks window functions). Check the relevant tags in this site and in the SO main site.

Here's one way:

SELECT c.id,
       c.title,
       c.member_id,
       cr.mem_id    AS last_replier,
       CONCAT(cr.posted_date, 'T', cr.posted_time) AS last_posted_dt
FROM complaints AS c
  LEFT JOIN complaints_reply AS cr
    ON  cr.id = 
        ( SELECT crl.id
          FROM complaints_reply AS crl
          WHERE crl.complaint_id = c.id
            AND crl.mem_id <> c.member_id
          ORDER BY posted_date DESC, posted_time DESC
          LIMIT 1
        ) ;

It will give you consistent results and also be quite efficient, if you add an index on (complaint_id, posted_date, posted_time, mem_id)

Tested at SQLfiddle.