MySQL multiple rows with same id meeting two different conditions at same time

MySQL

I am using OTRS helpdesk ticket management system. When i execute following Query:

SELECT  `ticket`.`id`,  
`ticket_history`.`ticket_id`,  
`ticket_history`.`id`,
`ticket_history`.`name`,  
`ticket_history`.`create_time` 
FROM  `ticket_history` 
INNER JOIN  `ticket` ON  `ticket_history`.`ticket_id` =  `ticket`.`id` 
WHERE  `ticket_history`.`name` LIKE  '%Raw% %new%'
OR  `ticket_history`.`name` LIKE  '%Close'
ORDER BY  `ticket_history`.`ticket_id`, `ticket_history`.`id` ASC 

I get the following output:

+----+-----------+-----+-------------------------------------------+---------------------+
| id | ticket_id | id  |                   name                    |     create_time     |
+----+-----------+-----+-------------------------------------------+---------------------+
|  1 |         1 |  79 | %%Close                                   | 2013-06-10 11:50:33 |
|  2 |         2 |   2 | %%2013060810000011%%Raw%%3 normal%%new%%2 | 2013-06-08 21:59:02 |
|  3 |         3 |   5 | %%2013060810000021%%Raw%%3 normal%%new%%3 | 2013-06-08 21:59:03 |
|  3 |         3 |  22 | %%Close                                   | 2013-06-08 22:10:41 |
|  3 |         3 |  82 | %%Close                                   | 2013-06-10 11:50:49 |
|  4 |         4 |  88 | %%Close                                   | 2013-06-10 11:51:32 |
|  5 |         5 |  64 | %%2013060910000019%%Raw%%3 normal%%new%%5 | 2013-06-09 17:12:09 |
|  5 |         5 |  85 | %%Close                                   | 2013-06-10 11:51:10 |
|  6 |         6 |  92 | %%2013061010000016%%Raw%%3 normal%%new%%6 | 2013-06-10 12:00:24 |
|  7 |         7 |  95 | %%2013061010000025%%Raw%%3 normal%%new%%7 | 2013-06-10 13:05:05 |
|  8 |         8 |  98 | %%2013061110000014%%Raw%%3 normal%%new%%8 | 2013-06-11 19:05:06 |
|  8 |         8 | 109 | %%Close                                   | 2013-06-17 23:57:35 |
|  9 |         9 | 163 | %%2013061810000011%%Raw%%3 normal%%new%%9 | 2013-06-18 02:05:06 |
+----+-----------+-----+-------------------------------------------+---------------------+

I need to modify the above query so I can only list rows of each ticket_id which has "%RAW% %new%" in name and at the same time same ticket_id has a row which has "%Close" in it.

In other words, Three rows of ticket_id 3, Two rows of ticket_id 5 and Two Rows of ticket_id 8 should be displayed from the above output.

Best Answer

This isn't going to be a query that scales very well, no matter which approach you take, since text matching anywhere in a column except at the left edge of an indexed column will require a full table scan.

I'm not familiar with OTRS ... I downloaded it and looked at the source scripts that create the database schema and it looks like the column you're querying is the result of denormalization and they're using the %% symbols to do language localization... so querying the way you're doing it is not likely to be, as some people like to say, very "performant."

Bear in mind also that % is a wildcard character for the MySQL LIKE operator, so what you're actually matching on, as written, could lead to false positives, because your expression is evaluated by MySQL as trying to match *raw*[space]*new* and *close.

What you appear to be actually searching for is %\%Raw\%% %\%new\%% where % means anything and \% means a literal %, not a wildcard. Also, you want %\%Close, or possibly %\%Close% if the '%Close' can be found anywhere other than at the end of a line.

Importantly, I suspect there is a better way to get the data you are looking for from OTRS, so you should review the schema and consider whether this is a possibility. For example, if you are looking for tickets that have been closed, it looks like to me that this might be tickets WHERE ticket.ticket_state_id = 2 or possibly tickets WHERE ticket_history.state_id = 2? Similarly, wouldn't a ticket with ticket_history.queue_id = 2 mean "Raw"?

Those literal "2" values may not be correct, depending on your version and precisely how OTRS uses its database, but understanding the schema and querying the database in a more optimizer-friendly fashion will get you a substantially better-performing query than searching every row of the table with LIKE.

As your query is written, and also in the example below, every row of the ticket_history table will have to be scanned at least once, no matter what you do.

So I would not begin to suggest that the following is "the right way to do it" but it adds in the logic your query is missing. If a row contains one pattern, we verify whether the ticket_history contains a row for the same ticket that matches the other pattern, and if so, we keep that row ... and if not, we discard it.

SELECT `ticket`.`id`,  
       `th`.`ticket_id`,  
       `th`.`id`,
       `th`.`name`,  
       `th`.`create_time` 
  FROM `ticket_history` th
  JOIN `ticket` ON  `th`.`ticket_id` =  `ticket`.`id` 
 WHERE (
        `th`.`name` LIKE  '%\%Raw\%% %\%new\%%'
        AND EXISTS (SELECT * FROM ticket_history th1 WHERE th1.ticket_id = th.id AND th1.name LIKE '%\%Close')
       )
    OR (
        `th`.`name` LIKE  '%\%Close'
        AND EXISTS (SELECT * FROM ticket_history th2 WHERE th2.ticket_id = th.id AND th2.name LIKE '%\%Raw\%% %\%new\%%')
       )
 ORDER BY `ticket_history`.`ticket_id`, `ticket_history`.`id` ASC;