This is the relational division problem. There is a question about it on SO with a lot of ways to write this query, plus performance analysis for PostgreSQL: How to filter SQL results in a has-many-through relation

Shamelessly copying code from there, and removing/changing the answers that use features missing from MySQL (like CTEs, `EXCEPT`, `INTERSECT`, etc.), here are a few ways to do this.
Assumptions:

- the table is called `factors`
- there is a `UNIQUE` constraint on `(wordid, docid)`
- there is a `documents` and a `words` table
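For reference, a minimal schema matching these assumptions could look like this (the column types and the `word`/`docname` payload columns are my guesses, not from the original question):

```sql
CREATE TABLE documents (
   docid   int PRIMARY KEY,
   docname text NOT NULL
);

CREATE TABLE words (
   wordid int PRIMARY KEY,
   word   text NOT NULL
);

CREATE TABLE factors (
   docid  int NOT NULL REFERENCES documents (docid),
   wordid int NOT NULL REFERENCES words (wordid),
   UNIQUE (wordid, docid)
);
```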
Easy to write, medium efficiency:
-- Query 1 -- by Martin
SELECT d.docid, d.docname
FROM documents d
JOIN factors f USING (docid)
WHERE f.wordid IN (2, 4, 5)
GROUP BY d.docid
HAVING COUNT(*) = 3 ; -- number of words
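The `HAVING COUNT(*) = 3` test relies on the `UNIQUE (wordid, docid)` constraint; if duplicates were possible, a document could pass with fewer than 3 distinct words. A common workaround (a sketch, not one of the linked answers) is to count distinct words instead:

```sql
SELECT d.docid, d.docname
FROM documents d
JOIN factors f USING (docid)
WHERE f.wordid IN (2, 4, 5)
GROUP BY d.docid
HAVING COUNT(DISTINCT f.wordid) = 3 ; -- number of words
```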
Easy to write, medium efficiency:
-- Query 2 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
JOIN (
SELECT docid
FROM factors
WHERE wordid IN (2, 4, 5)
GROUP BY docid
HAVING COUNT(*) = 3
) f USING (docid) ;
More complex to write, very good efficiency in Postgres - probably lousy in MySQL:
-- Query 4 -- by Derek
SELECT d.docid, d.docname
FROM documents d
WHERE d.docid IN (SELECT docid FROM factors WHERE wordid = 2)
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 4)
AND d.docid IN (SELECT docid FROM factors WHERE wordid = 5);
More complex to write, very good efficiency in Postgres - and probably the same in MySQL:
-- Query 5 -- by Erwin
SELECT d.docid, d.docname
FROM documents d
WHERE EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 2)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 4)
AND EXISTS (SELECT * FROM factors
WHERE docid = d.docid AND wordid = 5) ;
More complex to write, very good efficiency in Postgres - and probably the same in MySQL:
-- Query 6 -- by Sean
SELECT d.docid, d.docname
FROM documents d
JOIN factors x ON d.docid = x.docid
JOIN factors y ON d.docid = y.docid
JOIN factors z ON d.docid = z.docid
WHERE x.wordid = 2
AND y.wordid = 4
AND z.wordid = 5 ;
Easy to write and to extend to an arbitrary set of words, but not as efficient as the `JOIN` and `EXISTS` solutions:
-- Query 7 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
   SELECT *
   FROM   words AS w
   WHERE  w.wordid IN (2, 4, 5)
     AND  NOT EXISTS (
        SELECT *
        FROM   factors AS f
        WHERE  f.docid = d.docid
          AND  f.wordid = w.wordid
        )
   );
Easy to write, not good efficiency:
-- Query 8 -- by ypercube
SELECT d.docid, d.docname
FROM documents d
WHERE NOT EXISTS (
   SELECT *
   FROM  (
      SELECT 2 AS wordid UNION ALL
      SELECT 4 UNION ALL
      SELECT 5
      ) AS w
   WHERE NOT EXISTS (
      SELECT *
      FROM   factors AS f
      WHERE  f.docid = d.docid
        AND  f.wordid = w.wordid
      )
   );
Enjoy testing them :)
In Postgres, this is simpler with `DISTINCT ON`:
SELECT *
FROM (
SELECT DISTINCT ON (sec)
id, sec
FROM tbl
ORDER BY sec, id DESC
) sub
ORDER BY id DESC
LIMIT 4;
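`DISTINCT ON` is Postgres-specific. For readers on other databases, a roughly equivalent standard-SQL formulation with a window function (a sketch, assuming the same `tbl (id, sec)`) is:

```sql
SELECT id, sec
FROM  (
   SELECT id, sec
        , row_number() OVER (PARTITION BY sec ORDER BY id DESC) AS rn
   FROM   tbl
   ) sub
WHERE  rn = 1      -- keep the row with the highest id per sec
ORDER  BY id DESC
LIMIT  4;
```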
Detailed explanation in this related answer on SO:
For a big table and a small `LIMIT`, neither this nor @a_horse's solution is very efficient. The subquery will plough through the whole table, wasting a lot of time ...
Recursive CTE
I have tried and failed to solve similar problems with a recursive CTE in the past and resorted to a procedural solution with PL/pgSQL. Example:
Finally, here is a working rCTE:
WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, '{}'::int[] AS last_arr, ARRAY[sec] AS arr
   FROM   tbl
   ORDER  BY id DESC
   LIMIT  1
   )
   UNION ALL
   (
   SELECT b.id, c.arr
        , CASE WHEN b.sec = ANY (c.arr) THEN c.arr ELSE b.sec || c.arr END
   FROM   cte  c
   JOIN   tbl  b ON b.id < c.id
   WHERE  array_length(c.arr, 1) < 4
   ORDER  BY id DESC
   LIMIT  1
   )
   )
SELECT id, arr[1] AS sec
FROM   cte
WHERE  last_arr <> arr;
It's not as fast or elegant as I had hoped for and not nearly as fast as the function below, but faster than the query above in my tests.
PL/pgSQL function
Fastest by far:
CREATE OR REPLACE FUNCTION f_first_uniq(_rows int)
  RETURNS TABLE (id int, sec int) AS
$func$
DECLARE
   _arr int[];
BEGIN
   FOR id, sec IN
      SELECT t.id, t.sec FROM tbl t ORDER BY t.id DESC
   LOOP
      IF sec = ANY (_arr) THEN
         -- do nothing
      ELSE
         RETURN NEXT;
         _arr := _arr || sec;
         EXIT WHEN array_length(_arr, 1) >= _rows;
      END IF;
   END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_first_uniq(4);
SQL Fiddle demonstrating all three.
Could be made to work for any table, with table and column names as parameters and dynamic SQL with `EXECUTE` ...
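A hedged sketch of what that dynamic variant might look like (the function name `f_first_uniq_any` and its parameter list are my invention; `format()` with `%I` quotes the identifiers, and the `regclass` parameter is safe to interpolate with `%s`). It still assumes both columns are `int`:

```sql
CREATE OR REPLACE FUNCTION f_first_uniq_any(_tbl     regclass
                                          , _id_col  text
                                          , _sec_col text
                                          , _rows    int)
  RETURNS TABLE (id int, sec int) AS
$func$
DECLARE
   _arr int[];
BEGIN
   FOR id, sec IN EXECUTE
      format('SELECT %I, %I FROM %s ORDER BY %I DESC'
           , _id_col, _sec_col, _tbl, _id_col)
   LOOP
      IF sec = ANY (_arr) THEN
         -- sec already seen: skip
      ELSE
         RETURN NEXT;
         _arr := _arr || sec;
         EXIT WHEN array_length(_arr, 1) >= _rows;
      END IF;
   END LOOP;
END
$func$ LANGUAGE plpgsql;
```

Call:

```sql
SELECT * FROM f_first_uniq_any('tbl', 'id', 'sec', 4);
```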
Why bother?
In a test table with only 30k rows, the function ran 2000x faster than the above query (which already ran ~30% faster than a_horse's version). This difference grows with the size of the table: the function's performance is about constant, while the query's performance gets progressively worse, since it tries to find distinct values in all of the table first. Try this in a table with a million rows ...
Best Answer
Edit
To get only the rows where the 'done' status is found twice and no other statuses are found:
DB-Fiddle
Or
DB-Fiddle
The reasoning for adding it to the `GROUP BY` or applying an aggregate function: if `tb` or `tbid` can be different for the same 'done' status and the same `ttid`, which one do you choose? Do you choose the smallest, the biggest, ...? An example of what happens if this is true and the column is not added to the `GROUP BY` or wrapped in an aggregate function: DB-Fiddle that fails
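Since the fiddle code is not reproduced above, here is a hedged sketch of the kind of query meant, assuming a table `tbl (ttid, tbid, status)` (the column names are taken from the surrounding text; everything else is guessed). It keeps only groups with exactly two 'done' rows and nothing else:

```sql
SELECT ttid
FROM   tbl
GROUP  BY ttid
HAVING SUM(CASE WHEN status =  'done' THEN 1 ELSE 0 END) = 2
   AND SUM(CASE WHEN status <> 'done' THEN 1 ELSE 0 END) = 0;
```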