Mysql: Count, Group By, and limiting to first instance of an ID

MySQL

I am trying to write a query to generate a report of sorts and I'm stuck on the logic of it. The summary of it is:

I need to select the count of activities, where post_type is 'invoice' AND event = 'new', grouping by user_id that occurred between two dates based on the ts column. However, I need to make sure that when I'm counting, I only count the FIRST instance of each contact_id (Which comes from the invoices table, which is joined on activities.post_id = invoices.id).

In layman's terms: I need to figure out how many new invoices were created by each user_id in a given date-range. But I need to only count the FIRST instance of each contact_id.

So, given the following data:

activities

+----+---------+------------+-----------+-------+---------------------+
| id | user_id | post_id    | post_type | event |         ts          |
+----+---------+------------+-----------+-------+---------------------+
|  1 |     190 |       1000 | invoice   | new   | 2018-06-19 14:01:18 |
|  2 |     190 |       1001 | invoice   | new   | 2018-06-19 14:01:19 |
|  3 |     190 |       1002 | invoice   | new   | 2018-06-19 14:01:20 |
|  4 |     230 |       1003 | invoice   | new   | 2018-06-19 14:01:21 |
|  5 |     230 |       1004 | invoice   | new   | 2018-06-19 14:01:22 |
|  6 |     230 |       1005 | invoice   | new   | 2018-06-19 14:01:23 |
|  7 |     190 |       1006 | invoice   | new   | 2018-06-20 14:01:24 |
|  8 |     230 |       1007 | invoice   | new   | 2018-06-20 14:01:25 |
|  9 |     190 |       1008 | invoice   | new   | 2018-06-20 14:01:26 |
| 10 |     230 |       1009 | invoice   | new   | 2018-06-20 14:01:27 |
+----+---------+------------+-----------+-------+---------------------+

invoices

+------+------------+
|  id  | contact_id |
+------+------------+
| 1000 |        500 |
| 1001 |        600 |
| 1002 |        700 |
| 1003 |        800 |
| 1004 |        500 |
| 1005 |        500 |
| 1006 |        500 |
| 1007 |        600 |
| 1008 |        700 |
| 1009 |        800 |
+------+------------+

If I ran the query to find the count between 2018-06-19 and 2018-06-20, I should get the following output:

+---------+-------+
| user_id | count |
+---------+-------+
|     190 |     3 |
|     230 |     1 |
+---------+-------+

user_id 190 should get counted for id: 1, 2, 3 because id 7 and 9 have duplicate contact_ids.

user_id 230 should get counted for id 4 because all of his other rows were counted for user_id 190 already because they were the first instances of this contact_id.

I have an SQL fiddle of the data at https://www.db-fiddle.com/f/9smEW6TwA6UUk7Fh1anAN6/1

Best Answer

It takes 2 steps:

SELECT  user_id, COUNT(DISTINCT contact_id) AS 'count'
    FROM  
    (
        SELECT  i.contact_id, MIN(a.user_id) AS user_id
            FROM  activities AS a
            JOIN  invoices AS i  ON i.id = a.post_id
            GROUP BY  i.contact_id 
    ) x
    GROUP BY  x.user_id;

(Possibly the DISTINCT can be removed.)