I have a table that records all the changes done to a model by a user. The table looks something like this: (SQLFiddle)
In the table an action_hash
is generated based on the model
, model_id
and timestamp
, which effectively generates a unique identifier of an action.
I want to to display the data from this table to my users as an "activity feed". At first I was querying the database like so:
-- Get the last 5 actions (for pagination)
SELECT `action_hash` from `changes` GROUP BY `action_hash` ORDER BY `timestamp` DESC LIMIT 5;
-- Get all the rows for this last 5 action
SELECT * FROM `changes` WHERE `action_hash` IN ('a1a2913ac13c34afd957c1646243d8fa', 'af024d2e0b56c8af6506e8f755bb1e39', '33ccb3d33e4add95486a58dd5b33495a', 'b2fcb43f711ea6e047b4610334e546c8', '85f21b48085cce9ff7bab9c0b26dfb8f')
I would get a result like above. Then my backend code would go through the results in the last query and group the rows by hash, and parse them to be human readable while still maintaining the pagination.
The problem with this method is that some rows are so similar showing them as separate changes creates too much noise in the activity feed. The similar rows should somehow be grouped together.
The rows highlighted above should be grouped together uniquely identified in my final result (maybe with another hash?) because the changes occurring are happening to the same model (and model_id) within 15 minutes of each other. How can I write a query that shows this?
This is the result I'm looking for:
Note: action_group
doesn't have to be integers, any string that indicate they associate with one another will do. I'd also would like to paginate my results based on the new action_group
column.
Best Answer
I would start with a query that finds the latest row in each group of similar rows that are less than 15 minutes apart, and picks the latest five of those:
To get a unique number for each group, you can use a variable:
Using this query as a derived table (subquery in FROM clause), you can find all rows that belong to these groups:
The full query then becomes:
If your changes table is very big, the above query might take some time. Indexes on (model, model_id) and timestamp might help. Also, it might be possible to rewrite the query to use JOIN instead of NOT IN.