I want to assign a value to a new column based on a condition for other rows associated with a user_id
.
A user can see the same step with different devices at different times. However, only the first step must be taken into consideration.
e.g. step2
was seen by both mobile
and desktop
, but was first seen by mobile
at 2021-03-16 14:03:16
.
Likewise step4
was seen by both desktop
and tablet
, but first desktop
.
To sum up, each step first seen by
step1 = mobile
step2 = mobile
step3 = mobile
step4 = desktop
Since the only device change happened at step4
, then assign desktop > mobile to all records for that user_id
.
How can I get the first step per user and device as in the expected result below?
Sample data:
+---------+-------+---------------------+---------+
| user_id | step | created_at | device |
+---------+-------+---------------------+---------+
| user1 | step1 | 2021-03-16 14:03:16 | mobile |
| user1 | step2 | 2021-03-16 14:04:07 | mobile |
| user1 | step2 | 2021-03-16 14:03:47 | desktop |
| user1 | step3 | 2021-03-16 14:03:55 | mobile |
| user1 | step3 | 2021-03-16 14:04:00 | mobile |
| user1 | step1 | 2021-03-16 14:04:02 | desktop |
| user1 | step2 | 2021-03-16 14:03:16 | mobile |
| user1 | step3 | 2021-03-16 14:04:07 | mobile |
| user1 | step4 | 2021-03-16 14:04:08 | desktop |
| user1 | step4 | 2021-03-16 14:04:09 | tablet |
+---------+-------+---------------------+---------+
The expected result:
+---------+-------+---------------------+---------+---------------------+
| user_id | step | created_at | device | device_concatenated |
+---------+-------+---------------------+---------+---------------------+
| user1 | step1 | 2021-03-16 14:03:16 | mobile | mobile > desktop |
| user1 | step2 | 2021-03-16 14:03:16 | mobile | mobile > desktop |
| user1 | step3 | 2021-03-16 14:03:55 | mobile | mobile > desktop |
| user1 | step4 | 2021-03-16 14:04:08 | desktop | mobile > desktop |
+---------+-------+---------------------+---------+---------------------+
Best Answer
Given the data as is (as of 11:38 UTC, 2021-04-08), this might help (see PostgreSQL fiddle here):
Populate it (slightly rearranged from original data to put it in order - user_id, step & created_at - makes it easier to figure out what's going on):
I'll show the steps that led me to the solution - might be helpful (remove surplus field as appropriate for your requirements):
Result (better viewed on fiddle):
Then, we use the above result as a sub-query as follows:
Result:
Which is pretty close to the desired result. To obtain the desired result, we need to know how/why for step3,
mobile > desktop
is valid andmobile > mobile
is not? Same goes for step4 - why isdesktop > tablet
not correct? There isn't even a single entry for mobile for step4... On receiving an explanation, I will edit as appropriate!