I'm pretty new to SQL and have been trying to solve this task for a while…still no luck. I would appreciate if someone here could help me out.
I have a database with columns:
- ClientID
- VisitID
- Date
- PurchaseID (array)
- etc.
What I'm trying to achieve is to retrieve a list containing the following data:
- ClientID
- Last Visit Date
- First Visit Date
- Last Purchase Date
- Visits Count
- Purchases Count
- Visits After Last Purchase Count
When trying to retrieve a value for Visits After Last Purchase Count
this is where I am stuck.
SELECT
ClientID,
FirstVisit,
LastVisit,
LastPurchaseDate,
Visits,
Purchases,
VisitsAfterPurchase
FROM
(
SELECT
h.ClientID,
max(h.Date) AS LastVisit,
min(h.Date) AS FirstVisit,
count(VisitID) AS Visits
FROM s7_visits AS h
WHERE Date > '2017-12-01'
GROUP BY h.ClientID
LIMIT 100
)
ANY LEFT JOIN
(
SELECT
d.ClientID,
max(d.Date) AS LastPurchaseDate,
sum(length(d.PurchaseID)) AS Purchases,
sum(
(
SELECT count(x.VisitID)
FROM s7_visits AS x
WHERE x.ClientID = d.ClientID
HAVING x.Date >= max(d.Date)
)) AS VisitsAfterPurchase
FROM s7_visits AS d
WHERE (length(PurchaseID) > 0) AND (Date > '2017-12-01')
GROUP BY d.ClientID
) USING (ClientID)
The database system I'm using is Yandex ClickHouse.
The USING
syntax is absolutely normal for ClickHouse. It is used instead of ON
clause in other RDBMSs.
This query is giving me the following error:
DB::Exception: Column Date is not under aggregate function and not in GROUP BY..
Sample Data:
+----------+---------+------------+------------+
| CliendID | VisitID | Date | PurchaseID |
+----------+---------+------------+------------+
| 123 | 136 | 01.12.2017 | |
| 123 | 522 | 05.12.2017 | |
| 123 | 883 | 08.12.2017 | |
| 123 | 293 | 09.12.2017 | ['345'] |
| 123 | 278 | 12.12.2017 | |
| 123 | 508 | 12.12.2017 | |
| 123 | 562 | 15.12.2017 | |
| 123 | 523 | 21.12.2017 | |
| 456 | 736 | 29.11.2017 | |
| 456 | 417 | 03.12.2017 | |
| 456 | 950 | 04.12.2017 | |
| 456 | 532 | 05.12.2017 | ['346'] |
| 456 | 880 | 09.12.2017 | |
| 456 | 296 | 12.12.2017 | |
| 456 | 614 | 15.12.2017 | |
+----------+---------+------------+------------+
And the result should be:
+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
| ClientID | Last Visit Date | First Visit Date | Last Purchase Date | Visits Count | Purchases Count | Visits After Last Purchase Count |
+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
| 123 | 21.12.2017 | 01.12.2017 | 09.12.2017 | 8 | 1 | 4 |
| 456 | 15.12.2017 | 29.11.2017 | 05.12.2017 | 7 | 1 | 3 |
+----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
Best Answer
Problem solved. Unfortunately ClickHouse does not support aliases for subqueries and hence referencing. So I had to find another workaround using SELECT from subquery (with LastVisitDate) LEFT JOINed with another subquery (with LastPurchase Date) containing and then using WHERE clause "LastVisitDate < LastPurchaseDate".