Mysql – Correlated subqueries. Count Visits after Last Purchase Date

clickhouseMySQL

I'm pretty new to SQL and have been trying to solve this task for a while…still no luck. I would appreciate if someone here could help me out.

I have a database with columns:

  • ClientID
  • VisitID
  • Date
  • PurchaseID (array)
  • etc.

What I'm trying to achieve is to retrieve a list containing the following data:

  • ClientID
  • Last Visit Date
  • First Visit Date
  • Last Purchase Date
  • Visits Count
  • Purchases Count
  • Visits After Last Purchase Count

When trying to retrieve a value for Visits After Last Purchase Count this is where I am stuck.

SELECT 
ClientID, 
FirstVisit, 
LastVisit, 
LastPurchaseDate, 
Visits, 
Purchases, 
VisitsAfterPurchase
FROM 
(
SELECT 
    h.ClientID, 
    max(h.Date) AS LastVisit, 
    min(h.Date) AS FirstVisit, 
    count(VisitID) AS Visits
FROM s7_visits AS h 
WHERE Date > '2017-12-01'
GROUP BY h.ClientID
LIMIT 100
) 
ANY LEFT JOIN 
(
SELECT 
    d.ClientID, 
    max(d.Date) AS LastPurchaseDate, 
    sum(length(d.PurchaseID)) AS Purchases, 
    sum(
    (
        SELECT count(x.VisitID)
        FROM s7_visits AS x 
        WHERE x.ClientID = d.ClientID
        HAVING x.Date >= max(d.Date)
    )) AS VisitsAfterPurchase
FROM s7_visits AS d 
WHERE (length(PurchaseID) > 0) AND (Date > '2017-12-01')
GROUP BY d.ClientID
) USING (ClientID)

The database system I'm using is Yandex ClickHouse.
The USING syntax is absolutely normal for ClickHouse. It is used instead of ON clause in other RDBMSs.

This query is giving me the following error:

DB::Exception: Column Date is not under aggregate function and not in GROUP BY..

Sample Data:

  +----------+---------+------------+------------+
  | CliendID | VisitID |    Date    | PurchaseID |
  +----------+---------+------------+------------+
  |      123 |     136 | 01.12.2017 |            |
  |      123 |     522 | 05.12.2017 |            |
  |      123 |     883 | 08.12.2017 |            |
  |      123 |     293 | 09.12.2017 | ['345']    |
  |      123 |     278 | 12.12.2017 |            |
  |      123 |     508 | 12.12.2017 |            |
  |      123 |     562 | 15.12.2017 |            |
  |      123 |     523 | 21.12.2017 |            |
  |      456 |     736 | 29.11.2017 |            |
  |      456 |     417 | 03.12.2017 |            |
  |      456 |     950 | 04.12.2017 |            |
  |      456 |     532 | 05.12.2017 | ['346']    |
  |      456 |     880 | 09.12.2017 |            |
  |      456 |     296 | 12.12.2017 |            |
  |      456 |     614 | 15.12.2017 |            |
  +----------+---------+------------+------------+

And the result should be:

  +----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
  | ClientID | Last Visit Date | First Visit Date | Last Purchase Date | Visits Count | Purchases Count | Visits After Last Purchase Count |
  +----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+
  |      123 |      21.12.2017 |       01.12.2017 |         09.12.2017 |            8 |               1 |                                4 |
  |      456 |      15.12.2017 |       29.11.2017 |         05.12.2017 |            7 |               1 |                                3 |
  +----------+-----------------+------------------+--------------------+--------------+-----------------+----------------------------------+

Best Answer

Problem solved. Unfortunately ClickHouse does not support aliases for subqueries and hence referencing. So I had to find another workaround using SELECT from subquery (with LastVisitDate) LEFT JOINed with another subquery (with LastPurchase Date) containing and then using WHERE clause "LastVisitDate < LastPurchaseDate".