SELECT ndo.sku ParentSKU, visitsWhenSKUWasOnline.s
FROM
Temp.NumberOfDaysOnline ndo
JOIN
(
SELECT SUM(gasessiondata.sessions) as s
FROM gasessiondata
WHERE
gasessiondata.date >= ndo.FromDate
AND
gasessiondata.date <= ndo.ToDate
) as visitsWhenSKUWasOnline
ON 1=1
I can't use ndo.fromDate
and ndo.ToDate
inside of the subquery since hive does not support correlated sub queries. What could be an alternative for something like this?
Edit:
down vote
favorite
I have a table like this:
date val
2016-1-1 8
2016-2-1 10
2016-1-2 30
2016-1-3 30
Now, I have two dates coming in from a different table, let's call them `fromdate` and todate
If fromdate was 1, 2016-1-1
and Todate was 2016-1-2
, I need 8+30 = 38
as the final value
I need to check
SUM(val)
of all those values dates corresponding to which are between the fromdate and twodate.
Best Answer
The syntax of you query is not often called "correlated subquery". A correlated subquery would be like this:
Your query is similar to a
LATERAL
join (CROSS / OUTER APPLY
in SQL Server). This type of correlated joins has not been implemented in many DBMS.It's easy to rewrite though so the join is done first and then the
GROUP BY
. This simple query should be supported by most, if not all DBMS:Seems that Hive supports only equi-joins and cross joins. Moving the condition to the
WHERE
clause might solve this: