SQLite Group By – Finding Opening and Closing Price

sqlite

I want to preface this post by saying that I am a SQL(ite) newbie. My main goal is to find the opening/closing/high/low/average prices of each 5 minute interval of these stock prices. I can get the high/low working with min/max and the group by query but I am running into problems with the opening/closing. My database looks something like this

SYMBOL | DATETIME           |PRICE
AAPL   | 1999-06-01 09:30:01|45.0
AAPL   | 1999-06-01 09:30:03|44.9375
AAPL   | 1999-06-01 09:30:04|44.9375
AAPL   | 1999-06-01 09:30:05|44.9375
AAPL   | 1999-06-01 09:30:06|44.9375
AAPL   | 1999-06-01 09:30:07|45.0
AAPL   | 1999-06-01 09:30:08|45.0
AAPL   | 1999-06-01 09:30:09|44.9375
AAPL   | 1999-06-01 09:30:11|45.0
AAPL   | 1999-06-01 09:30:12|44.96875

The following query

SELECT SYMBOL, MIN(DATETIME) AS MINDT, MAX(DATETIME) AS MAXDT, AVG(PRICE)
FROM stocks
WHERE DATETIME < '1999-06-02 00:00:00'
GROUP BY strftime('%s', DATETIME)/(60*5) LIMIT 4;

results in

AAPL|1999-06-01 09:30:01|1999-06-01 09:34:41|44.9269724822695
AAPL|1999-06-01 09:35:22|1999-06-01 09:39:58|44.8615196078431
AAPL|1999-06-01 09:40:03|1999-06-01 09:44:58|44.9800531914894
AAPL|1999-06-01 09:45:05|1999-06-01 09:49:52|44.9397321428571

whereas

SELECT SYMBOL, MIN(DATETIME) AS MINDT, MAX(DATETIME) AS MAXDT, AVG(PRICE),
(SELECT PRICE FROM stocks WHERE DATETIME=MINDT),
(SELECT PRICE FROM stocks WHERE DATETIME=MAXDT)
FROM stocks
WHERE DATETIME < '1999-06-02 00:00:00'
GROUP BY strftime('%s', DATETIME)/(60*5);

throws this error in SQL

Error: near "MINDT": syntax error

and in R

Error in sqliteSendQuery(con, statement, bind.data) : 
  error in statement: no such column: MINDT

I have googled around and I think I found what is wrong with my query: you can not acces an alias made in main query in a sub query (i.e. reference MINDT in the subquery). But my I lack SQL knowledge to find a fix to this. Is there anyone that could help me out here?

Best Answer

You first query is OK although not entirely correct.

  • You need to be careful with GROUP BY and not have any column in the SELECT list that is not also in the GROUP BY list. So, add the symbol in the group by list.
  • Next, the LIMIT needs an ORDER BY so you are 100% certain that you get always the same results. SQLite (and many other DBMS) may return the same results all the time but that "all the time" is not guaranteed. You never know when you'll get a different execution plan and weird result.
  • If you want more aggregate results (like minimum or maximum price) over the windows, you can add them in the list.

The query becomes:

SELECT symbol, 
       MIN(datetime) AS mindt, 
       MAX(datetime) AS maxdt, 
       AVG(price)    AS avg_price,
       MIN(price)    AS min_price,
       MAX(price)    AS max_price
FROM stocks
WHERE datetime < '1999-06-02 00:00:00'
GROUP BY symbol, strftime('%s', DATETIME)/(60*5) 
ORDER BY symbol, strftime('%s', DATETIME)/(60*5)
LIMIT 4 ;

Now to get also the first and the last price ("opening" and "closing" in finance terminology) in these 5-minutes windows, you'll either need window functions or correlated subqueries or to wrap this query in a derived table and then join again to the original table. I'll write the 3rd option.

It assumes there is a UNIQUE constraint on (symbol, datetime) or (even without a constraint) that you don't have 2 rows with same symbol and datetime:

SELECT 
    s.*,
    s1.price AS opening_price,
    s2.price AS closing_price
FROM
  ( SELECT symbol, 
           MIN(datetime) AS mindt, 
           MAX(datetime) AS maxdt, 
           AVG(price)    AS avg_price,
           MIN(price)    AS min_price,
           MAX(price)    AS max_price
    FROM stocks
    WHERE datetime < '1999-06-02 00:00:00'
    GROUP BY symbol, strftime('%s', DATETIME)/(60*5) 
    ORDER BY symbol, strftime('%s', DATETIME)/(60*5)
    LIMIT 4
  ) AS s
  LEFT JOIN
    stocks AS s1
      ON  s.symbol = s1.symbol
      AND s.mindt  = s1.datetime
  LEFT JOIN
    stocks AS s2
      ON  s.symbol = s2.symbol
      AND s.maxdt  = s2.datetime ;