Sql-server – Max date record for items for given set of dates

sql serversql-server-2012

I am looking for some help to create an efficient SQL query (SQL Server 2012) that will identify the latest records (date) for all the items in the table (table 2 below) on each of a list of given dates (table 1 below). Here are the tables and sample data set. The tables are rather huge (600 dates, 5K+ data entries each for 6000 items).

table 1: qdate:

QUERY_DATES
03/03/2015   
03/04/2015  
04/09/2015  
06/12/2015  
05/05/2016  

table 2: svalue:

ITEM  DATA_ENTRY_DATE     VALUE 
1     03/02/2014          24  
1     03/02/2015          22  
1     04/04/2015          100  
1     03/09/2016          102  
2     03/02/2015          -25  

Required SQL Result:

ITEM QUERY_DATES  VALUE   DATA_ENTRY_DATE  
1    03/03/2015    22     03/02/2015   
1    03/04/2015    22     03/02/2015   
1    04/09/2015   100     04/04/2015  
1    06/12/2015   100     04/04/2015  
1    05/05/2016   102     03/09/2016  
2    03/03/2015   -25     03/02/2015   
2    03/04/2015   -25     03/02/2015   
2    04/09/2015   -25     03/02/2015  
2    06/12/2015   -25     03/02/2015  
2    05/05/2016   -25     03/02/2016  

Best Answer

I think this is what you want:

SELECT 
    i.item,
    q.query_dates,
    s.value,
    s.data_entry_date
FROM
    qdate AS q
  CROSS JOIN
    ( SELECT DISTINCT item
      FROM svalue
    ) AS i
  OUTER APPLY
    ( SELECT TOP (1) s.value, s.data_entry_date
      FROM svalue AS s
      WHERE s.item = i.item
        AND s.data_entry_date <= q.query_dates
      ORDER BY s.data_entry_date DESC
   ) AS s 
ORDER BY 
    item, 
    query_dates ;

Now about efficiency, I doubt there is way to make this blazing fast with a big table. Even with the rather small tables (600 dates, 6K items), you still want a result set of 3.6M rows.

An index on (item, data_entry_date) INCLUDE (value) will probably help a lot but still, it's 3 million rows of results. Are you going to display them? Or save them in another table for further analysis?

If there is another table with items, it would also be probably better to replace the i derived table with that table.