SQL Server Performance – Alternative to JOIN with BETWEEN

Tags: performance, performance-tuning, query-performance, sql-server, sql-server-2012

I have two tables that I need to JOIN on a BETWEEN condition.

Table1 is small, around 1500 records, and Table2 has 40 million records. Table1 has only one column, of datatype bigint, and Table2 has 8 columns. I need to join these two tables on a BETWEEN condition.

I tried the following, but it is already slow with just 1 record in Table1 and 40 million in Table2.

Query:

SELECT t1.cola AS [InputValue], t2.cola, t2.colb, t2.colc, t2.cold, t2.cole
FROM table2 t2 
INNER JOIN table1 t1 ON t1.cola BETWEEN t2.cola AND t2.colb ;

Indexing:

CREATE NONCLUSTERED INDEX NCIX_Table1_Cola ON table1(cola)
CREATE NONCLUSTERED INDEX NCIX_Table2_Col_a_b ON table2(cola,colb)

The above query took 30 seconds for just 1 record in table1 and 40 million in table2. As I said, table1 will have more than 1500 records, so it will get even slower. Is there an alternative to BETWEEN, or some indexing I should add?

Edit: Added sample data.

Table1:

cola
---------------
12
145
34
90
88990
987611
55
... (about 1500 rows)

Table2:

cola    colb    colc    cold    cole
-------------------------------------
0       10      c1      d1      e1
11      20      c2      d2      e2
21      40      c3      d3      e3
41      60      c4      d4      e4
61      100     c5      d5      e5
101     1000    c6      d6      e6
1001    10000   c7      d7      e7
10001   200000  c8      d8      e8
... (about 40 million rows)

Expected result:

InputValue  cola    colb    colc    cold    cole
--------------------------------------------------
12          11      20      c2      d2      e2
145         101     1000    c6      d6      e6
34          21      40      c3      d3      e3
.....
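
To make the expected result concrete, here is a small sketch that loads the sample rows above into an in-memory SQLite database and runs the question's BETWEEN join on the first three input values. SQLite is only a stand-in for SQL Server here, so this illustrates the join semantics, not SQL Server's performance.

```python
import sqlite3

# Sketch only: SQLite stands in for SQL Server, and the rows are the
# sample data from the question (not the real 40-million-row table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cola INTEGER)")
conn.execute(
    "CREATE TABLE table2 (cola INTEGER, colb INTEGER, colc TEXT, cold TEXT, cole TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?)", [(12,), (145,), (34,)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?, ?)",
    [
        (0, 10, "c1", "d1", "e1"),
        (11, 20, "c2", "d2", "e2"),
        (21, 40, "c3", "d3", "e3"),
        (41, 60, "c4", "d4", "e4"),
        (61, 100, "c5", "d5", "e5"),
        (101, 1000, "c6", "d6", "e6"),
        (1001, 10000, "c7", "d7", "e7"),
        (10001, 200000, "c8", "d8", "e8"),
    ],
)

# The range join from the question: each input value matches the row
# whose [cola, colb] range contains it.
rows = conn.execute(
    """
    SELECT t1.cola AS InputValue, t2.cola, t2.colb, t2.colc, t2.cold, t2.cole
    FROM table2 t2
    INNER JOIN table1 t1 ON t1.cola BETWEEN t2.cola AND t2.colb
    ORDER BY t1.cola
    """
).fetchall()

for row in rows:
    print(row)
```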


Best Answer

I've faced a similar issue. The problem is that SQL Server doesn't "know" that cola and colb define a range, or that the cola of the next row is always greater than the colb of the current row, so BETWEEN doesn't help much: after it finds the first row whose cola matches, it keeps checking the remaining rows as well. I would suggest a query that uses CROSS APPLY to find the row with the largest table2.cola that is <= table1.cola, and then verifies the match by also checking table2.colb >= table1.cola. Something like this:

SELECT      t1.cola AS [InputValue]
           ,t2.cola
           ,t2.colb
           ,t2.colc
           ,t2.cold
           ,t2.cole
FROM        table1 t1
CROSS APPLY (   SELECT   TOP 1
                         t2.cola
                        ,t2.colb
                        ,t2.colc
                        ,t2.cold
                        ,t2.cole
                FROM     table2 t2
                WHERE    t2.cola <= t1.cola   -- last range starting at or before the input
                ORDER BY t2.cola DESC ) t2
WHERE       t2.colb >= t1.cola;               -- verify the input falls inside that range
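
The same "largest start at or below the value, then check the end" lookup can be sketched outside SQL Server. This is an illustration under assumptions, not a benchmark: it uses SQLite, which has no CROSS APPLY or TOP, so a correlated subquery with ORDER BY ... DESC LIMIT 1 plays that role, and it cross-checks the result against a plain-Python binary search (bisect), the in-memory analogue of one index seek per input. Like the answer, it assumes the ranges do not overlap and each cola is unique; the index name ix_table2_cola_colb is hypothetical.

```python
import bisect
import sqlite3

# Sample ranges from the question, sorted by start (cola) and non-overlapping.
ranges = [
    (0, 10, "c1", "d1", "e1"),
    (11, 20, "c2", "d2", "e2"),
    (21, 40, "c3", "d3", "e3"),
    (41, 60, "c4", "d4", "e4"),
    (61, 100, "c5", "d5", "e5"),
    (101, 1000, "c6", "d6", "e6"),
    (1001, 10000, "c7", "d7", "e7"),
    (10001, 200000, "c8", "d8", "e8"),
]
inputs = [12, 145, 34, 987611]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cola INTEGER)")
conn.execute(
    "CREATE TABLE table2 (cola INTEGER, colb INTEGER, colc TEXT, cold TEXT, cole TEXT)"
)
conn.execute("CREATE INDEX ix_table2_cola_colb ON table2 (cola, colb)")  # hypothetical name
conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in inputs])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?, ?)", ranges)

# SQLite stand-in for CROSS APPLY (SELECT TOP 1 ... ORDER BY cola DESC):
# pick the range with the largest start <= input, then verify its end.
sql_rows = conn.execute(
    """
    SELECT t1.cola, t2.cola, t2.colb, t2.colc, t2.cold, t2.cole
    FROM table1 t1
    JOIN table2 t2
      ON t2.cola = (SELECT x.cola FROM table2 x
                    WHERE x.cola <= t1.cola
                    ORDER BY x.cola DESC LIMIT 1)
    WHERE t2.colb >= t1.cola
    ORDER BY t1.cola
    """
).fetchall()

# The same idea in plain Python: binary search over the sorted starts.
starts = [r[0] for r in ranges]


def lookup(value):
    i = bisect.bisect_right(starts, value) - 1  # index of last start <= value
    if i >= 0 and ranges[i][1] >= value:        # verify the end of the range
        return (value,) + ranges[i]
    return None                                 # value lies outside every range


py_rows = sorted(r for r in map(lookup, inputs) if r is not None)
print(sql_rows == py_rows)
```

The value 987611 lies beyond the last sample range, so both versions drop it, in the same way that the outer colb check filters out an input with no covering range in the T-SQL query.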

Related Solutions

MySQL – Which Join is Better: Left Outer Join or Inner Join?

There is no "better" or "worse" join type. They have different meanings, and you must choose one based on the meaning you need.

In your case, you probably do not have employees without a work_log (no rows in that table), so LEFT JOIN and JOIN will return the same results. However, if you had such an employee (for example, a new employee with no registered work_log), a JOIN would omit that employee, while a LEFT JOIN (whose first table is employees) would show all of them, with NULLs in the work_log fields where there is no match.
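
A minimal sketch of that difference, using SQLite as a stand-in for MySQL (both follow the standard semantics for these two joins). The column names (id, name, employee_id, hours) and the rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: "New Hire" has no rows in work_log yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE work_log (employee_id INTEGER, hours INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "New Hire")]
)
conn.executemany("INSERT INTO work_log VALUES (?, ?)", [(1, 8), (2, 6)])

# INNER JOIN: employees without a matching work_log row are omitted.
inner = conn.execute(
    "SELECT e.name, w.hours FROM employees e "
    "JOIN work_log w ON w.employee_id = e.id ORDER BY e.id"
).fetchall()

# LEFT JOIN: every employee is kept; unmatched work_log columns become NULL (None).
left = conn.execute(
    "SELECT e.name, w.hours FROM employees e "
    "LEFT JOIN work_log w ON w.employee_id = e.id ORDER BY e.id"
).fetchall()

print(inner)
print(left)
```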

Image: Visual explanation of JOIN types (by C.L. Moffatt on Code Project)

Again, performance is secondary to query correctness. Some people say that you shouldn't use LEFT JOINs. It is true that a LEFT JOIN forces the optimizer to execute the query in one particular order, which prevents some optimizations (table reordering) in some cases. Here is one example. But you should not choose one over the other if correctness or meaning is sacrificed, as an INNER JOIN is not inherently worse. Otherwise, the usual optimizations apply.

In summary, do not use LEFT JOIN if you really mean INNER JOIN.

In MySQL, CROSS JOIN, INNER JOIN, and JOIN are the same. In the standard, and semantically, a CROSS JOIN is an INNER JOIN without an ON clause, so you get every combination of rows between the tables.
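
The "every combination" behavior is easy to check. This sketch uses SQLite rather than MySQL, with hypothetical sizes and colors tables; it shows that a CROSS JOIN of 3 rows with 2 rows yields 6 rows, the same rows as an INNER JOIN whose ON condition is always true.

```python
import sqlite3

# Hypothetical tables: 3 sizes x 2 colors.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sizes (size TEXT)")
conn.execute("CREATE TABLE colors (color TEXT)")
conn.executemany("INSERT INTO sizes VALUES (?)", [("S",), ("M",), ("L",)])
conn.executemany("INSERT INTO colors VALUES (?)", [("red",), ("blue",)])

# CROSS JOIN: the Cartesian product, every (size, color) pair.
cross = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()

# An INNER JOIN whose condition is always true returns the same rows.
always_true = conn.execute(
    "SELECT size, color FROM sizes INNER JOIN colors ON 1 = 1 ORDER BY size, color"
).fetchall()

print(len(cross), cross == always_true)
```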

Wikipedia has examples of every semantic type of join. In practice, in MySQL, we tend to write only JOIN and LEFT JOIN.

SQL Server – Pagination Performance with Subquery, Inner Join, and Where

I don't think you'll be able to get good performance while using OFFSET. The database must read through 1,000,025 rows of output from the inner query; even if you have a good clustered index on TaskResults, the system doesn't know for certain that it can skip ahead to date X.

But you do! Assuming this is for some kind of GUI, make a note of the earliest StatusDate (and its Id) from the previous page, then use them to filter the next page:

SELECT
    tr.Id, StatusDate
FROM
    (
    SELECT tr.Id, tr.StatusDate
    FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
        INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
    WHERE tr.ClientId = 4 AND dev.Deleted = 0
        AND
            (
            -- Retrieve only records from before the previous page
            tr.StatusDate < @PrevStatusDate
            OR (tr.StatusDate = @PrevStatusDate AND tr.Id < @PrevID) 
            )
    ) AS tr        
ORDER BY tr.StatusDate DESC, tr.Id DESC
OFFSET 0 ROWS FETCH NEXT 25 ROWS ONLY;

So if page #123 ends with 2015/05/01, record #234, you want to consider all records that are from 2015/04/30 or earlier, or which are also from 2015/05/01 but are for records #1 .. #233.
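
Here is a rough sketch of that keyset ("seek") pagination, assuming SQLite and a hypothetical task_results table in place of the real schema. Each page is fetched using the (status_date, id) of the previous page's last row, with the same filter shape as the query above, and the concatenated pages are checked against one full descending scan.

```python
import sqlite3

# Hypothetical, cut-down stand-in for the TaskResults table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_results (id INTEGER PRIMARY KEY, status_date TEXT)")
dates = ["2015-04-29", "2015-04-30", "2015-05-01"]
conn.executemany(
    "INSERT INTO task_results VALUES (?, ?)",
    [(i, dates[i % 3]) for i in range(1, 31)],  # 30 rows, many sharing a date
)

PAGE_SIZE = 7


def next_page(prev=None):
    """Return the page after prev = (status_date, id) of the last row shown."""
    if prev is None:
        return conn.execute(
            "SELECT status_date, id FROM task_results "
            "ORDER BY status_date DESC, id DESC LIMIT ?",
            (PAGE_SIZE,),
        ).fetchall()
    prev_date, prev_id = prev
    # Only rows strictly "before" the previous page's last row in the sort order.
    return conn.execute(
        "SELECT status_date, id FROM task_results "
        "WHERE status_date < ? OR (status_date = ? AND id < ?) "
        "ORDER BY status_date DESC, id DESC LIMIT ?",
        (prev_date, prev_date, prev_id, PAGE_SIZE),
    ).fetchall()


keyset_rows = []
page = next_page()
while page:
    keyset_rows.extend(page)
    page = next_page(page[-1])

full_order = conn.execute(
    "SELECT status_date, id FROM task_results ORDER BY status_date DESC, id DESC"
).fetchall()
print(keyset_rows == full_order)
```

The tie-breaker on id is what keeps rows that share a status_date from being skipped or repeated at page boundaries.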

This should work well with your more complex UNION query, but "real" partitioning would probably be easier than this roll-your-own partitioning.

If StatusDate is unique, or it's acceptable to occasionally show the same record on two adjacent pages, you can drop the @PrevID and ORDER BY Id parts. If Id is always increasing, you can filter on Id alone and skip StatusDate.

Keep in mind that retrieving pages like this can easily skip a record or include the same record twice if records are being added, removed, or reordered in the underlying data. But that's another topic.
