Sql-server – UNION ALL or other ways to return first row of result-set

performancequery-performanceset-returning-functionssql server

I am writing a table valued function to geocode an address based on its Suburb, State and Postcode. I try to geocode the address using different methods, in order of decreasing accuracy:

  1. Exact match on unique suburb-postcode-state combination
  2. Exact match on unique suburb-postcode combination
  3. Exact match on unique suburb-state combination
  4. Exact match on unique postcode
  5. Approximate match by non-unique Postcode, where all Suburbs with this Postcode are within 5 km of one another.

(I am working with a geographical region where Suburb-Postcode-State relationships are all many-to-many. In other words, one Suburb can have multiple postcodes; one postcode can have multiple suburbs and may exist in different states.)

Following is an extract from the table-valued function:

ALTER FUNCTION [geocode].[tvfn_Customer_Suburb_From_Address]
(   
    @Suburb NVARCHAR(100),
    @State NVARCHAR(100),
    @Postcode NVARCHAR(100),
    @Country NVARCHAR(100)
)
RETURNS TABLE 
AS
RETURN 
(

    SELECT TOP 1 *
    FROM (

            -- Unique suburb-postcode-state combinations
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 4 -- Exact match by unique Postcode, Suburb and State
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
            INNER JOIN [geocode].[tGeocode_Methods] gm
                ON s.Geocode_DID = gm.Geocode_DID

            WHERE s.[Is_Active] = 1
            AND s.[Suburb] = @Suburb
            AND s.[State] = @State
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- -- Unique suburb-postcode combinations
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 3 -- Exact match by unique Postcode & Suburb
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1
                            AND Suburb = s.Suburb AND Postcode = s.Postcode
                            GROUP BY Postcode, Suburb
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Suburb] = @Suburb
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- Exact match by unique Suburb and State
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 6 -- Exact match by unique Suburb and State
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1 AND Is_PO_Box = 0 -- Exclude PO Boxes
                            AND Suburb = s.Suburb AND Postcode = s.Postcode
                            GROUP BY Suburb, Postcode
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Suburb] = @Suburb
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1


            UNION ALL


            -- Exact match by unique Postcode
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 2 -- Exact match by unique Postcode
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM geocode.tSuburbs_XX s
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE EXISTS (  SELECT *
                            FROM geocode.tSuburbs_XX
                            WHERE Is_Active = 1
                            AND Postcode = s.Postcode
                            GROUP BY Postcode
                            HAVING COUNT(*) = 1
                            )
            AND s.Is_Active = 1
            AND s.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1
            -- Perform this extra check to make sure we don't match a postcode in a wrong country
            AND (       @Country IN ('AAA', 'BBB', 'CCC')
                    OR  @State IN ('MMM', 'NNN', 'OOO', 'PPP')
                )


            UNION ALL


            -- Approximate match by non-unique Postcode, where all Suburbs with this Postcode are within 5 km of one another.
            SELECT   s.Suburb_DID
                    ,s.Suburb
                    ,s.State
                    ,s.Postcode
                    ,Geocode_DID = 5
                    ,s.Geocode_Latitude
                    ,s.Geocode_Longitude
            FROM [geocode].[tPostcode_Distances] pd
                INNER JOIN  geocode.tSuburbs_XX s
                    ON pd.Approx_Suburb_DID = s.Suburb_DID
                INNER JOIN [geocode].[tGeocode_Methods] gm
                    ON s.Geocode_DID = gm.Geocode_DID
            WHERE  s.Is_Active = 1
            AND pd.[Postcode] = @Postcode
            -- Only suburbs that are geocoded with methods that can be used for geocoding customers
            AND gm.[Can_Use_For_VIP] = 1
            -- Perform this extra check to make sure we don't match a postcode in a wrong country
            AND (       @Country IN ('AAA', 'BBB', 'CCC')
                    OR  @State IN ('MMM', 'NNN', 'OOO', 'PPP')
                )
            AND pd.Max_Distance <= 5000 -- within 5 km
    ) t
)

The above function works, but I would like to know if it can be improved. In particular, is it possible to force SQL Server to stop processing the SELECT statements after the first SELECT that returns a result-set (because we're interested in the first matching result only – TOP 1)?

Update

Thank you for suggestions thus far. I will add a [Priority] column as suggested in several answers and comments and an ORDER BY clause to ensure I get the top result.

I will also add a WITH SCHEMABINDING to the TVFN so SQL Server can parallelise the plan. While we're on this subject, the using a multi-statement table-valued function is a good idea (Thank you to Paul White), but multi-statement TVFN's always force a serial plan.

I will now try Lennart's answer, in which he suggested using a CTE.

Best Answer

If you must use a single query (as required by a single inline function), you can use one of the two options below (illustrated in my recent answer to Relating 2 tables with possible wildcards?):

Option 1

Use multiple APPLY clauses with a startup condition for each using an outer reference from a previous apply in the chain. The efficiency of this method depends on the presence of startup filters in the execution plan. Correct results are guaranteed, but plan shape is not.

Option 2

Add an extra column with a constant literal to each clause of the union e.g. [Priority] = 1 then add an ORDER BY [Priority] ASC at the TOP (1) scope. Efficient operation depends on the plan avoiding sorts.

On reflection, this is not what you want in this case, because one row from each option is required by the merge concatenation in the plan. Nevertheless, it is an option in more general situations (where the alternate inputs produce more than one row, and the first row at low cost).


In addition:

Option 3

Since you're only returning a single row, you could use a multi-statement table-valued function instead, with explicit logic to try each option in order (in separate queries), returning as soon as the first result is found. This is guaranteed to produce correct results efficiently.

Note

The current function is tecnhically nondeterministic; SQL Server could evaluate the union all in any order it chooses, potentially returning a lower-priority result before evaluating a higher priority one.