Sql-server – T-sql: Select most specific match from table

querysql servert-sql

Apologies for the horrible title. I couldn't come up with a better way to briefly describe the problem.

Here's the scenario:

Users can set up service instructions to be attached to a work order based on the order's client, loan type, bank, pricing region, and service. The service instruction is set up with the following parameters:

  1. Applies to all clients or a specific client
  2. Applies to all loan types or a specific loan type
  3. Applies to all banks or a specific bank
  4. Applies to all pricing regions or a specific pricing region
  5. Applies to a specific service

A value of 1 in a column in the service_instruction table described below equates to "all".

The table structure (the relevant parts anyway):

order (order_num INT PRIMARY KEY, client_num INT, loan_type INT, 
       bank_num INT, pricing_region INT)

work_order_line(work_order_line_num INT PRIMARY KEY, order_num INT, service_num INT,
                description TEXT)

service_instruction(instruction_num INT PRIMARY KEY, service_num INT,
                    service_description TEXT, client_num INT NULL,
                    bank_num INT NULL, region_num INT, loan_type_cd TINYINT) 

I need to write a script which will select the most specific instruction for a service, if one exists. For example:

The user has set up the following service instructions:

  1. (All clients), (All loan types), (all banks), (all pricing regions), (service A), Instructions: Do work A
  2. (Client 1), (All loan types), (All banks), (region 3), (service A), Instructions: Do work B

An order which comes in for Client 1, region 3, and service A should have "Do work B" appended.
An order which comes in for Client 2, region 3, and service A should have "Do work A" appended.
An order which comes in for service B should have nothing appended.
And so on.

Currently this is done in code, but circumstances necessitate scripting it. Is there a way to do this that's more elegant (and with better performance perhaps) than the current mess of IF..ELSE's and ISNULL's I'm working on?

EDIT:
Here's some sample data.

service_instruction:

instruction_num | service_num | instruction_desc | bank_num | region_num | loan_type_cd | client_num
       3               251    'Take the photos, yeah'  17        96            3             1
       4               251  'Bid for debris removal.'  1         471           1             7
       7               251  'Bid for debris removal.   1          3            1             1                                    
                                 Do not perform'    

[order]:

order_num | client_num | loan_type | bank_num | pricing_region
    1           3            1          1             3
    2           7            3          1            471
    3           2            3          17            96
    4           5            2          6             17

work_order_line:

work_order_line_num | order_num | service_num | description
         20              1             251         NULL
         21              2             251         NULL
         22              3             251         NULL
         26              4             251         NULL

The end result would be:

work_order_line:

work_order_line_num | order_num | service_num | description
         20              1             251    'Bid for debris removal. Do not perform.'
         21              2             251    'Bid for debris removal.'
         22              3             251    'Take the photos, yeah'
         26              4             251    'Bid for debris removal. Do not perform.'

A value of 1 for the client_num, bank_num, or loan_type_cd in service_instruction means it applies to all clients, banks, or loan types, respectively. To further complicate things, the pricing_region in service_instruction is dependent upon the client. For example, when the client_num is 1 ("all clients"), a region_num of 3 indicates "all regions." When the client_num is 3, region_num 39 is "all regions". I'm not really looking for someone to write out the script for me, more a pointer in the right conceptual direction, so that complication can probably be ignored for now.

Here's what I've been working on to solve this, edited somewhat for brevity. Yes, there's a cursor in it. Yes, it's going to be put in a production environment. No, I'm not happy about it. Which is why I came here!

--Table var. containing orders we need to update
DECLARE @order_numbers TABLE (order_num INT PRIMARY KEY
    , client_num SMALLINT NOT NULL
    , order_type_cd TINYINT NOT NULL
    , loan_type_cd TINYINT NOT NULL
    , bank_num TINYINT NOT NULL
    , pricing_region INT NOT NULL
    , service_num INT NOT NULL
    , service_instruction_num SMALLINT NULL);

--Table containing the match data
DECLARE @instruction_matches TABLE (instruction_num SMALLINT PRIMARY KEY
        , client_match TINYINT
    , loan_type_match BIT
    , bank_match BIT
    , region_match BIT
    , total_matches TINYINT);

DECLARE @best_instruction SMALLINT
    , @order_num INT
    , @client_num SMALLINT
    , @loan_type_cd TINYINT
    , @bank_num TINYINT
    , @region_num INT
    , @service_num INT;

--Get the orders to update
INSERT INTO @order_numbers (order_num
    , client_num
    , order_type_cd
    , loan_type_cd
    , bank_num
    , pricing_region
    , service_num)
SELECT o.order_num, oc.client_num, o.order_type_cd, oc.loan_type_cd, oc.bank_num, oc.pricing_region, wol.service_num 
FROM [order] o
    inner join order_context oc ON o.context_num = oc.context_num
    inner join work_order_line wol ON o.order_num = wol.order_num
WHERE o.status_cd = 1 AND o.exchange_status_cd = 2 AND o.received_from_hub_ind = 1
    AND o.viewed_dt IS NULL

DECLARE o_cursor CURSOR LOCAL
FOR SELECT order_num FROM @order_numbers;
OPEN o_cursor;
FETCH NEXT FROM o_cursor
INTO @order_num;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @service_num = (SELECT service_num FROM @order_numbers WHERE order_num = @order_num);
    SET @client_num = (SELECT client_num FROM @order_numbers WHERE order_num = @order_num);
    SET @loan_type_cd = (SELECT loan_type_cd FROM @order_numbers WHERE order_num = @order_num);
    SET @bank_num = (SELECT bank_num FROM @order_numbers WHERE order_num = @order_num);
    SET @region_num = (SELECT pricing_region FROM @order_numbers WHERE order_num = @order_num);

    INSERT INTO @instruction_matches (instruction_num, client_match, loan_type_match, bank_match, region_match)
    SELECT
        instruction_num
        , CASE 
            WHEN client_num = @client_num OR client_num = 1
                THEN 1
            ELSE 0
          END
        , CASE
            WHEN loan_type_cd = @loan_type_cd OR loan_type_cd = 1
                THEN 1
            ELSE 0
          END
        , CASE
            WHEN bank_num = @bank_num OR bank_num = 1
                THEN 1
            ELSE 0
          END
        , CASE -- region_num = 4 means INSPECTIONS ONLY. will need to be re-written for pres.
            WHEN region_num = @region_num
                OR (client_num != 1 AND region_num = (SELECT region_num FROM region 
                                                      WHERE client_num = @client_num
                                                        AND region_type_cd = 2 
                                                        AND order_item_type_cd = 2
                                                        AND region_id = 1)
                )
                THEN 2
            WHEN client_num = 1 AND region_num = 4
                THEN 1
            ELSE 0
          END
    FROM service_instruction
    WHERE service_num = @service_num;

    UPDATE @instruction_matches
    SET total_matches = (client_match + loan_type_match + bank_match + region_match);

    SET @best_instruction = (SELECT TOP(1) instruction_num FROM @instruction_matches ORDER BY total_matches DESC);

    UPDATE @order_numbers
    SET service_instruction_num = @best_instruction 
    WHERE order_num = @order_num;

    FETCH NEXT FROM o_cursor
    INTO @order_num;
END;

Best Answer

My first step would be to rank the service_instruction by specificity/generality - I don't think you have provided enough information to be sure what you are after but assuming that it is based on the number of 1s and ties are broken by: bank_num is more specific than loan_type_cd is more specific than client_num:

select *, row_number() over ( order by case when bank_num=1 then 1 else 0 end+
                                       case when loan_type_cd=1 then 1 else 0 end+
                                       case when client_num=1 then 1 else 0 end,
                                     case when bank_num=1 then 1 else 0 end,
                                     case when loan_type_cd=1 then 1 else 0 end,
                                     case when client_num=1 then 1 else 0 end ) as gen
from service_instruction;

produces:

instruction_num service_num instruction_desc    bank_num    region_num  loan_type_cd    client_num  gen
3   251 Take the photos, yeah   17  96  3   1   1
4   251 Bid for debris removal. 1   471 1   7   2
7   251 Bid for debris removal. Do not perform  1   3   1   1   3

This can then be joined to order and work_order_line and filtered for the row for each work_order_line_num with the lowest generality (gen), perhaps something like this:

with w as (
    select *, row_number() over ( order by case when bank_num=1 then 1 else 0 end+
                                           case when loan_type_cd=1 then 1 else 0 end+
                                           case when client_num=1 then 1 else 0 end,
                                         case when bank_num=1 then 1 else 0 end,
                                         case when loan_type_cd=1 then 1 else 0 end,
                                         case when client_num=1 then 1 else 0 end ) as gen
    from service_instruction )
select *
from( select l.*, instruction_desc, row_number() over ( partition by l.work_order_line_num, 
                                                                     l.order_num, 
                                                                     l.service_num
                                                        order by gen ) as gen_rank
      from work_order_line l join [order] o on(o.order_num=l.order_num)
           join w on( l.service_num=w.service_num 
                      and (o.bank_num=w.bank_num or w.bank_num=1) 
                      and (o.loan_type_cd=w.loan_type_cd or w.loan_type_cd=1)
                      and (o.client_num=w.client_num or w.client_num=1)) ) z
where gen_rank=1

which produces:

work_order_line_num order_num   service_num instruction_desc    gen_rank
20  1   251 Bid for debris removal. Do not perform  1
21  2   251 Bid for debris removal. 1
22  3   251 Take the photos, yeah   1
26  4   251 Bid for debris removal. Do not perform  1

notes:

  1. I've assumed you need to partition by work_order_line_num, order_num and service_num but maybe not all are necessary depending on the PK of work_order_line
  2. I've ignored the pricing_region complexity you mention but don't specify completely - hopefully you'll be able to work it into the query in a similar manner to the rest
  3. You'll probably have to fix the CTE with your rules for specificity/generality
  4. I tested on 2008R2 - YMMV if you are on an earlier version