If all the tables use the MyISAM Storage Engine and have the same table structure, I have some good news for you.
You can create a single table that consumes no additional space except a .frm
file and some mapping info. The key is to take advantage of the MERGE (MRG_MyISAM) Storage Engine.
Here is how you can do this:
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4) INSERT_METHOD=LAST;
Using this method, you can query the 4 tables at the same time like this:
SELECT ReqF FROM XMerge WHERE EmpName='John';
Was that simple, or what ???
In your case, you have 75 tables. You would do this:
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4,X5,X6,X7,X8,X9,
X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,
X20,X21,X22,X23,X24,X25,X26,X27,X28,X29,
X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,
X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,
X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,
X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,
X70,X71,X72,X73,X74,X75) INSERT_METHOD=LAST;
SELECT ReqF FROM XMerge WHERE EmpName='John';
The beauty of this is that creating a MERGE table takes milliseconds. Just make sure every table has an index on EmpName: it is far better to do 75 indexed lookups than 75 full table scans. If there is no index on EmpName yet, you need to do this (use a plain KEY instead of UNIQUE if EmpName values can repeat):
ALTER TABLE X1 ADD UNIQUE KEY (EmpName);
ALTER TABLE X2 ADD UNIQUE KEY (EmpName);
.
.
.
ALTER TABLE X75 ADD UNIQUE KEY (EmpName);
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4,X5,X6,X7,X8,X9,
X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,
X20,X21,X22,X23,X24,X25,X26,X27,X28,X29,
X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,
X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,
X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,
X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,
X70,X71,X72,X73,X74,X75) INSERT_METHOD=LAST;
SELECT ReqF FROM XMerge WHERE EmpName='John';
Give it a Try !!!
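If typing out all 75 table names by hand is tedious, you can let MySQL generate the statements for you from information_schema. A sketch, assuming the tables are really named X1 through X75 and live in the current schema:
SELECT CONCAT('ALTER TABLE ', table_name, ' ADD UNIQUE KEY (EmpName);')
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_name REGEXP '^X[0-9]+$';
Copy the resulting rows and run them as a script. The same trick with GROUP_CONCAT(table_name) can produce the UNION=(...) list for the ALTER TABLE on XMerge.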
First consider a query that computes which rows are actually relevant from tablethree. Assuming that by "most recently entered result" you mean "most recent enddate", the following query gathers the appropriate rows:
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
Now you can build a join to retrieve not only sid, but all of the columns of tablethree:
SELECT a.*
FROM tablethree a
INNER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate
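On MySQL 8.0 and later, the same "latest row per sid" can also be written with a window function. A sketch; note that ROW_NUMBER picks exactly one row per sid even when two rows share the same enddate, which the MAX() approach does not:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY sid ORDER BY enddate DESC) AS rn
FROM tablethree t
) ranked
WHERE rn = 1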
This is the result set you actually want to "left join in". You have to insert this into your original query:
SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
ON t1.cid = t2.id
LEFT OUTER JOIN (
SELECT a.*
FROM tablethree a
INNER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate
) AS t3
ON t3.sid = t2.sid
WHERE t1.fieldone = 1
AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
What should also work is the following:
SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
ON t1.cid = t2.id
LEFT OUTER JOIN tablethree AS t3
ON t3.sid = t2.sid
LEFT OUTER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) mostrecent
ON t3.sid = mostrecent.sid AND t3.enddate = mostrecent.enddate
WHERE t1.fieldone = 1
AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
AND mostrecent.enddate IS NOT NULL
This includes both tablethree and the new SELECT as left joins, and filters out the rows where mostrecent.enddate IS NULL (that is, the rows which are not the most recent ones). This should lead to the same result, but MySQL may be able to compute it a little faster. Running EXPLAIN on both queries should reveal any differences in how they are executed.
Best Answer
I highly recommend that you don't go down that path. You are mixing business logic with relational querying, and that's a sure recipe for many headaches later on, especially in terms of performance.
Keep in mind the rule of 'one query for one task': the simpler and more granular, the better. You are trying to write one query to handle several different tasks.
Use a business control layer either on the client side, a dedicated business tier, or even in a stored procedure on the server, and execute different dedicated, simple SQL queries based on your logic.
For example, in your case you could write a procedure that will accept both parameters and execute the right query based on the values provided. The procedure has negligible overhead, and you get the huge advantage of having separate, optimized query plans created for every combination.
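As a minimal sketch of such a dispatcher (the procedure, table, column, and parameter names here are made up for illustration):
DELIMITER //
CREATE PROCEDURE SearchOrders(IN p_status INT, IN p_customer INT)
BEGIN
IF p_status IS NULL THEN
SELECT * FROM orders WHERE customer_id = p_customer;
ELSEIF p_customer IS NULL THEN
SELECT * FROM orders WHERE status = p_status;
ELSE
SELECT * FROM orders WHERE status = p_status AND customer_id = p_customer;
END IF;
END //
DELIMITER ;
Each branch gets its own cached, optimized plan, instead of one generic plan that has to cover every parameter combination.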
UPDATE: If you need this to work for a large number of parameters, I highly recommend that you read this article by Erland Sommarskog. It is the best article that I know of for this type of challenge, and will offer you several solutions, with thorough analysis of the pros and cons of each. It's long, it's not an easy read, but it will be worth every minute you spend on it.
What worked well for me in previous similar cases is a hybrid approach. I start with a generic solution; you will find a few options in the article. Then I set up a trace, or a log table, to record actual usage stats by the users, and let it run for some time. Although product and marketing will say that all combinations are equally important, you will find that in real life most users tend to use only a few and repeat the same pattern. Then take the top used combinations, write specific queries just for those, and fall back to the generic query for the few instances where a unique combination of parameters is used. This way you get the best of both worlds: for most executions you get excellent performance, while still supporting every possible combination.
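A minimal sketch of such a usage log (the table and column names are made up; p1 and p2 stand for whatever parameters your search procedure receives):
CREATE TABLE search_usage_log (
param_combo VARCHAR(255) NOT NULL,
used_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- inside the search procedure, before running the query:
INSERT INTO search_usage_log (param_combo)
VALUES (CONCAT_WS(',', p1, p2));
After a few weeks, a simple GROUP BY param_combo on this table tells you which combinations deserve their own dedicated query.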
BTW - an interesting side effect of this approach is that over time users start to notice that the non-standard combinations take much longer. Then they either complain, in which case you can write a specific query for that case and make them happy, or, more often, they simply gravitate toward the common combinations, which are way faster :-)
HTH