T-SQL – How to Query Orders Containing a Product Without Sub-Query

t-sql

Order
-----
OrderId

OrderDetail
-----------
OrderDetailId
OrderId
ProductId

How do I write a query that returns all orders that contain a specific product without using a sub-query?

I don't want to write a sub-query because I want to encapsulate the query in a view and use a WHERE clause on the view to supply the product id instead of having a stored procedure and supplying a parameter.

I figure the only way to do this is to join orders to details and GROUP BY on order but even then I don't know how to have the aggregate column resolve to whether the product is amongst the order details.

Best Answer

I have no idea about efficiency^* but a view like this:

CREATE VIEW 
    Orders_with_a_product_View AS
SELECT 
    p.ProductID, o.*
FROM
    dbo.Product AS p
  JOIN
    dbo.[Order] AS o
      ON EXISTS
         ( SELECT 1
           FROM dbo.OrderDetail AS od
           WHERE od.OrderID = o.OrderID
             AND od.ProductId = p.ProductId
         ) ;

would allow you to use it with:

SELECT * 
FROM Orders_with_a_product_View 
WHERE ProductID = X ;

(*Regarding efficiency: Hopefully the optimizer will "push-down" the condition and not do a cross join of all products and orders.)

It might be better to follow a simpler road, like this:

CREATE VIEW 
    Order_Product_IDs AS
SELECT DISTINCT 
    ProductID, OrderID
FROM
    dbo.OrderDetail ;

and the utilize it with:

SELECT o.* 
FROM dbo.Order_Product_IDs AS op 
  JOIN dbo.[Order] AS o
    ON o.OrderID = op.OrderID
WHERE op.ProductID = X ;

Related Solutions

Efficient Query for Advanced EAV Model in SQL Server

First of all, what you are about to design is probably a VERY bad idea. A much better solution would be to have a dynamic schema where you add new tables and have the application understand how to query those table (you could place them in a schema). This largely avoids all the locking and query plan issues you are bound to run into with this model. There is nothing wrong with applications running CREATE TABLE now and again.

Second, I am not sure I understand why you have normalised Parameter into its own table? Why not put that directly into the ManufacturerParameter table.

Third, if you insist on proceeding with your current model, there are ways to achieve what you want (at least if I am interpreting your requirement correctly). What you can do is to write your query in such a way that it is a sums up the search argument when there is a match and then use HAVING to filter out the values that match. I am assuming that only one of the fields Text, Boolean, Datum etc are populated per ProductParameter record (you probably want to enforce this with a constraint)

For example, to search for all products that have a bolean = true for one parameter AND text = 'abc' for some other parameter you can do:

SELECT P.Name
FROM Product P
JOIN ProductParameter PP
WHERE P.ID = Foo
  AND PP.Boolean = 1 OR PP.Text = 'abc'  ... /* For each filter */
GROUP BY P.Name /* And any other things you want out of product */
HAVING COUNT(*) >= [Number of where clauses]

If you need to list all the parameters of this product, you can use the above query template as a nested query and join back to ProductParameter.

The above query CAN be optimised by maintaining a computed column in ProductParameter that has a string representation of the different data types in that table. That way, the above OR statements can be rewritten as an IN list (which you will want to pass as a table valued parameter).

I would like to repeat that what you are doing is probably very wrong. If you do it, you will most likely need to hand tune most of your query plans - the optimiser will not help you anymore. And that is assuming you don't have too many query variants, which will run your plan cache full.

How to Avoid Using Merge Query When Upserting Multiple Data with XML Parameter

Whether the source is XML or a TVP does not make a huge difference. The overall operation is essentially:

UPDATE existing rows
INSERT missing rows

You do it in that order because if you INSERT first, then all rows exist to get the UPDATE and you will do repeated work for any rows that were just inserted.

Beyond that there are different ways to accomplish this and various ways to tweak some additional efficiency out of it.

Let's start with the bare minimum. Since extracting the XML is likely to be one of the more expensive parts of this operation (if not the most expensive), we don't want to have to do that twice (as we have two operations to perform). So, we create a temp table and extract the data out of the XML into it:

CREATE TABLE #TempImport
(
  Field1 DataType1,
  Field2 DataType2,
  ...
);

INSERT INTO #TempImport (Field1, Field2, ...)
  SELECT tab.col.value('XQueryForField1', 'DataType') AS [Field1],
         tab.col.value('XQueryForField2', 'DataType') AS [Field2],
         ...
  FROM   @XmlInputParam.nodes('XQuery') tab(col);

From there we do the UPDATE and then the INSERT:

UPDATE tab
SET    tab.Field1 = tmp.Field1,
       tab.Field2 = tmp.Field2,
       ...
FROM   [SchemaName].[TableName] tab
INNER JOIN #TempImport tmp
        ON tmp.IDField = tab.IDField
        ... -- more fields if PK or alternate key is composite

INSERT INTO [SchemaName].[TableName]
  (Field1, Field2, ...)
  SELECT tmp.Field1, tmp.Field2, ...
  FROM   #TempImport tmp
  WHERE  NOT EXISTS (
                       SELECT  *
                       FROM    [SchemaName].[TableName] tab
                       WHERE   tab.IDField = tmp.IDField
                       ... -- more fields if PK or alternate key is composite
                     );

Now that we have the basic operation down, we can do a few things to optimize:

capture @@ROWCOUNT of insert into temp table and compare to @@ROWCOUNT of the UPDATE. If they are the same then we can skip the INSERT
capture the ID values updated via the OUTPUT clause and DELETE those from the temp table. Then the INSERT doesn't need the WHERE NOT EXISTS(...)
IF there are any rows in the incoming data that should not be synced (i.e. neither inserted nor updated), then those records should be removed prior to doing the UPDATE

CREATE TABLE #TempImport
(
  Field1 DataType1,
  Field2 DataType2,
  ...
);

DECLARE @ImportRows INT;
DECLARE @UpdatedIDs TABLE ([IDField] INT NOT NULL);

BEGIN TRY

  INSERT INTO #TempImport (Field1, Field2, ...)
    SELECT tab.col.value('XQueryForField1', 'DataType') AS [Field1],
           tab.col.value('XQueryForField2', 'DataType') AS [Field2],
           ...
    FROM   @XmlInputParam.nodes('XQuery') tab(col);

  SET @ImportRows = @@ROWCOUNT;

  IF (@ImportRows = 0)
  BEGIN
    RAISERROR('Seriously?', 16, 1); -- no rows to import
  END;

  -- optional: test to see if it helps or hurts
  -- ALTER TABLE #TempImport
  --   ADD CONSTRAINT [PK_#TempImport]
  --   PRIMARY KEY CLUSTERED (PKField ASC)
  --   WITH FILLFACTOR = 100;


  -- optional: remove any records that should not be synced
  DELETE tmp
  FROM   #TempImport tmp
  INNER JOIN [SchemaName].[TableName] tab
          ON tab.IDField = tmp.IDField
          ... -- more fields if PK or alternate key is composite
  WHERE  tmp.ModifiedDate < tab.ModifiedDate;

  BEGIN TRAN;

  UPDATE tab
  SET    tab.Field1 = tmp.Field1,
         tab.Field2 = tmp.Field2,
         ...
  OUTPUT INSERTED.IDField
  INTO   @UpdatedIDs ([IDField]) -- capture IDs that are updated
  FROM   [SchemaName].[TableName] tab
  INNER JOIN #TempImport tmp
          ON tmp.IDField = tab.IDField
          ... -- more fields if PK or alternate key is composite

  IF (@@ROWCOUNT < @ImportRows) -- if all rows were updates then skip, else insert remaining
  BEGIN
    -- get rid of rows that were updates, leaving only the ones to insert
    DELETE tmp
    FROM   #TempImport tmp
    INNER JOIN @UpdatedIDs del
            ON del.[IDField] = tmp.[IDField];

    -- OR, rather than the DELETE, maybe add a column to #TempImport for:
    -- [IsUpdate] BIT NOT NULL DEFAULT (0)
    -- Then UPDATE #TempImport SET [IsUpdate] = 1 JOIN @UpdatedIDs ON [IDField]
    -- Then, in below INSERT, add:  WHERE [IsUpdate] = 0

    INSERT INTO [SchemaName].[TableName]
      (Field1, Field2, ...)
      SELECT tmp.Field1, tmp.Field2, ...
      FROM   #TempImport tmp
  END;

  COMMIT TRAN;

END TRY
BEGIN CATCH
  IF (@@TRANCOUNT > 0)
  BEGIN
    ROLLBACK;
  END;

  -- THROW; -- if using SQL 2012 or newer, use this and remove the following 3 lines
  DECLARE @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();
  RAISERROR(@ErrorMessage, 16, 1);
  RETURN;
END CATCH;

I have used this model several times on Imports / ETLs that either have well over 1000 rows or maybe 500 in a batch out of a total set of 20k - over a million rows. However, I have not tested the performance difference between the DELETE of the updated rows out of the temp table vs just updating the [IsUpdate] field.

Please note regarding the decision to use XML over TVP due to there being, at most, 1000 rows to import at a time (mentioned in the question):

If this is being called a few times here and there, then quite possibly the minor performance gain in TVP might not be worth the additional maintenance cost (needing to drop the proc before changing the User-Defined Table Type, app code changes, etc). But if you are importing 4 million rows, sending 1000 at a time, that is 4000 executions (and 4 million rows of XML to parse no matter how it is broken up), and even a minor performance difference when executed only a few times will add up to a noticeable difference.

That being said, the method as I have described does not change outside of replacing the SELECT FROM @XmlInputParam to be SELECT FROM @TVP. Since TVPs are read-only, you wouldn't be able to delete from them. I guess you could simply add a WHERE NOT EXISTS(SELECT * FROM @UpdateIDs ids WHERE ids.IDField = tmp.IDField) to that final SELECT (tied to the INSERT) instead of the simple WHERE IsUpdate = 0. If you were to use the @UpdateIDs table variable in this manner, then you could even get away with not dumping the incoming rows into the temp table.

Best Answer

Related Solutions

Efficient Query for Advanced EAV Model in SQL Server

How to Avoid Using Merge Query When Upserting Multiple Data with XML Parameter

Related Question