Sql-server – Correlated Subquery SQL Server 2014

sql serversubquery

I am (unsuccessfully) using a correlated subquery on an Invoices table:

Invoices(InvoiceID,  VendorID, InvoiceTotal, PaymentTotal, CreditTotal,.... ),

to find the sum of largest unpaid invoices by all vendors, where the unpaid condition is given by InvoiceTotal-PaymentTotal-CreditTotal <0:

 Select Sum(LargestUnpaid) from 
 (Select   Max(InvoiceTotal) AS LargestUnpaid from Invoices 
  where InvoiceTotal-(PaymentTotal+CreditTotal)<0 group by vendorID ) ;

The inner query runs , which is good and bad, since the query is supposed
to be correlated , but the query as a whole does not run, and I get the error message:

Msg 102, Level 15, State 1, Line 4 Incorrect syntax near ')'.

What am I doing wrong?

Best Answer

Add an alias for the derived table before the ; for example VendLargestUnpaidInv:

Select Sum(LargestUnpaid) from
 (Select   Max(InvoiceTotal) AS LargestUnpaid from Invoices 
  where InvoiceTotal-(PaymentTotal+CreditTotal)<0 group by vendorID ) VendLargestUnpaidInv;

Related Solutions

Sql-server – How to interpret correlated subqueries

This specific case (the exists predicate) is a bit of an outlier as far as correlated subqueries go. ish...

Some correlated subqueries are certainly evaluated once for every row in the outer query, but I don't think you can apply that across the board, and I specifically think that's a statement that cannot be applied with authority to the exists predicate.

The SQL example below this paragraph is highly contrived, but nevertheless illustrates an "evaluated once for each row in the outer query" correlated subquery. It's "highly contrived" because a straight relational join would be a much better solution, and in this simple case the join is super easy to write.

select
    team_name
    , (
       select league_name
       from leagues
       where leagues.league_id = teams.league_id
    ) as league_name
from teams

In this instance, each row in the result set is going to contain the name of the league that the team belongs to, because of the correlated subquery. And the correlated subquery is probably going to be run once for each row in the outer query. Maybe.

The thing is, the database's query optimizer might rewrite this as a more efficient join, depending on the database. So even this simple example might not be executed "once per row in the outer query."

Either way, the subquery is dependent on the values returned from the table in the outer query, and that makes it a correlated subquery--because of the correlation with the outer table.

Part of the point of SQL in general is that the specific implementation is not important to you.

You should think in terms of sets of data (think set arithmetic), not in terms of iterations and comparisons per iteration.

The exists predicate doesn't make you any promises about the particular implementation. In general, exists is supposed to take advantage of table statistics and indexes in such a way that it produces a more efficient query plan than many other approaches will (and more efficient than you're likely to come up with on your own).

The actual query that ends up being run is going to be some kind of join, or multiple joins. But you don't need to get worked up about that. That's why your exists clause contains select * instead of identifying specific fields.

The query plan produced by the database engine will figure out which fields it wants to use, which key or keys and which index or indexes matter, etc.

So your exists example is still a correlated subquery, despite the underlying implementation, because the inner query refers to and is logically dependent upon the table in the outer query.

SELECT DISTINCT store_type FROM stores
WHERE EXISTS
    (
    SELECT * FROM cities_stores
    WHERE cities_stores.store_type = stores.store_type
    );

Sql-server – Correlated Subquery in SQL Server 2014

The syntax for IN(...) is:

test_expression [ NOT ] IN 
    ( subquery | expression [ ,...n ]    )

With:

test_expression

Is any valid expression.

subquery

Is a subquery that has a result set of one column. This column must have the same data type as test_expression.

expression[ ,... n ]

Is a list of expressions to test for a match. All expressions must be of the same type as test_expression.

This means that only 1 column is allowed on both side. Even a list of expressions is considered as 1 dummy table with 1 column for expression values similar to:

SELECT exp FROM (values(exp1), (exp2), ...) as X(exp)

With 2 columns you can use a subquery with EXISTS:

Select VendorName, VendorCity, VendorState 
FROM Vendors AS V1 
WHERE NOT EXISTS (
    SELECT 1 
    FROM Vendors AS V2 
    WHERE V2.VendorID <> V1.VendorID
        AND VendorCity = VendorCity
        AND VendorState = VendorState
)

Overall this subquery always return 0 row or 1 or more rows with 1 because it only needs to know if there is 1 or more rows with the same City and State. The value is not important because it only looks at the number of row (=> if 1 or more rows EXISTS), hence the SELECT 1. The test has already been done in the inner WHERE clause.

I guess that VendorId is the PK. An index on VendorCity and VendorState would help.

Since you are learning and if you don't already know it, you can also look at the usage of APPLY (CROSS APPLY and OUTER APPLY). I let you try it and play with it.

Best Answer

Related Solutions

Sql-server – How to interpret correlated subqueries

Part of the point of SQL in general is that the specific implementation is not important to you.

Sql-server – Correlated Subquery in SQL Server 2014

test_expression

subquery

expression[ ,... n ]

Related Question