Sql-server – create a computed column that requires input to select

computed-column sql-server

I have a scenario in a very old and very large application, where I have a table representing a type of resource:

CREATE TABLE resource (resource_id INT, name NVARCHAR(4000))

This table is selected from in hundreds of different places, including stored procedures and dynamic SQL in the application code.

A team recently updated this resource's name to be localized, and their approach is pretty straightforward. There is a new table containing the localized names, and a 'default' language ID on the resource table, for when the name isn't localized for the requested language:

-- Foreign keys omitted
ALTER TABLE resource ADD default_language_id INT
CREATE TABLE resource_local (resource_id INT, language_id INT, name NVARCHAR(4000))

Most procs have an @user_language_id parameter, so the logic for choosing the name to return is simple: take resource_local.name matching language_id = @user_language_id if it exists, otherwise resource_local.name matching language_id = resource.default_language_id if it exists, otherwise take resource.name.

Unfortunately, this turns the logic to select the correct name into something like this:

SELECT ISNULL(ISNULL(exact.name, default.name), res.name)
FROM resource res
LEFT JOIN resource_local exact ON exact.resource_id = res.resource_id 
    AND exact.language_id = @user_language_id
LEFT JOIN resource_local default ON default.resource_id = res.resource_id
    AND default.language_id = res.default_language_id
WHERE res.resource_id = @resource_id

All of the hundreds of places that try to select resource.name are having to be updated with this logic, which has turned this project into a massive effort across the entire organization, as each team needs to update their SQL to use this logic. This also causes maintainability issues, as any new developers dealing with this table need to know that they can't just use the name column.

It's too late now, but I'm curious: is there any better way to approach this, so that selecting the name column from resource will just 'do the right thing' based on the @user_language_id variable (if it exists)?

Best Answer

I'm not sure if it's possible to do this so that none of the references to the resource table need to change. It seems like the fact that a language_id is needed is a fundamental change that all calling code will need to be aware of.

However, it is possible to design this in a way that the resource can be queried in either of the following simple ways. One of these options might have been an easier change to make and maintain across so many different places.

Table-valued function

Using an Inline Table-Valued Function, we can provide the following syntax.

SELECT resource_id, language_id, name
FROM dbo.resourceTVF(@resource_id, @language_id) r

Here is an example of how to create the function. It's essentially the same query from your question, but with the alias default changed to def (default is a SQL Server keyword).

-- Create the Table-Valued Function
CREATE FUNCTION dbo.resourceTVF (@resource_id INT, @user_language_id INT)
RETURNS TABLE 
AS
RETURN
SELECT @resource_id AS resource_id,
  @user_language_id AS language_id,
  ISNULL(ISNULL(exact.name, def.name), res.name) AS name
FROM dbo.resource res
LEFT JOIN dbo.resource_local exact ON exact.resource_id = res.resource_id 
    AND exact.language_id = @user_language_id
LEFT JOIN dbo.resource_local def ON def.resource_id = res.resource_id
    AND def.language_id = res.default_language_id
WHERE res.resource_id = @resource_id
GO

View

You could rename the resource table (e.g., to resource_base) and then create a resource view in order to provide the following API:

SELECT resource_id, language_id, name
FROM dbo.resource
WHERE resource_id = @resource_id
  AND language_id = @language_id

The primary downside is that the view definition needs to CROSS JOIN all resources and languages before applying the LEFT JOINs to the exact and default localized names. Even so, assuming that you have the proper indexes, a query that filters on resource_id and language_id should get a fairly efficient plan with four singleton seeks.

CREATE VIEW dbo.resource WITH SCHEMABINDING AS
SELECT res.resource_id,
  lang.language_id,
  ISNULL(ISNULL(exact.name, def.name), res.name) AS name
FROM dbo.resource_base res
CROSS JOIN dbo.languages lang
LEFT JOIN dbo.resource_local exact ON exact.resource_id = res.resource_id 
    AND exact.language_id = lang.language_id
LEFT JOIN dbo.resource_local def ON def.resource_id = res.resource_id
    AND def.language_id = res.default_language_id
GO
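The answer does not spell out what the "proper indexes" are. A minimal sketch, assuming hypothetical index names, that none of these tables already has a clustered index, and that (resource_id, language_id) identifies a single localized name, might look like this:

```sql
-- Assumed index names; adjust to the real schema and existing constraints.

-- One row per resource: supports the seek on resource_id.
-- (With the function-based approach the table is still called dbo.resource.)
CREATE UNIQUE CLUSTERED INDEX CX_resource_base
    ON dbo.resource_base (resource_id);

-- One row per language: supports the seek on language_id.
CREATE UNIQUE CLUSTERED INDEX CX_languages
    ON dbo.languages (language_id);

-- One row per (resource, language): serves both the "exact" and the
-- "default" lookups, each as a single-row seek.
CREATE UNIQUE CLUSTERED INDEX CX_resource_local
    ON dbo.resource_local (resource_id, language_id);
```

Unique indexes are used here rather than primary keys because the columns in the question are declared without NOT NULL; if they can be made NOT NULL, primary key constraints would express the same intent.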

Full script

Here is a full script where I implemented both of these proposals, loaded a small amount of fake data, and ran a few test cases. At least for these test cases, both approaches yield the desired results and use a loop-seek based plan.

I think that the inline table-valued function is probably the approach that I'd try first. Note that you can use CROSS APPLY to "join" to the table-valued function if you need more than one resource at a time.
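As a rough sketch of the CROSS APPLY usage, assuming dbo.resourceTVF has been created as shown earlier and using a hypothetical table variable @wanted to hold the resource IDs of interest:

```sql
-- Hypothetical inputs, for illustration only.
DECLARE @user_language_id INT = 2;
DECLARE @wanted TABLE (resource_id INT);
INSERT INTO @wanted (resource_id) VALUES (1), (2), (3);

-- Logically the function is applied once per row of @wanted. Because it is
-- an inline function, the optimizer expands its definition into this query.
SELECT r.resource_id, r.language_id, r.name
FROM @wanted w
CROSS APPLY dbo.resourceTVF(w.resource_id, @user_language_id) r;
```

With CROSS APPLY, an ID that does not exist in the resource table simply produces no row; OUTER APPLY would keep the row from @wanted and return NULLs for the function's columns instead.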

Related Solutions

Sql-server – SELECT/INSERT Deadlock

On the face of it, this looks like a classic lookup deadlock. The essential ingredients for this deadlock pattern are:

a SELECT query that uses a non-covering nonclustered index with a Key Lookup
an INSERT query that modifies the clustered index and then the nonclustered index

The SELECT accesses the nonclustered index first, then the clustered index. The INSERT accesses the clustered index first, then the nonclustered index. Accessing the same resources in a different order while acquiring incompatible locks is a great way to 'achieve' a deadlock, of course.
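To make the two access orders concrete, here is a minimal sketch using a hypothetical table (dbo.DemoOrders and its index are invented for illustration and are not taken from the system in the question). It shows the shape of the two statements only; it is not a reliable reproduction of the deadlock, which depends on timing.

```sql
-- Hypothetical table: clustered primary key plus a non-covering nonclustered index.
CREATE TABLE dbo.DemoOrders
(
    order_id    INT IDENTITY PRIMARY KEY CLUSTERED,
    customer_id INT NOT NULL,
    notes       NVARCHAR(200) NULL
);
CREATE NONCLUSTERED INDEX IX_DemoOrders_customer
    ON dbo.DemoOrders (customer_id);
GO

-- Session 1: seeks IX_DemoOrders_customer first, then performs a Key Lookup
-- into the clustered index to fetch notes (nonclustered -> clustered).
-- The index hint is only there to force that plan shape on a tiny table.
SELECT order_id, notes
FROM dbo.DemoOrders WITH (INDEX (IX_DemoOrders_customer))
WHERE customer_id = 42;

-- Session 2: inserts the row into the clustered index first, then maintains
-- IX_DemoOrders_customer (clustered -> nonclustered).
INSERT INTO dbo.DemoOrders (customer_id, notes)
VALUES (42, N'new order');
```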

In this case, the SELECT query is:

[Image: SELECT query]

...and the INSERT query is:

[Image: INSERT query]

Notice the nonclustered index maintenance, highlighted in green.

We would need to see the serial version of the SELECT plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to Handling Deadlocks, this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason.

Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally escalate the isolation level to REPEATABLE READ for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting.

The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in cost threshold for parallelism fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired MAXDOP 1 behaviour and the other application is able to use parallelism.
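For reference, making an index covering generally means adding the selected columns as included columns, so the SELECT no longer needs a Key Lookup. A minimal sketch on a hypothetical table (dbo.DemoItems and the index name are assumptions for illustration, not SharePoint objects):

```sql
-- Hypothetical table and index names, for illustration only.
CREATE TABLE dbo.DemoItems
(
    item_id INT IDENTITY PRIMARY KEY CLUSTERED,
    list_id INT NOT NULL,
    title   NVARCHAR(200) NULL
);
GO

-- Covering index: list_id supports the seek, INCLUDE carries title, and the
-- clustering key item_id is stored in every nonclustered index automatically.
CREATE NONCLUSTERED INDEX IX_DemoItems_list_covering
    ON dbo.DemoItems (list_id)
    INCLUDE (title);
GO

-- This query can be answered from the nonclustered index alone,
-- so it never has to touch the clustered index.
SELECT item_id, title
FROM dbo.DemoItems
WHERE list_id = 7;
```

The trade-off is a wider index that costs more to maintain on every INSERT and UPDATE, which is why a query selecting many columns can make this impractical.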

The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue.

Sql-server – Doing a left join and having every match include an extra null row

I see 3 ways to do this but all involve a UNION ALL:

your version:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN Table2 T2
        ON T1.Key1 = T2.Key1

UNION ALL

SELECT
    T1.C1, ..., T1.CN,
    NULL, ..., NULL
FROM
    Table1 T1 ;

slightly changing the second part:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN Table2 T2
        ON T1.Key1 = T2.Key1

UNION ALL

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 LEFT JOIN Table2 T2
        ON 0 = 1 ;                   -- FALSE

first a UNION, then join:

SELECT
    T1.C1, ....., T1.CN,
    T2.C1, ..., T2.CM
FROM
    Table1 T1 JOIN  
      ( SELECT * FROM Table2
        UNION ALL
        SELECT NULL, ..., NULL
       ) AS T2
        ON T1.Key1 = T2.Key1 
        OR T2.Key1 IS NULL ;

I don't think there will be much difference in execution plans or efficiency, but the first one seems the simplest.
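As a small worked example of the first version, here is a sketch with hypothetical temporary tables and sample data (the column names A and B are made up for illustration):

```sql
-- Hypothetical sample data.
CREATE TABLE #Table1 (Key1 INT, A VARCHAR(10));
CREATE TABLE #Table2 (Key1 INT, B VARCHAR(10));
INSERT INTO #Table1 (Key1, A) VALUES (1, 'one'), (2, 'two');
INSERT INTO #Table2 (Key1, B) VALUES (1, 'x'), (1, 'y');

-- Every matching pair of rows ...
SELECT T1.Key1, T1.A, T2.B
FROM #Table1 T1
JOIN #Table2 T2
    ON T1.Key1 = T2.Key1

UNION ALL

-- ... plus one NULL-extended row for every row of #Table1.
SELECT T1.Key1, T1.A, CAST(NULL AS VARCHAR(10))
FROM #Table1 T1
ORDER BY Key1, B;

-- Rows returned: (1, one, NULL), (1, one, x), (1, one, y), (2, two, NULL)

DROP TABLE #Table1;
DROP TABLE #Table2;
```

Key 1 gets its two matches plus the extra NULL row, while key 2, which has no match in #Table2, still appears once with NULL.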
