Sql-server – T-SQL View — How to ‘pre-fetch’ schema using scalar function, then populate using table query

Tags: cte, sql-server, t-sql

I have an application that uses T-SQL views. A number of these views are quite complex, joining data from many tables. To avoid column-name collisions, the views often employ table-valued functions to return table data under a modified/prefixed schema.

For example, I have two tables (a parent ACTION table and a child ACTIONSUB table), both with many columns, and both with identical column names:

ACTION

  RowGUID  UID         Ref  Name  LastUpdate  etc...

ACTIONSUB

  RowGUID  UID  fkUID  Ref  Name  LastUpdate  etc...

As part of our build process, we regenerate our library of table-access functions, one function per table. These functions accept a delimited list of UID values, or return all rows if passed NULL. As stated above, their main purpose is to return all column data from the table with a modified/prefixed schema, so that we can JOIN them together into a single view. So, using our table-access functions, our tables above would return:

fn_TAF_ACTION(null):

  AC_RowGUID      AC_UID                    AC_Ref     AC_Name     etc...

fn_TAF_ACTIONSUB(null):

  ACSUB_RowGUID   ACSUB_UID   ACSUB_fkUID   ACSUB_Ref  ACSUB_Name  etc...

This approach works until we have very large tables and/or join data from many tables (functions); then our view performance can degrade seriously.

Our views are simple query statements, usually with calculated columns added to the select list. A simple example:

CREATE VIEW vw_ActionSub2Action_01 
AS
   SELECT 
      acsub.*, ac.*, 
      IsNull(ac.AC_Date1, ac.AC_Date2) AS [AC_ComboDate], 
      ac.AC_Ref +'.'+ acsub.ACSUB_Ref AS [ComboRef] -- , etc...
   FROM 
      fn_TAF_ACTIONSUB(null) acsub
   LEFT JOIN 
      fn_TAF_ACTION(null) ac ON acsub.ACSUB_fkUID = ac.AC_UID

I'm wondering if these views could be rewritten using CTEs, to somehow 'pre-fetch' the modified/prefixed schemas using the functions, and then query the tables with SELECT INTO to populate data into the view.

Is it possible?

Something like:

-- first, setup View schema using TAF functions to return empty data set
SELECT acsub.*, ac.*, '' AS [AC_ComboDate], '' AS [ComboRef], etc...
FROM fn_TAF_ACTIONSUB(0) acsub
LEFT JOIN fn_TAF_ACTION(0) ac ON acsub.ACSUB_fkUID = ac.AC_UID

-- then, populate the View by querying the tables
SELECT acsub.*, ac.*, IsNull(ac.Date1, ac.Date2) AS [AC_ComboDate], ac.Ref +'.'+ acsub.Ref AS [ComboRef], etc...
FROM ACTIONSUB acsub
LEFT JOIN ACTION ac ON acsub.fkUID = ac.UID

Is there a way to combine the above process into a single view definition?

PREFACE

I realize our above use of Views may be inherently non-performant; however, our system uses these Views, and performance is reasonable in most cases, so pushing a 're-design' of our processing logic is not feasible.

UPDATE 1

Our Table-Access-Functions (TAFs) are Table-valued Functions that return a table with a modified schema. They do not use BEGIN…END blocks. Here is an example…

CREATE FUNCTION [dbo].[fn_TAF_ACTION] (@UID varchar(max))
RETURNS TABLE
AS
RETURN (SELECT
  AC_RowGUID = RowGUID,
  AC_UID = UID,
  AC_Ref = Ref,
  AC_Name = Name,
  ...etc...
FROM Action
WHERE UID IN (SELECT IntValue FROM dbo.CsvToInt(@UID)) OR @UID IS NULL)
-- CsvToInt function simply allows us to pass a string of comma-delimited UIDs

Best Answer

"I'm wondering if these views could be re-written using CTEs, to somehow 'pre-fetch' the modified/pre-fixed schemas using the functions, and then query the tables with SELECT INTO to populate data into the view."

No, this isn't possible with a view. You could do something broadly along these lines with a multi-statement function (MSUDF), but:

This would still require a static schema definition for the table variable; and
Careful design would be required to avoid terrible performance; and
All potential base query predicates must be supplied as optional parameters

A multi-statement function would potentially materialize the entire 'view' result set in a table variable, before any predicates in the base query were applied. If the result is large, this overhead will likely be prohibitive.

You could push predicates into the MSUDF using parameters, but then the function body becomes a mess of conditional predicates of the form column = @value OR @value IS NULL, which would require OPTION (RECOMPILE) in the MSUDF to optimize well.

Note that even with the best possible outcome, the users of the view will have to change their queries from view references to MSUDF references (with all parameters specified, DEFAULT or NULL passed for those that are not needed). This may not be workable.
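
To make the trade-offs above concrete, here is a minimal sketch of what such a multi-statement function might look like. It is not a recommendation. The function name, the two optional parameters, and the column types are assumptions made for illustration; a real version would need the full static column list and one parameter for every predicate callers might want to apply.

```sql
-- Hypothetical multi-statement function standing in for the view.
-- Name, parameters, and column types are illustrative assumptions.
CREATE FUNCTION dbo.fn_ActionSub2Action_Sketch
(
    @AC_UID    int         = NULL,  -- optional predicate on the parent UID
    @ACSUB_Ref varchar(50) = NULL   -- optional predicate on the child Ref
)
RETURNS @Result TABLE
(
    -- The result schema must be declared statically here.
    ACSUB_UID int         NULL,
    ACSUB_Ref varchar(50) NULL,
    AC_UID    int         NULL,
    AC_Ref    varchar(50) NULL
)
AS
BEGIN
    INSERT @Result (ACSUB_UID, ACSUB_Ref, AC_UID, AC_Ref)
    SELECT acsub.UID, acsub.Ref, ac.UID, ac.Ref
    FROM dbo.ACTIONSUB AS acsub
    LEFT JOIN dbo.ACTION AS ac
        ON ac.UID = acsub.fkUID
    -- Each optional parameter becomes a "column = @value OR @value IS NULL" test.
    WHERE (ac.UID = @AC_UID OR @AC_UID IS NULL)
      AND (acsub.Ref = @ACSUB_Ref OR @ACSUB_Ref IS NULL)
    -- Recompile so the plan is built for the parameter values actually supplied.
    OPTION (RECOMPILE);

    RETURN;
END;
GO

-- Callers must pass every parameter, using DEFAULT (or NULL) for unused ones.
SELECT r.ACSUB_UID, r.ACSUB_Ref, r.AC_UID, r.AC_Ref
FROM dbo.fn_ActionSub2Action_Sketch(DEFAULT, 'X') AS r;
```

Any predicate that is not passed as a parameter, for example a WHERE clause added to the calling query, is applied only after the table variable has been fully populated.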

All that said, you need to be certain what is causing the performance problem in the first place. The question does not supply an example of a problematic query plan, so some of what follows is educated guesswork:

There are two features of the current scheme that leap out at me:

1) The use of ISNULL and calculated columns prevents the optimizer from pushing base-query predicates down into the underlying in-line functions and base tables. You might get better results for the ISNULL part by rewriting the LEFT JOIN as a UNION ALL of an inner join and an anti-semi-join of the two tables. Properly written, this would expose only base column names, allowing the optimizer to push predicates successfully.
2) The in-line functions use an MSUDF to split the supplied CSV into row values. This is a pattern that often causes optimization problems, because the size and distribution of the result is unknown. A query plan that is optimal for a CSV of '1,2,3' is likely not optimal for a NULL CSV. Again, OPTION (RECOMPILE) could help with this. It will simplify the compiled logic for the NULL case, avoid parameter-sniffing issues, and provide cardinality information for the MSUDF result. Distribution statistics will still not be available though.
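
The LEFT JOIN rewrite mentioned in the first point can be sketched roughly as follows. This is only an illustration of the join shape, using the function and column names from the question; the short column list is an assumption made for brevity, and how the calculated columns should be layered on top is left open.

```sql
-- Branch 1: inner join returns child rows that have a matching parent.
SELECT acsub.ACSUB_UID,
       acsub.ACSUB_Ref,
       ac.AC_UID,
       ac.AC_Ref
FROM dbo.fn_TAF_ACTIONSUB(NULL) AS acsub
JOIN dbo.fn_TAF_ACTION(NULL) AS ac
    ON ac.AC_UID = acsub.ACSUB_fkUID

UNION ALL

-- Branch 2: anti-semi-join returns child rows with no matching parent.
-- The parent columns are plain NULLs, so no ISNULL wrapper is involved.
SELECT acsub.ACSUB_UID,
       acsub.ACSUB_Ref,
       NULL,
       NULL
FROM dbo.fn_TAF_ACTIONSUB(NULL) AS acsub
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.fn_TAF_ACTION(NULL) AS ac
    WHERE ac.AC_UID = acsub.ACSUB_fkUID
);
```

Together the two branches return the same rows as the original LEFT JOIN. Whether the optimizer then pushes predicates down as hoped should be checked against the actual execution plan.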

Related Solutions

Sql-server – How to do a differential query (delta plus/minus) telling me what rows are in view A that are not in view B and vice versa

You can use a FULL OUTER JOIN for this

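-- KEY is a reserved word in T-SQL, so the column name must be delimited.
WITH T1
     AS (SELECT trantype,
                product_code
         FROM   vwVIEW1
         WHERE  [KEY] = 'DEMO'),
     T2
     AS (SELECT trantype,
                product_code
         FROM   vwVIEW2
         WHERE  [KEY] = 'DEMO')
-- Rows present on only one side have NULLs in the other side's columns.
SELECT *
FROM   T1
       FULL OUTER JOIN T2
         ON T1.trantype = T2.trantype
            AND T1.product_code = T2.product_code
WHERE  T1.trantype IS NULL
        OR T2.trantype IS NULL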
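
As a possible alternative to the FULL OUTER JOIN, the same kind of difference can be sketched with EXCEPT run in both directions. This is a different technique from the one in the answer: EXCEPT removes duplicate rows and treats NULLs as equal when comparing. The found_in label column is an assumption added for illustration.

```sql
-- Rows in vwVIEW1 that are missing from vwVIEW2.
SELECT 'only in vwVIEW1' AS found_in, d.trantype, d.product_code
FROM (
    SELECT trantype, product_code FROM vwVIEW1 WHERE [KEY] = 'DEMO'
    EXCEPT
    SELECT trantype, product_code FROM vwVIEW2 WHERE [KEY] = 'DEMO'
) AS d

UNION ALL

-- Rows in vwVIEW2 that are missing from vwVIEW1.
SELECT 'only in vwVIEW2', d.trantype, d.product_code
FROM (
    SELECT trantype, product_code FROM vwVIEW2 WHERE [KEY] = 'DEMO'
    EXCEPT
    SELECT trantype, product_code FROM vwVIEW1 WHERE [KEY] = 'DEMO'
) AS d;
```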

Sql-server – Automatically detect table name in MSSQL Server database using stored function

I think you would need to use a trigger on the table(s) in question in order to get the table name.

That is how I have implemented table change tracking: 1) create a table to hold the changes, and 2) create a trigger on each table that needs to be tracked.

The triggers all look like this (replace 'XX_YOURTABLE_XX' with the table in question and [tr_XXXX] with a unique trigger name):

CREATE TRIGGER [dbo].[tr_XXXX] on [dbo].[XX_YOURTABLE_XX] for INSERT, UPDATE, DELETE
AS

DECLARE 
    @bit INT ,
    @field INT ,
    @maxfield INT ,
    @char INT ,
    @fieldname VARCHAR(128) ,
    @TableName VARCHAR(128) ,
    @PKCols VARCHAR(1000) ,
    @sql VARCHAR(2000), 
    @UserName VARCHAR(128) ,
    @Type char(1) ,
    @PKSELECT VARCHAR(1000)

    SELECT @TableName = 'XX_YOURTABLE_XX'

    -- Get User
    IF object_id('tempdb..#TmpUser') IS NOT NULL
        SELECT  @UserName = TheUser FROM #TmpUser
    ELSE
        SELECT  @UserName = system_user

    -- Action
    IF EXISTS (SELECT * FROM INSERTED)
        IF EXISTS (SELECT * FROM DELETED)
            SELECT @Type = 'U'  --UPDATE
        ELSE
            SELECT @Type = 'I'  --INSERT
    ELSE
        SELECT @Type = 'D'      --DELETE

    -- get list of columns
    SELECT * INTO #ins FROM INSERTED
    SELECT * INTO #del FROM DELETED

    -- Get primary key columns for full outer join
    SELECT  @PKCols = coalesce(@PKCols + ' AND', ' on') + ' i.' + c.COLUMN_NAME + ' = d.' + c.COLUMN_NAME
    FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
            INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
    WHERE   pk.TABLE_NAME = @TableName
            AND CONSTRAINT_TYPE = 'PRIMARY KEY'
            AND c.TABLE_NAME = pk.TABLE_NAME
            AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

    -- Get primary key SELECT for INSERT
    SELECT @PKSELECT = coalesce(@PKSELECT+'+','') + '''<' + COLUMN_NAME + '=''+convert(VARCHAR(100),coalesce(i.' + COLUMN_NAME +',d.' + COLUMN_NAME + '))+''>''' 
    FROM    INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ,
            INFORMATION_SCHEMA.KEY_COLUMN_USAGE c
    WHERE   pk.TABLE_NAME = @TableName
            AND CONSTRAINT_TYPE = 'PRIMARY KEY'
            AND c.TABLE_NAME = pk.TABLE_NAME
            AND c.CONSTRAINT_NAME = pk.CONSTRAINT_NAME

    IF @PKCols IS NULL
    BEGIN
        RAISERROR('no PK on table %s', 16, -1, @TableName)
        RETURN
    END

    SELECT @field = 0, @maxfield = max(ORDINAL_POSITION) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName
    WHILE @field < @maxfield
    BEGIN
        SELECT @field = min(ORDINAL_POSITION) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName AND ORDINAL_POSITION > @field
        SELECT @bit = (@field - 1 )% 8 + 1
        SELECT @bit = power(2,@bit - 1)
        SELECT @char = ((@field - 1) / 8) + 1
        IF substring(COLUMNS_UPDATED(),@char, 1) & @bit > 0 or @Type in ('I','D')
        BEGIN
            SELECT @fieldname = COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName AND ORDINAL_POSITION = @field
            SELECT @sql =       'INSERT tbl_TRACKING (Type, TableName, PK, FieldName, OldValue, NewValue, UserName)'
            SELECT @sql = @sql +    ' SELECT ''' + @Type + ''''
            SELECT @sql = @sql +    ',''' + @TableName + ''''
            SELECT @sql = @sql +    ',' + REPLACE(REPLACE(REPLACE(@PKSELECT,'<',''),'>',''),'PriKeyVal=','')
            SELECT @sql = @sql +    ',''' + @fieldname + ''''
            SELECT @sql = @sql +    ',convert(VARCHAR(1000),d.' + @fieldname + ')'
            SELECT @sql = @sql +    ',convert(VARCHAR(1000),i.' + @fieldname + ')'
            SELECT @sql = @sql +    ',''' + @UserName + ''''
            SELECT @sql = @sql +    ' FROM #ins i full outer join #del d'
            SELECT @sql = @sql +    @PKCols
            SELECT @sql = @sql +    ' WHERE i.' + @fieldname + ' <> d.' + @fieldname 
            SELECT @sql = @sql +    ' or (i.' + @fieldname + ' IS NULL AND  d.' + @fieldname + ' IS not NULL)' 
            SELECT @sql = @sql +    ' or (i.' + @fieldname + ' IS not NULL AND  d.' + @fieldname + ' IS NULL)' 
            exec (@sql)
        END
    END

GO
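
The bit arithmetic inside the WHILE loop is easier to follow with a worked example. COLUMNS_UPDATED() returns a varbinary value with one bit per column, eight columns per byte, counting from the lowest bit of the first byte. The sketch below repeats the trigger's calculation for a hypothetical tenth column, using a hand-written two-byte value in place of COLUMNS_UPDATED(), which is only available inside a trigger.

```sql
DECLARE @field   int = 10;                 -- hypothetical column position
DECLARE @updated varbinary(2) = 0x0002;    -- stand-in for COLUMNS_UPDATED()

-- Same arithmetic as the trigger: which byte, and which bit within it.
DECLARE @char int = ((@field - 1) / 8) + 1;        -- 9 / 8 + 1 = 2 (second byte)
DECLARE @bit  int = POWER(2, (@field - 1) % 8);    -- 2 ^ (9 % 8) = 2

SELECT @char AS byte_position,
       @bit  AS bit_mask,
       CASE WHEN SUBSTRING(@updated, @char, 1) & @bit > 0
            THEN 1 ELSE 0
       END   AS column_was_updated;
-- byte_position = 2, bit_mask = 2, column_was_updated = 1
```

One caveat worth checking on your own tables: the bit pattern follows the column ID, which can differ from ORDINAL_POSITION if columns have been dropped from the table.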

The tracking table looks like this:

CREATE TABLE [dbo].[tbl_TRACKING](
    [Type] [char](1) NULL,
    [TableName] [varchar](128) NULL,
    [PK] [varchar](1000) NULL,
    [FieldName] [varchar](128) NULL,
    [OldValue] [varchar](1000) NULL,
    [NewValue] [varchar](1000) NULL,
    [ActionDate] [datetimeoffset](3) NULL,
    [UserName] [varchar](128) NULL,
    [AppName] [varchar](128) NULL,
    [ComputerName] [varchar](128) NULL
)
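
A short usage sketch follows, assuming the trigger has been created on a table with a primary key. SomeColumn and SomeKey are hypothetical column names and 'app_user_42' is a made-up value. The optional #TmpUser table is what the trigger reads to record an application-level user name instead of system_user.

```sql
-- Optional: tell the trigger which application user is making the change.
CREATE TABLE #TmpUser (TheUser varchar(128) NOT NULL);
INSERT #TmpUser (TheUser) VALUES ('app_user_42');

-- Any tracked change now writes one row per changed column.
UPDATE dbo.XX_YOURTABLE_XX
SET    SomeColumn = 'new value'     -- hypothetical column
WHERE  SomeKey = 1;                 -- hypothetical primary key column

-- Inspect what was recorded for this table.
SELECT [Type], TableName, PK, FieldName, OldValue, NewValue, UserName
FROM   dbo.tbl_TRACKING
WHERE  TableName = 'XX_YOURTABLE_XX';

DROP TABLE #TmpUser;
```

Note that the trigger as written does not populate ActionDate, AppName, or ComputerName, so those columns stay NULL unless defaults are added to the tracking table or the INSERT is extended.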

Hope that helps
