Sql-server – T-SQL View — How to ‘pre-fetch’ schema using scalar function, then populate using table query

ctesql servert-sql

I have an application that uses T-SQL views. A number of these views are quite complex, joining data from many tables. To avoid column-name collisions, the views often employ table-scalar functions to return table data under a modified/prefixed schema.

For example, I have two tables (parent ACTION table and child ACTIONSUB table), both with many columns, and both with identical columns names —

ACTION

  RowGUID  UID         Ref  Name  LastUpdate  etc...

ACTIONSUB

  RowGUID  UID  fkUID  Ref  Name  LastUpdate  etc...

As part of our build process, we regenerate our library of table-access functions, one function per table. These functions can accept a delimited list of UID values, or will return all rows if passed NULL. As stated above, these functions' main purpose is to return all column data from the table with a modified/pre-fixed schema, so that we can JOIN them together into a single View. So, using our table-access functions, our tables above would return —

fn_TAF_ACTION(null):

  AC_RowGUID      AC_UID                    AC_Ref     AC_Name     etc...

fn_TAB_ACTIONSUB(null):

  ACSUB_RowGUID   ACSUB_UID   ACSUB_fkUID   ACSUB_Ref  ACSUB_Name  etc...

This approach works, until we have very large tables and/or join data from many tables (functions); then our View performance can seriously degrade.

Our views are simple query statements, usually with calculated columns added to the select list. A simple example —

CREATE VIEW vw_ActionSub2Action_01 
AS
   SELECT 
      acsub.*, ac.*, 
      IsNull(ac.AC_Date1, ac.AC_Date2) AS [AC_ComboDate], 
      ac.AC_Ref +'.'+ acsub.ACSUB_Ref AS [ComboRef], etc...
   FROM 
      fn_TAF_ACTIONSUB(null) acsub
   LEFT JOIN 
      fn_TAF_ACTION(null) ac ON acsub.ACSUB_fkUID = ac.AC_UID

I'm wondering if these Views could be re-written using CTEs, to somehow 'pre-fetch' the modified/pre-fixed schemas using the functions, and then query the tables with SELECT INTO to populate data into the View.

Is it possible?

Something like —

-- first, setup View schema using TAF functions to return empty data set
SELECT acsub.*, ac.*, '' AS [AC_ComboDate], '' AS [ComboRef], etc...
FROM fn_TAF_ACTIONSUB(0) acsub
LEFT JOIN fn_TAF_ACTION(0) ac ON acsub.ACSUB_fkUID = ac.AC_UID

-- then, populate the View by querying the tables
SELECT acsub.*, ac.*, IsNull(ac.Date1, ac.Date2) AS [AC_ComboDate], ac.Ref +'.'+ acsub.Ref AS [ComboRef], etc...
FROM ACTIONSUB acsub
LEFT JOIN ACTION ac ON acsub.fkUID = ac.UID

Is there a way to combine the above process into a single view definition?

PREFACE

I realize our above use of Views may be inherently non-performant; however, our system uses these Views, and performance is reasonable in most cases, so pushing a 're-design' of our processing logic is not feasible.

UPDATE 1

Our Table-Access-Functions (TAFs) are Table-valued Functions that return a table with a modified schema. They do not use BEGIN…END blocks. Here is an example…

CREATE FUNCTION [dbo].[fn_TAF_ACTION] (@UID varchar(max))
RETURNS TABLE
AS
RETURN (SELECT
  AC_RowGUID = RowGUID,
  AC_UID = UID,
  AC_Ref = Ref,
  AC_Name = Name,
  ...etc...
FROM Action
WHERE UID IN (SELECT IntValue FROM dbo.CsvToInt(@UID)) OR @UID IS NULL)
-- CsvToInt function simply allows us to pass a string of comma-delimited UIDs

Best Answer

I'm wondering if these views could be re-written using CTEs, to somehow 'pre-fetch' the modified/pre-fixed schemas using the functions, and then query the tables with SELECT INTO to populate data into the view.

No, this isn't possible with a view. You could do something broadly along these lines with a multi-statement function (MSUDF), but:

  1. This would still require a static schema definition for the table variable; and
  2. Careful design would be required to avoid terrible performance; and
  3. All potential base query predicates must be supplied as optional parameters

A multi-statement function would potentially materialize the entire 'view' result set in a table variable, before any predicates in the base query were applied. If the result is large, this overhead will likely be prohibitive.

You could push predicates into the MSUDF using parameters, but then the function body becomes a mess of conditional predicates of the form column = @value OR @value IS NULL, which would require OPTION (RECOMPILE) in the MSUDF to optimize well.

Note that even with the best possible outcome, the users of the view will have to change their queries from view references to MSUDF references (with all parameters specified, DEFAULT or NULL passed for those that are not needed). This may not be workable.

All that said, you need to be certain what is causing the performance problem in the first place. The question does not supply an example of a problematic query plan, so some of what follows is educated guesswork:

There are two features of the current scheme that leap out at me:

  1. The use of ISNULL and calculated columns prevents the optimizer from pushing base-query predicates down into the underlying in-line functions and base tables. You might get better results for the ISNULL part by rewriting the LEFT JOIN as a UNION ALL of an inner join and an anti-semi-join of the two tables. Properly written, this would expose only base column names, allowing the optimizer to push predicates successfully.

  2. The in-line functions use an MSUDF to split the supplied CSV into row values. This is a pattern that often causes optimization problems, because the size and distribution of the result is unknown. A query plan that is optimal for a CSV of '1,2,3' is likely not optimal for a NULL CSV. Again OPTION (RECOMPILE) could help with this. It will simplify the compiled logic for the NULL case, avoid parameter-sniffing issues, as well as providing cardinality information for the MSUDF result. Distribution statistics will still not be available though.