I have an application that uses T-SQL views. A number of these views are quite complex, joining data from many tables. To avoid column-name collisions, the views often employ table-scalar functions to return table data under a modified/prefixed schema.
For example, I have two tables (parent ACTION
table and child ACTIONSUB
table), both with many columns, and both with identical columns names —
ACTION
RowGUID UID Ref Name LastUpdate etc...
ACTIONSUB
RowGUID UID fkUID Ref Name LastUpdate etc...
As part of our build process, we regenerate our library of table-access functions, one function per table. These functions can accept a delimited list of UID
values, or will return all rows if passed NULL
. As stated above, these functions' main purpose is to return all column data from the table with a modified/pre-fixed schema, so that we can JOIN
them together into a single View. So, using our table-access functions, our tables above would return —
fn_TAF_ACTION(null)
:
AC_RowGUID AC_UID AC_Ref AC_Name etc...
fn_TAB_ACTIONSUB(null)
:
ACSUB_RowGUID ACSUB_UID ACSUB_fkUID ACSUB_Ref ACSUB_Name etc...
This approach works, until we have very large tables and/or join data from many tables (functions); then our View performance can seriously degrade.
Our views are simple query statements, usually with calculated columns added to the select list. A simple example —
CREATE VIEW vw_ActionSub2Action_01
AS
SELECT
acsub.*, ac.*,
IsNull(ac.AC_Date1, ac.AC_Date2) AS [AC_ComboDate],
ac.AC_Ref +'.'+ acsub.ACSUB_Ref AS [ComboRef], etc...
FROM
fn_TAF_ACTIONSUB(null) acsub
LEFT JOIN
fn_TAF_ACTION(null) ac ON acsub.ACSUB_fkUID = ac.AC_UID
I'm wondering if these Views could be re-written using CTEs, to somehow 'pre-fetch' the modified/pre-fixed schemas using the functions, and then query the tables with SELECT INTO
to populate data into the View.
Is it possible?
Something like —
-- first, setup View schema using TAF functions to return empty data set
SELECT acsub.*, ac.*, '' AS [AC_ComboDate], '' AS [ComboRef], etc...
FROM fn_TAF_ACTIONSUB(0) acsub
LEFT JOIN fn_TAF_ACTION(0) ac ON acsub.ACSUB_fkUID = ac.AC_UID
-- then, populate the View by querying the tables
SELECT acsub.*, ac.*, IsNull(ac.Date1, ac.Date2) AS [AC_ComboDate], ac.Ref +'.'+ acsub.Ref AS [ComboRef], etc...
FROM ACTIONSUB acsub
LEFT JOIN ACTION ac ON acsub.fkUID = ac.UID
Is there a way to combine the above process into a single view definition?
PREFACE
I realize our above use of Views may be inherently non-performant; however, our system uses these Views, and performance is reasonable in most cases, so pushing a 're-design' of our processing logic is not feasible.
UPDATE 1
Our Table-Access-Functions (TAFs) are Table-valued Functions that return a table with a modified schema. They do not use BEGIN…END blocks. Here is an example…
CREATE FUNCTION [dbo].[fn_TAF_ACTION] (@UID varchar(max))
RETURNS TABLE
AS
RETURN (SELECT
AC_RowGUID = RowGUID,
AC_UID = UID,
AC_Ref = Ref,
AC_Name = Name,
...etc...
FROM Action
WHERE UID IN (SELECT IntValue FROM dbo.CsvToInt(@UID)) OR @UID IS NULL)
-- CsvToInt function simply allows us to pass a string of comma-delimited UIDs
Best Answer
No, this isn't possible with a view. You could do something broadly along these lines with a multi-statement function (MSUDF), but:
A multi-statement function would potentially materialize the entire 'view' result set in a table variable, before any predicates in the base query were applied. If the result is large, this overhead will likely be prohibitive.
You could push predicates into the MSUDF using parameters, but then the function body becomes a mess of conditional predicates of the form
column = @value OR @value IS NULL
, which would requireOPTION (RECOMPILE)
in the MSUDF to optimize well.Note that even with the best possible outcome, the users of the view will have to change their queries from view references to MSUDF references (with all parameters specified,
DEFAULT
orNULL
passed for those that are not needed). This may not be workable.All that said, you need to be certain what is causing the performance problem in the first place. The question does not supply an example of a problematic query plan, so some of what follows is educated guesswork:
There are two features of the current scheme that leap out at me:
The use of
ISNULL
and calculated columns prevents the optimizer from pushing base-query predicates down into the underlying in-line functions and base tables. You might get better results for theISNULL
part by rewriting theLEFT JOIN
as aUNION ALL
of an inner join and an anti-semi-join of the two tables. Properly written, this would expose only base column names, allowing the optimizer to push predicates successfully.The in-line functions use an MSUDF to split the supplied CSV into row values. This is a pattern that often causes optimization problems, because the size and distribution of the result is unknown. A query plan that is optimal for a CSV of '1,2,3' is likely not optimal for a
NULL
CSV. AgainOPTION (RECOMPILE)
could help with this. It will simplify the compiled logic for theNULL
case, avoid parameter-sniffing issues, as well as providing cardinality information for the MSUDF result. Distribution statistics will still not be available though.