Sql-server – TSQL – create a function that takes a multi-row query result as an argument

functionssql serversql server 2014

I work with a variety of servers on mostly SQL Server 2014 and earlier.

I have boiler-plate code that can take the results of an arbitrary select statement (returning a list of varchar or the like) and transform them into a single comma separated string. I use this regularly on reports (for e.g. listing the agents associated with a single call).

I'm getting tired of retyping it.

How can I create a function (or stored procedure, or other thing I don't know about) to re-use this code neatly?

Ideal useage would be something like:

select @result = udf_makeCSF( (select [First Name] + [Last Name] from AgentTable agents where ... ) )

I've seen some suggestions about using exec for similar problems and that gives me the stone-cold horrors.

EDIT: To make it clear, I already know how to make a comma-separated list. That's the boiler-plate code which I'm getting tired of copy-pasting everywhere. So please don't get hung up on explaining that, that's a solved problem.

But to extend my example, to make it more concrete for folks:

If I have CallSegments table, like so:

pkey    callid    start time     duration    time to answer    handling agent   caller     ...
1       1         8:00:20        260         12                12               5551234    ...
2       1         8:04:35        380         3                 16               5551234    ...
3       2         9:08:16        512         36                12               5554321    ...
4       3         9:58:42        8           1                 12               5557890    ...

This table represents incoming activity at a call center. Each row is a single segment of a call – A single call may have multiple segments, such as when it is transferred to another agent or another queue, a conference call is started, etc.

This is simplifying the data I'm working with, it better represents the state of affairs halfway through my procedure.

I also have an Agents table, like so

pkey    first name     last name    hire date    ...
12      Alice          Smith        1900-01-01   ...
14      Bob            Jones        2000-12-31   ...

This represents individual call-center agents. The only information I need out of this (for this report) is names of handling agent(s)

I am tasked with producing a report on calls where I assemble the various segments into a single line per call summarizing the lifecycle of the call.

Example desired output would be something like this:

start time     duration    time to answer    handling agents           caller     ...
8:00:20        00:10:40    00:00:15          Alice Smith, Bob Jones    555-1234   ...
9:08:16        00:08:32    00:00:36          Alice Smith               555-4321   ...
9:58:42        00:00:08    00:00:01          Charley Carpenter         555-7890   ...

Collecting the statistics is a group by and a couple of SUM()s. Collecting the handling agents is not.

This is not my exact scenario every time, but it's variations on this depending on report requirements (agent names is common, but also what queues did this call go through, and similar). The similarity is having a column of data that needs to get concatenated with a separator (usually commas) and returned as a single string.

Which feels to me like making this a shared snipped of code to be re-used rather than retyped repeatedly is good. I would write a custom aggregation method in C# and be done with it, but most of the client sites I work at do not allow for this.

Best Answer

Really what you are looking to do is inline concatenation / aggregation. The two easiest methods are:

SQLCLR (works with SQL Server 2005 and newer, but not Azure SQL Database unless it is the new Managed Instance type)
The STRING_AGG aggregate function (works with SQL Server 2017 and newer, and works on Azure SQL Database)

Currently, neither of these options works for you:

Option #1 is prohibited by policy, and (or due to) the fact that the client(s) will soon be migrating to Azure SQL Database which does not support SQLCLR (anymore; it did for about 18 months).
Option #2 is not possible due to being on SQL Server 2014 which does not have that function

This might be a long-shot, but given that you are only prohibited from using SQLCLR by policy, and that they will be migrating to Azure SQL Datbase, you might could do the following:

Short-term (prior to migrating to Azure SQL Database): use a SQLCLR aggregate function
Long-term (upon migrating to Azure SQL Database): swap out the SQLCLR aggregate for STRING_AGG

While this does entail modifying queries upon migration, the modification is a simple replacement. Assuming that all stored procedures, functions, etc are stored in a repository (e.g. Git, SVN, etc), it should be fairly easy to do a mass replacement across multiple files. Then deploy, test, and commit to the repository.

For example (using the SQL# library that I wrote which has Agg_Join in the Free version):

SELECT so.[schema_id],
       COUNT(*) AS [NumObjects],
       SQL#.Agg_Join(DISTINCT RTRIM(so.[type])) AS [Types]
FROM   sys.objects so
GROUP BY so.[schema_id];


SELECT so.[schema_id],
       COUNT(*) AS [NumObjects],
       STRING_AGG(RTRIM(so.[type]), ',')
              WITHIN GROUP (ORDER BY RTRIM(so.[type])) AS [Types]
FROM   sys.objects so
GROUP BY so.[schema_id];

These don't return identical results sets unless you remove the DISTINCT from SQL#.Agg_Join(DISTINCT RTRIM(so.[type])). I kept that in there to show a case where it helps, and to point out that for some reason, DISTINCT is not supported in the STRING_AGG function :(.

If you can do a Regular Expression (RegEx) replace, you can swap out:

SQL#.Agg_Join({stuff})

for:

STRING_AGG({stuff}, ',') [WITHIN GROUP (ORDER BY {stuff})]

with minimal effort. The WITHIN GROUP part is optional (I added it there since the DISTINCT in the SQLCLR version did the ordering). The main point being that the queries are otherwise untouched and should produce the same results, hence not needing a massive retesting effort.

In fact, if you create your own SQLCLR UDA such that it has the second parameter for the delimiter (the SQL# Agg_Join doesn't such that it can work in SQL Server 2005), you could name it "String_Agg". This would reduce the find/replace to be replacing dbo.String_Agg( with String_Agg(.

Related Solutions

Sql-server – Support both stored procedures and functions with same TSQL code base

Sounds like you've already made the decision to use T-SQL for your back-end. For OWASP reasons, you are correct in that calling only stored procedures from your data access layer is the safest approach to avoid SQL injection.

Stored procedures and functions are similar in that you can pass arguments for the parameter list. They also can do similar things. Code re-use is a good thing and will save you in the end when fixing bugs or making enhancements in the future. So find the logic that is similar and re-factor that code to functions. If you want to return tables (single result set), use table based functions. If you want to return single values, use scalar functions. Note, you won't be able to perform permanent environment changes with functions (no CRUD operations are allowed... CREATE/INSERT, READ, UPDATE, DELETE). It's also good practice to wrap your stored procedures with error/exception handling and return those exceptions to the server, to let your logging handle them appropriately. Also note that it's common to return multiple result sets in a single call to a stored procedure (3 SELECT statements for example). In your application code, you'd handle each table individually in the response.

Here are some differences:

enter image description here

PostgreSQL – Function to Execute Query, Manipulate and Return Result

You can use a cursor.

create type my_type as (id int, other_column text);

create or replace function get_data ()
returns setof my_type
language plpgsql as $$
declare
    cur cursor for select id, other_column from my_table limit 20000;
    rec record;
begin
    for rec in cur loop
        update other_table
            set status = 2 
            where id = rec.id;
        return next rec;
    end loop;
end $$;

select * from get_data();

Note however that the function will perform 20000 (maybe less) updates, not just one. It is not obvious if performance will be better or worse than in one query, as it depends on various circumstances. For example, one query should be much faster on an idle server whereas the function might turn out to be better on a heavily loaded server. The best way to check it is to test the function in the actual environment.

You can also use a temporary table. In this case rows are updated in one query.

create or replace function get_data_2 ()
returns setof my_type
language plpgsql as $$
begin
    create temporary table temp_table of my_type on commit drop;
    insert into temp_table
        select id, other_column from my_table limit 20000;
    update other_table 
        set status = 2
        where id in (select id from temp_table);
    return query select * from temp_table;
end $$;

Best Answer

Related Solutions

Sql-server – Support both stored procedures and functions with same TSQL code base

PostgreSQL – Function to Execute Query, Manipulate and Return Result

Related Question