Philosophy on stored procedures that perform updates: is a ‘granular’ update approach sufficient

stored-proceduresupdate

I am embarking on on a large project and have an inquiry about stored procedures in general.

Let's say I have a User table.

UserKey IDENTITY NOT NULL PRIMARY KEY
FirstName nvarchar(50)NOT NULL
LastName nvarchar(50) NOT NULL
Email nvarchar(50) NOT NULL
Birthdate date NULL
EffectiveFrom datetimeoffset NOT NULL
EffectiveThru datetimeoffset NULL

In reality, there are some more fields here, but for the sake of brevity I have skimmed it down to the basics.

Now, I am faced with creating stored procedures to allow basic CRUD functionality for this table.

The Insert sproc is simple enough.

usp_INSERT_User:

CREATE PROCEDURE [dbo].[usp_INSERT_User]
(
    --@UserKey BIGINT,
    @FirstName NVARCHAR(50),
    @LastName NVARCHAR(50),
    @Email NVARCHAR(50),
    @BirthDate DATE = NULL,
    @EffectiveFromDtTm DATETIMEOFFSET(7),
    @EffectiveThruDtTm DATETIMEOFFSET(7) = NULL,
)

My main inquiry is how I should write an update sproc.

Since there can be pretty granular updates, should I create an update stored procedure for each of those granular updates?

For example, let's say I only want to update the user's email after processing a verified email update for the user, would I write a sproc with one parameter?

usp_UPDATE_UserEmail @NewEmail

If the user updated their profile, I could write a sproc like:

usp_UPDATE_User @UserKey, @FirstName, @LastName, @Email, @Birthdate

So, to cover my most common update operations, I might end up with a few sprocs to cover use cases of updates to the User table. Obviously, the larger the table and/or the higher number of business requirements would have a direct correlation to the number of update sprocs written for a particular table. This gets me thinking about maintainability of a ton of sprocs.

Yesterday, I went down the rabbit hole of

"oh, why don't I just write ONE end-all-be-all update sproc that can take whatever parameter I want to update via an XML param"

Luckily, without using dynamic sql, I was able to create that monstrosity. And it works.

I then decided to test speeds between my fancy xml sproc and just a regular update sproc (with multiple params) with very similar updates.

So, I created a mock user table with 12 million records and tested both approaches. Unsurprisingly, the XML approach on average, took 7x longer than the vanilla update sproc.

There is also additional overhead on the other side of the wire serializing parameters into XML, which could be of concern with a large update batch.

Now, I say all of that to wrap up and ask which type of approach in your experience is most often (if at all) seen? Is is most common practice to have granular sprocs like usp_UPDATE_Field1 / usp_UPDATE_Field2 / usp_UPDATE_Field3&4 OR usp_UPDATE_whateveryouwant

I like the simplicity of having a golden sproc that can take however many number of parameters you want to pass it and update them accordingly, but there are also performance drawbacks.

To give more context: running SQL Server 2019. I am also pretty new at this stuff.

Thanks for any help!

Best Answer

Stored procs are the programming API of the database, the methods you invoke to mutate the system state. As such, I prefer to have SPs names reflect the business outcome desired rather than the implementation - change_of_address rather than update_customer. Each SP has the parameters to perform that business change and no others. Each does one thing and does it well.

If the business supports two processes that both update a common column then so be it. That column is an input to each of those SPs. Say there's a web based "change profile" and a mobile-only "register new phone". They both write to customer.phone_number. I would have separate SPs for each business purpose.

There is no measurable overhead to having many small, tightly focused stored procedures. Changes to data types are automatic on the next recompile. Changes to column names are simple search-and-replace. SSMS's built-in dependency analysis can help, too. If you use local variables that mirror columns use the column's name e.g. customer_name is read into @old_customer_name.

OPTION #1 : INSERT INTO ... ON DUPLICATE KEY UPDATE

INSERT INTO newtable
SELECT * FROM oldtable
ORDER BY tmstamp_value
ON DUPLICATE KEY UPDATE email=VALUES(email), address=VALUES(address) ... ;

OPTION #2 : REPLACE INTO

REPLACE INTO newtable
SELECT * FROM oldtable
ORDER BY tmstamp_value;

It mechanically performs a DELETE and an INSERT.

CAVEAT

Notice I applied the ORDER BY against the old table. This should help get the latest data for a give LastName, FirstName.

You should also add a UNIQUE INDEX on the new table as follows:

ALTER TABLE newtable ADD UNIQUE INDEX name_ndx (lastname,firstname);

UPDATE 2013-04-08 17:13 EDT

If you are doing updates, there are two ways to do this:

1. UPDATE JOIN

UPDATE
(SELECT lastname,firstname,phone,email
FROM oldtable ORDER BY lastname,firstname) A
LEFT JOIN newtable B USING (lastname,firstname)
SET B.phone=A.phone,B.email=A.email;

2. UPDATE JOIN with GROUP BY

UPDATE
(
    SELECT BB.id,BB.phone,BB.email FROM 
    (SELECT lastname,firstname,MAX(id) id FROM oldtable GROUP BY lastname,firstname) AA
    INNER JOIN oldtable BB USING (lastname,firstname)
) A INNER JOIN newtable B USING (id)
SET B.phone=A.phone,B.email=A.email;

Give it a Try !!!

Both options will take full advantage of the UNIQUE index on name.

Best Answer

Related Solutions

Sql-server – Need a stored procedure to check and update row when it does not already exists.

Mysql – Update one table from another table while sorting that table based on one column