To be more precise, NOT FENCED routines run in the same process space as the database manager itself. The engine is written in C, so calling a not fenced routine is just like calling another C function from main(). This is where both the memory-corruption risk and the performance benefit come from: a not fenced routine has access to all the same resources -- memory, files, etc. -- as the database manager process itself.
For a FENCED routine the database manager launches a separate process (db2fmp), which in turn executes the routine code. As a result, operating system protection prevents a fenced routine from accessing any memory area or resource that belongs to the database manager.
SQL routines cannot be fenced, strictly speaking, because they don't "run" as native code, but they are even better than not fenced -- they are bytecode that the DB2 runtime engine itself executes, so there is no separate thread or process at all.
C and C++ routines can be fenced, in which case they execute in separate processes, or not fenced, in which case they are loaded into the database manager process space and called as functions.
Java routines can only run fenced, simply because they need a separate process -- the Java virtual machine -- to execute. If you declare them as NOT FENCED, the option is quietly ignored.
Having said all that, your only real choice between fenced and not fenced is with C/C++ routines. Typically you would run them fenced for safety, switching to not fenced only when you are reasonably sure they cannot harm the database manager and they need the extra performance1.
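To make the choice concrete, here is a sketch of the DDL involved (the function name, library name, and signature are invented for illustration, and clause defaults vary by DB2 version, so check the CREATE FUNCTION reference before relying on this):

-- A hypothetical C UDF, registered fenced (the safe default).
-- THREADSAFE is declared up front because not fenced routines must be threadsafe:
CREATE FUNCTION scrub_text(VARCHAR(255))
RETURNS VARCHAR(255)
LANGUAGE C
PARAMETER STYLE SQL
EXTERNAL NAME 'myudflib!scrub_text'
NO SQL
DETERMINISTIC
FENCED THREADSAFE;

-- Once you trust the code and need the speed,
-- it can be switched in place:
ALTER FUNCTION scrub_text(VARCHAR(255)) NOT FENCED;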
1 - The performance difference between fenced and not fenced routines comes from the overhead of allocating a fenced process, as well as from the interprocess communication between the database manager and the fenced routine. Even then, a fenced process is not created every time a fenced routine is called; a pool of them is created and reused for such calls. All this means that you might see a benefit from declaring a routine as not fenced only if that routine is called very frequently (as in, tens of times per second or more).
Right, I've found the answer to my own question, and thought I'd post it here in case anyone else ever has the same basic question:
Given that a cluster shards the data across numerous nodes, there is no real guarantee that an AUTO_INCREMENT
primary key will reflect the order in which records were added, especially when dealing with multiple-master replication.
Details can be found here
In essence, the answer to my question boils down to this:
Each node is given a range of available increment values for the next N inserts. However, you have no real say in which node will insert the next row, and thus you can't be absolutely sure that the next row to be inserted will receive MAX(pk_id_val) + 1
as its PK value.
A simple diagram to clarify:
+------------+ +------------+
| Node 1 | <==> | Node 2 |
+------------+ +------------+
| PK 1-3 | | PK 4-6 | (Range of available increment vals)
+------------+ +------------+
INSERT INTO cluster_tbl (field) VALUES ('test');
INSERT INTO cluster_tbl (field) VALUES ('test');
INSERT INTO cluster_tbl (field) VALUES ('test');
If all three insert queries are sent to Node 1, then all is fine and dandy, and the PK's will be 1, 2 and 3 respectively. However, if the first query is sent to node 1, the second to Node 2, and the third back to node 1 again, then the PK's will be 1, 4 and 2 respectively.
In which case a SELECT * FROM cluster_tbl ORDER BY pk_field DESC LIMIT 0,1;
query will serve up the second row that was inserted, not the third.
A possible solution: add an indexed timestamp column that records the insertion time, defined as follows:
`insert` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
And order the query by descending timestamp values:
SELECT *
FROM cluster_tbl
ORDER BY `insert` DESC
LIMIT 0,1;
That, for me at least, did the trick.
Best Answer
We've been doing this for almost five years, and we think that explicitly testing modifications is definitely doable, but quite slow. Besides, we cannot easily run such tests concurrently from several connections unless we use separate databases. Instead, we test modifications implicitly -- we use them to build up at least some of the test data, and we verify that our selects return the expected results.
I've written an article entitled Close Those Loopholes: Lessons learned from Unit Testing T-SQL, as well as some blog posts.
Regarding your question "Is there a threshold in complexity where it gets completely hopeless?": complex modules need tests much more than simple ones.
To simplify maintenance, we generate expected results, and we store them in separate files - that makes a huge difference.
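For a rough idea of the shape this takes (the table, procedure, and column names below are invented; the comparison relies on EXCEPT, available in SQL Server 2005 and later):

-- Expected results, generated once and stored in a separate file,
-- then loaded into a table for the comparison:
CREATE TABLE #Expected (CustomerId INT, Total MONEY);
INSERT INTO #Expected VALUES (1, 120.00);
INSERT INTO #Expected VALUES (2, 75.50);

-- The modifications under test build up the data implicitly...
EXEC dbo.AddOrder @CustomerId = 1, @Amount = 120.00;
EXEC dbo.AddOrder @CustomerId = 2, @Amount = 75.50;

-- ...and the select is what we actually verify:
-- both set differences must come back empty.
SELECT CustomerId, Total FROM dbo.CustomerTotals
EXCEPT
SELECT CustomerId, Total FROM #Expected;

SELECT CustomerId, Total FROM #Expected
EXCEPT
SELECT CustomerId, Total FROM dbo.CustomerTotals;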