SQL Server – Are SQL Unit Tests Supposed to Be Long

best-practices, sql-server, t-sql, tsql, unit-test

I am writing stored procedures with some non-trivial business logic. I am trying to unit test them, but the actual tests end up being quite long (the shorter ones start at 40-50 lines of code and commonly use 4 different tables), which doesn't seem very "unit". (Admittedly, I format my code in a way that takes a lot of space.)
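For illustration, a tSQLt test for a procedure that touches four tables tends to take roughly this shape — a sketch only, with all table, column, and procedure names hypothetical:

```sql
-- Hypothetical schema: dbo.CalculateOrderTotals reads Orders, OrderLines,
-- Customers, and Discounts, then writes a Total back onto dbo.Orders.
EXEC tSQLt.NewTestClass 'OrderTests';
GO
CREATE PROCEDURE OrderTests.[test CalculateOrderTotals applies Gold discount]
AS
BEGIN
    -- Arrange: isolate every table the procedure reads.
    -- This setup is where most of the length comes from.
    EXEC tSQLt.FakeTable 'dbo.Orders';
    EXEC tSQLt.FakeTable 'dbo.OrderLines';
    EXEC tSQLt.FakeTable 'dbo.Customers';
    EXEC tSQLt.FakeTable 'dbo.Discounts';

    INSERT INTO dbo.Customers (CustomerId, Tier) VALUES (1, 'Gold');
    INSERT INTO dbo.Orders (OrderId, CustomerId) VALUES (10, 1);
    INSERT INTO dbo.OrderLines (OrderId, Qty, UnitPrice) VALUES (10, 2, 50.00);
    INSERT INTO dbo.Discounts (Tier, Pct) VALUES ('Gold', 0.10);

    -- Act
    EXEC dbo.CalculateOrderTotals @OrderId = 10;

    -- Assert: 2 * 50.00 with a 10% discount = 90.00
    SELECT OrderId, Total INTO #Expected FROM dbo.Orders WHERE 1 = 0;
    INSERT INTO #Expected (OrderId, Total) VALUES (10, 90.00);

    SELECT OrderId, Total INTO #Actual FROM dbo.Orders;
    EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
END;
```

Even with nothing exotic going on, the arrange step alone is a dozen-plus lines once each dependency table is faked and seeded.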

In context of "normal" programming languages I've heard the advice to refactor the complex procedure into smaller chunks. But I don't want to do that here because:

  1. I don't want to pollute the "global namespace" with small routines called from one place only.
  2. Passing around tables from and to stored procedures is cumbersome.
  3. Custom functions can have negative effects on performance.

Am I wrong about this reasoning?
I am new to unit testing, so perhaps I am just writing my tests wrong?
Is SQL a long-winded language, and thus its unit tests are longer as well?


(I am using SQL Server with tSQLt framework, but I believe the question is system-agnostic.)

Best Answer

Am I wrong about this reasoning? I am new to unit testing, so perhaps I am just writing my tests wrong? Is SQL a long-winded language, and thus its unit tests are longer as well?

I think this is a great (though somewhat opinion-based) question. For the past few years I've been implementing tSQLt unit testing in a personal OSS project (https://dba-multitool.org/) to answer it through real-world application. Unit testing for any type of SQL definitely breaks from classic norms and patterns. My tests definitely don't feel like unit tests and are usually more complex and longer than I'd prefer, but they have saved me many times from introducing breaking changes.

The stored procedures in my example are quite complex and long, so I reached a similar impasse as you: should I suck it up and deal with very complex "unit" tests, or fundamentally modify how the stored procedure works so that it is more amenable to unit testing in general? As with many database questions, the ultimate answer, I believe, is it depends. Performance and maintainability take a front seat here, in my experience.

I found that I wasn't able to justify substantially changing the complex stored procedures to benefit from unit testing. Adding more objects, as you noted, can increase complexity, and in this case I was wary of asking end users to house a bunch of single-use objects in their database instead of offering an all-in-one utility stored procedure.

One technique that I did come up with was using optional parameters to drive test cases more precisely. For example, I have parameters @IsExpress, @SqlMajorVersion, and @SqlMinorVersion that end users should not use (they are fetched at runtime if not passed) but that my tests utilize to make sure I'm catching the errors I want around unsupported versions. Adding explicit error checks increases the overall reliability of the code without necessarily having to test the more complex logic areas explicitly, which may be hard or unreasonable to do. This marginally affects the stored procedure's length and complexity, but lets me easily isolate functionality and reliably test for it. You could take this further and use parameters to trigger select portions of the stored procedure that need testing the most, or to achieve better code coverage percentages by avoiding a messy area that isn't worth covering or is out of scope.
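A sketch of that parameter seam might look like the following — the procedure and test names here are illustrative, not the actual dba-multitool code:

```sql
-- Hypothetical procedure: the version parameters default to NULL and are
-- resolved at runtime, but a test can pass them explicitly to force a branch.
CREATE OR ALTER PROCEDURE dbo.sp_DoMaintenance
      @IsExpress       BIT     = NULL  -- test seam: fetched at runtime when NULL
    , @SqlMajorVersion TINYINT = NULL  -- test seam: fetched at runtime when NULL
AS
BEGIN
    IF @IsExpress IS NULL
        SELECT @IsExpress =
            CASE WHEN SERVERPROPERTY('EngineEdition') = 4 THEN 1 ELSE 0 END;
    IF @SqlMajorVersion IS NULL
        SELECT @SqlMajorVersion =
            CAST(SERVERPROPERTY('ProductMajorVersion') AS TINYINT);

    IF @SqlMajorVersion < 11
    BEGIN
        RAISERROR('SQL Server versions below 2012 are not supported.', 16, 1);
        RETURN;
    END;

    -- ... actual maintenance work ...
END;
GO
EXEC tSQLt.NewTestClass 'MaintenanceTests';
GO
-- The test forces the unsupported-version branch without needing an old server.
CREATE PROCEDURE MaintenanceTests.[test errors on unsupported version]
AS
BEGIN
    EXEC tSQLt.ExpectException
        @ExpectedMessage = 'SQL Server versions below 2012 are not supported.';
    EXEC dbo.sp_DoMaintenance @SqlMajorVersion = 10;
END;
```

The design trade-off is a couple of extra parameters on the public signature in exchange for deterministic, environment-independent tests of the guard clauses.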

With this approach I was able to get a fairly high code coverage percentage (98%+) and while I definitely don't test all functionality, I know that almost all of the code gets executed, errors when I need it to, and will parse on execution.

To close with a quote, Wikipedia has unit testing defined as (emphasis mine):

In computer programming, unit testing is a software testing method by which individual units of source code—sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures—are tested to determine whether they are fit for use.

I won't say that Wikipedia (or the book being quoted by Wikipedia) is the end-all for technical definitions, but I do like this definition since it focuses on the end goal of unit testing, which I think tSQLt can reasonably achieve.

Besides, some unit tests are better than none, and knowing which lines of code aren't covered by unit tests may be very valuable in the future. To expand on the latter:

In 3 years, you've left the company and Employee #948 is trying to address a bug in the stored procedure. They look and see that only lines 50-65 out of 300 aren't covered by a unit test, otherwise all other functionality has been tested. They now have a pretty good shot at spending time on a likely problematic area rather than reverse engineering the entire 300 lines. Or, the next intern/new hire can be asked to fill in missing unit test coverage gaps as a learning exercise. Knowing what you aren't testing yet can help prioritize future work and diagnose problems faster.