The `clr enabled` server configuration option only controls whether user assemblies can be run by the SQL Server instance.

The `hierarchyid`, `geometry`, and `geography` types are system CLR types. These are contained in system assemblies, so they are available regardless of the `clr enabled` setting.
Similarly, other system features that rely on CLR integration, like `FORMAT` (SQL Server 2012 onward), are available even if `clr enabled` is off.
If you need to manipulate `hierarchyid` data in a custom CLR routine, you will need to enable user CLR assemblies via this setting.
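For reference, turning it on is a standard `sp_configure` change:

```sql
-- Allow user (non-system) CLR assemblies to run on this instance
EXEC sys.sp_configure N'clr enabled', 1;
RECONFIGURE;
```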
See also: Hierarchical Data (SQL Server)
Without seeing code, it is pretty hard to say conclusively what is happening. Most likely, though, the `IDENTITY` value is being cached, causing gaps in the value after SQL Server is restarted. See https://stackoverflow.com/questions/17587094/identity-column-value-suddenly-jumps-to-1001-in-sql-server for some good answers and info about that.
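If caching does turn out to be the cause and the gaps genuinely matter, one mitigation - assuming SQL Server 2017 or later; older versions rely on trace flag 272 instead - is to disable the identity cache for the database:

```sql
-- Stops SQL Server pre-allocating identity values, avoiding the
-- jumps seen after an unexpected restart (at a small logging cost)
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;
```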
A simple `INT` field can hold values up to 2,147,483,647. You can actually start the identity value at -2,147,483,648, giving a full 32 bits of values - roughly 4.3 billion distinct values. I doubt very much you're going to run out of values to use. Assuming your application is consuming 1,000 values for each actual row added, you'd need to be creating nearly 12,000 rows per day, every day, to run out of IDs in 6 months, assuming you started the `IDENTITY` value at 0 and were using an `INT`. If you were using a `BIGINT`, you would have to wait 21 million centuries before you ran out of values if you wrote 12,000 rows per day, consuming 1,000 "values" per row.
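To sanity-check that arithmetic (taking 6 months as 180 days and a century as 36,525 days):

```sql
SELECT
    2147483647 / 1000 / 180 AS IntRowsPerDayFor6Months,            -- ≈ 11,930
    9223372036854775807 / 1000 / 12000 / 36525 AS BigIntCenturies; -- ≈ 21 million
```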
Having said all that, if you wanted to use `BIGINT` as the identity field data type, there is certainly nothing wrong with that. That'll give you, for all intents and purposes, a limitless supply of values to use. The performance difference between an `INT` and a `BIGINT` is practically non-existent on modern 64-bit hardware, and either is highly preferable to, for instance, using `NEWID()` to generate GUIDs.
If you wanted to manage your own values for the ID column, you could create a key table, and provide a pretty bulletproof way of doing that using one of the methods shown in the answers on this question: Handling concurrent access to a key table without deadlocks in SQL Server
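To give a flavour of the general idea (just a sketch - the linked answers deal with the locking and deadlock subtleties properly, and the names here are invented):

```sql
CREATE TABLE dbo.KeyTable (NextId int NOT NULL);
INSERT dbo.KeyTable (NextId) VALUES (0);

-- A single atomic UPDATE both increments the counter and captures
-- the new value, so two callers can never claim the same ID.
DECLARE @Id int;
UPDATE dbo.KeyTable SET @Id = NextId = NextId + 1;
```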
The other option, assuming you're using SQL Server 2012+, would be to use a `SEQUENCE` object to get ID values for the column. However, you'd need to configure the sequence to not cache values. For example:
```sql
CREATE SEQUENCE dbo.MySequence AS INT START WITH -2147483648 INCREMENT BY 1 NO CACHE;
```
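You would then pull values from the sequence - for example, via a default constraint on the column (table and column names here are just for illustration):

```sql
CREATE TABLE dbo.MyTable
(
    Id int NOT NULL
        CONSTRAINT DF_MyTable_Id DEFAULT (NEXT VALUE FOR dbo.MySequence)
        CONSTRAINT PK_MyTable PRIMARY KEY,
    Payload varchar(50) NOT NULL
);
```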
In answer to your boss's negative perception of "high" numbers, I would say: what difference does it make? Assuming you use an `INT` field with an `IDENTITY`, you could in fact start the `IDENTITY` at 2147483647 and "increment" the value by -1. This would make absolutely no difference to the memory consumption, performance, or disk space used, since a 32-bit number is 4 bytes no matter whether it is 0 or 2147483647. 0 in binary is 00000000000000000000000000000000 when stored in a 32-bit signed `INT` field, and 2147483647 is 01111111111111111111111111111111 - both numbers take precisely the same amount of space, both in memory and on disk, and both require precisely the same amount of CPU operations to process. It is far more important to get your application code designed correctly than to obsess about the actual number stored in a key field.
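For instance (table and column names invented for illustration):

```sql
-- Hands out 2147483647, 2147483646, 2147483645, ... on each insert
CREATE TABLE dbo.Widget
(
    Id   int IDENTITY(2147483647, -1) NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
);
```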
You asked about the pros and cons of either (a) using a larger-capacity ID column, such as a `BIGINT`, or (b) rolling your own solution to prevent ID gaps. To answer these concerns:
1. Using `BIGINT` instead of `INT` as the data type for the column in question. A `BIGINT` requires double the amount of storage, both on-disk and in-memory, for the column itself. If the column is the primary key index for the table involved, each and every non-clustered index attached to the table will also store the `BIGINT` value, at twice the size of an `INT`, again both in-memory and on-disk. SQL Server stores data on disk in 8KB pages, where the number of "rows" per "page" depends on the "width" of each row. So, for instance, if you have a table with 10 columns, each one an `INT`, you'd be able to store approximately 160 rows per page. If those columns were instead `BIGINT` columns, you'd only be able to store approximately 80 rows per page. For a table with a very large number of rows, this clearly means the I/O required to read and write the table will be doubled in this example for any given number of rows. Granted, this is a pretty extreme example - if you had a row consisting of a single `INT` or `BIGINT` column and a single `NCHAR(4000)` column, you'd be (simplistically) getting a single row per page, whether you used an `INT` or a `BIGINT`. In this scenario, it would not make much appreciable difference.
2. Rolling your own scheme to prevent gaps in the ID column. You'd need to write your code in such a way that determining the "next" ID value to use does not conflict with other actions happening to the table. Something naive along the lines of `SELECT TOP(1) [ID] FROM [schema].[table] ORDER BY [ID] DESC` comes to mind. What if there are multiple actors attempting to write new rows to the table simultaneously? Two actors could easily obtain the same value, resulting in a write-conflict. Getting around this problem requires serializing access to the table, reducing performance; a sketch of that follows this list. There have been many articles written about this problem; I'll leave it to the reader to perform a search on that topic.
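For item 2, the usual way to make the read-then-insert safe is to hold locks across both steps - which is exactly the serialization cost mentioned above (a sketch only; table and column names are invented):

```sql
BEGIN TRANSACTION;

DECLARE @NextId int;

-- UPDLOCK + HOLDLOCK make concurrent callers queue behind this read,
-- closing the window between reading the max and inserting.
SELECT @NextId = COALESCE(MAX([ID]), 0) + 1
FROM [dbo].[MyTable] WITH (UPDLOCK, HOLDLOCK);

INSERT INTO [dbo].[MyTable] ([ID], [Payload])
VALUES (@NextId, 'example');

COMMIT TRANSACTION;
```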
The conclusion here is: you need to understand your requirements and properly estimate both the number of rows, and the row width, along with concurrency requirements of your application. As usual, It Depends™.
Best Answer
I am pretty sure that the `sqlservr.exe.config` config file works in SQL Server 2014, but it will take me a few minutes to confirm. I am also pretty sure that it is not "officially" supported, and so not "recommended" to use, but on occasion I find it handy and have done so.

Whether the location of the data should be in a config file or a table depends largely on what the overall needs are. How often will the values change? You say "hard-coded", but lookup values can sometimes change with each release. Do you want to have to run an `UPDATE` statement, or find a way to include maintaining this config file in your build process (you need to find some way to ensure that it is always there)?

Also, will the values ever be needed outside of the .NET / SQLCLR code? Having them in a table makes them more available to all existing queries.
Also, will there ever be a need for separation between databases for these values? With a config file it is shared across all App Domains for that Instance. A config table can just as easily be placed in a common Database (assuming multiple databases might want the values) or placed into each Database. You can have separation of values within the config file, but that requires having different names for the keys, and that makes it harder to standardize within the .NET code.
If it makes sense enough to be read-only and common / shared, then why not start out trying the config file approach, and if you find any reason that it doesn't meet your needs, just move it over to being stored in a table. After all, neither option is going to require that many hours of work, and the time difference between them is negligible, so it won't cost much if you do need to switch to the other approach.
NOTES:
In the StackOverflow question linked in this question, the answer from @David states that this config file is no longer working as of SQL Server 2014. I have just tested and confirmed that it does indeed work on SQL Server 2014 and even on 2016!
In the StackOverflow question linked in this question, the answer from @Remus states that storing in the DB is preferred due to the config file being disconnected from SQL Server and possibly not on related servers, or possibly out-of-synch with them. This point should not be ignored, and those concerns should help the team look at the big picture of how the application lives and hence come to a more informed decision. However, there are certainly scenarios that do not have such concerns and which allow for the build process to ensure that this config file is in place in each environment, just as that build process does the same checks for config files on web servers, etc.
In the blog referenced in @Remus's answer in the linked question, the author mentions an "issue" with getting errors when attempting to read AppSettings, and a work-around is noted: call `ConfigurationManager.ConnectionStrings.Count;` first. That is correct. However, you don't need to make that call each time; it only needs to be done once for the life of the App Domain, so it works just fine to place it in a static class constructor, along these lines (a minimal sketch; the class and key names are placeholders):
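```csharp
using System.Configuration;

public static class AppConfig
{
    public static readonly string SomeSetting;

    static AppConfig()
    {
        // Work-around: touching ConnectionStrings once per App Domain
        // prevents the error when AppSettings is read afterwards.
        int ignored = ConfigurationManager.ConnectionStrings.Count;

        // "SomeSetting" is a placeholder key name
        SomeSetting = ConfigurationManager.AppSettings["SomeSetting"];
    }
}
```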
And no, you do not need to have a `<connectionStrings>` section in your config file for the above to work.

That same blog post also mentions that the config file is read only once, when the App Domain is created, and that you will need to restart the instance -- or possibly unload the App Domain -- in order for any changes made to the config file after the App Domain is created to become visible to your code. The first part is correct: the config file is loaded once and cached for the life of the App Domain. However, you do not need to restart the Instance in order to refresh the config values, nor even unload the App Domain. You just need to call:
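```csharp
// Re-reads the appSettings section from disk on the next access;
// this is the stock ConfigurationManager API, nothing SQLCLR-specific.
ConfigurationManager.RefreshSection("appSettings");
```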
But don't do that for every call, unless the function or stored procedure is meant to run occasionally. If this is something that will run at least once per second, or more often, then create a separate function or stored procedure to do just that refresh and execute it when necessary.