You are too modest - your SQL is well and concisely written given the task you are undertaking. A few pointers:
- t1.name <> t2.name is always true if t1.name = REPLACE(t2.name, 'DUP_', ''), so you can drop the former
- usually you want union all. union means union all then drop duplicates. It might make no difference in this case, but always using union all is a good habit unless you explicitly want to drop any duplicates.
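To illustrate the difference, a throwaway example against dual:

```sql
-- union de-duplicates the combined result set
select 1 as x from dual
union
select 1 as x from dual;     -- one row

-- union all keeps every row (and skips the de-duplication work)
select 1 as x from dual
union all
select 1 as x from dual;     -- two rows
```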
If you are willing for the numeric comparisons to happen after casting to varchar, the following might be worth considering:
create view test_attribs_cast as
select id, name, attr1, attr2, cast(attr3 as varchar(2000)) as attr3,
cast(attr4 as varchar(2000)) as attr4, attr5
from test_attribs;
create view test_attribs_unpivot as
select id, name, 1 as attr#, attr1 as attr from test_attribs_cast union all
select id, name, 2, attr2 from test_attribs_cast union all
select id, name, 3, attr3 from test_attribs_cast union all
select id, name, 4, attr4 from test_attribs_cast union all
select id, name, 5, attr5 from test_attribs_cast;
select 'attr'||t1.attr# as different, t1.id as id_1, t2.id as id_2, t1.name,
t2.name as name_dup, t1.attr as val1, t2.attr as val2
from test_attribs_unpivot t1 join test_attribs_unpivot t2 on(
t1.id<>t2.id and
t1.name = replace(t2.name, 'DUP_', '') and
t1.attr#=t2.attr# )
where t1.attr<>t2.attr or (t1.attr is null and t2.attr is not null)
or (t1.attr is not null and t2.attr is null);
- The second view is a kind of unpivot operation - if you are on at least 11g you can do this more concisely with the unpivot clause.
- I'd say don't go down the procedural route if you can do it in SQL, but...
- Dynamic SQL is probably worth considering despite the problems you mention with testing and maintenance.
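On 11g the second view could be written with the unpivot clause, something like this sketch (include nulls keeps rows where an attribute is null, matching the null handling in the final where clause):

```sql
create view test_attribs_unpivot as
select id, name, attr#, attr
from test_attribs_cast
unpivot include nulls
  ( attr for attr# in (attr1 as 1, attr2 as 2, attr3 as 3,
                       attr4 as 4, attr5 as 5) );
```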
--EDIT--
To answer the more general side of the question, there are techniques to reduce repetition in SQL - views, CTEs, unpivoting and (with care) dynamic SQL among them.
But you can't bring OO ideas into the SQL world directly - in many cases repetition is fine if the query is readable and well-written, and it would be unwise to resort to dynamic SQL (for example) just to avoid repetition.
The final query including Leigh's suggested change and a CTE instead of a view could look something like this:
with t as ( select id, name, attr#,
decode(attr#,1,attr1,2,attr2,3,attr3,4,attr4,attr5) attr
from test_attribs
cross join (select rownum attr# from dual connect by rownum<=5))
select 'attr'||t1.attr# as different, t1.id as id_1, t2.id as id_2, t1.name,
t2.name as name_dup, t1.attr as val1, t2.attr as val2
from t t1 join t t2
on( t1.id<>t2.id and
t1.name = replace(t2.name, 'DUP_', '') and
t1.attr#=t2.attr# )
where t1.attr<>t2.attr or (t1.attr is null and t2.attr is not null)
or (t1.attr is not null and t2.attr is null);
It is possible to store strings with different collations in the same column using SQL_VARIANT:
CREATE TABLE dbo.Localized
(
    text_id INTEGER NOT NULL,
    lang_id INTEGER NOT NULL,
    text_body SQL_VARIANT NOT NULL,

    CONSTRAINT [PK dbo.Localized text_id, lang_id]
        PRIMARY KEY CLUSTERED (text_id, lang_id)
)
GO
INSERT dbo.Localized
(text_id, lang_id, text_body)
VALUES
(1001, 2057, N'Database problems' COLLATE Latin1_General_CI_AS);
GO
INSERT dbo.Localized
(text_id, lang_id, text_body)
VALUES
(1001, 1025, N'قاعدة بيانات المشاكل' COLLATE Arabic_CI_AS)
This design has several drawbacks (including being limited to 8000 bytes), not least in the search area: SQL_VARIANT cannot be full-text indexed, and some string comparison features (e.g. LIKE) cannot be used directly either. On the other hand, it is possible to create a regular index on SQL_VARIANT and perform the more basic comparisons (e.g. <, =, >) in a collation-aware fashion:
CREATE UNIQUE INDEX uq1 ON dbo.Localized (text_body)
GO
-- One row
SELECT
l.*
FROM dbo.Localized AS l
WHERE
l.text_body = CONVERT(SQL_VARIANT, N'Database problems' COLLATE Latin1_General_CI_AS)
-- No rows (and no collation error!)
SELECT
l.*
FROM dbo.Localized AS l
WHERE
l.text_body = CONVERT(SQL_VARIANT, N'Database problems' COLLATE Arabic_CI_AS)
-- One row, index seek, manual version of "LIKE 'D%'"
SELECT
l.*
FROM dbo.Localized AS l
WHERE
l.text_body >= CONVERT(SQL_VARIANT, N'D' COLLATE Latin1_General_CI_AS)
AND l.text_body < CONVERT(SQL_VARIANT, N'E' COLLATE Latin1_General_CI_AS)
We can also write the usual sort of procedures:
CREATE PROCEDURE dbo.GetLocalizedString
@text_id INTEGER,
@lang_id INTEGER,
@text_body SQL_VARIANT OUTPUT
AS
BEGIN
SELECT
@text_body = l.text_body
FROM dbo.Localized AS l
WHERE
l.text_id = @text_id
AND l.lang_id = @lang_id
END
GO
DECLARE @text SQL_VARIANT
EXECUTE dbo.GetLocalizedString
@text_id = 1001,
@lang_id = 1025,
@text_body = @text OUTPUT
SELECT @text
Of course, full-text indexing is also problematic in the "single table for all translations" design, since full-text indexing (all but) requires a language id setting per column. The multiple table design described by Joop Eggen could be full-text indexed (though it would naturally require one index per table).
The other main option is to have one column per locale in the base table:
CREATE TABLE dbo.Example
(
text_id INTEGER NOT NULL,
text_2057 NVARCHAR(MAX) COLLATE Latin1_General_CI_AS NULL,
text_1025 NVARCHAR(MAX) COLLATE Arabic_CI_AS NULL,
CONSTRAINT [PK dbo.Example text_id]
PRIMARY KEY CLUSTERED (text_id)
)
This arrangement does have a certain simplicity to it, and works well with full-text indexing, though it does require a new column to be added with each new language, and many developers find this sort of structure inelegant and unsatisfactory to work with.
Each of the alternatives has advantages and disadvantages, and will require indirection at some level or another, so it might depend on where the developers concerned feel happiest locating that indirection. I imagine most people will prefer the multiple-table design for most purposes.
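For reference, a minimal sketch of that multiple-table design (table names here are illustrative only): one narrow table per language, each column carrying the collation appropriate to its locale, so each table can be full-text indexed independently.

```sql
-- English (locale 2057) strings
CREATE TABLE dbo.LocalizedText_2057
(
    text_id   INTEGER NOT NULL PRIMARY KEY,
    text_body NVARCHAR(MAX) COLLATE Latin1_General_CI_AS NOT NULL
);

-- Arabic (locale 1025) strings
CREATE TABLE dbo.LocalizedText_1025
(
    text_id   INTEGER NOT NULL PRIMARY KEY,
    text_body NVARCHAR(MAX) COLLATE Arabic_CI_AS NOT NULL
);
```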
Best Answer
Your code looks mature, mostly, but see below! Once you have fixed that, I don't see much that could go wrong with it. I share the doubts that it will improve performance much, though, and there may be better alternatives, depending on exact requirements.
General advice
For simple functions as displayed:
- Use a simple SQL function (LANGUAGE sql), which can be inlined in the context of an outer query. SQL is preferable for simple functions that are typically nested.
- Remove the STRICT modifier. Contrary to what you might expect, this is probably not going to help with simple SQL functions, because it can prevent said inlining. (STRICT may help with expensive functions, though.)

Multiple users
To accommodate multiple users with different but more or less stable preferences, you could improve your design by building on the typical default search_path setting: search_path = "$user",public.
Create a separate version of the function with the same name in each user's private schema. Unless it is schema-qualified, each user will see his own version of fnc_common_format_money(), and they can all use the same queries.

Switch schema
Or, if we are dealing with changing requirements for a single user, you could switch the search_path manually (and an arsenal of functions with it), verbose_format being the name of a schema with specialized functions. It could also come before "$user".
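A sketch of both ideas (schema names and the function body are illustrative stand-ins for the real formatting logic):

```sql
-- Per-user version: each user gets the function in a private schema.
CREATE SCHEMA bob;
CREATE FUNCTION bob.fnc_common_format_money(numeric)
  RETURNS text
  LANGUAGE sql AS
$$SELECT to_char($1, 'FM999,999,990.00')$$;

-- Schema switching: list a schema of specialized functions first,
-- so its versions win the search_path lookup.
SET search_path = verbose_format, "$user", public;
```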
You may not need a regime for dynamic settings any more. If you still do, extend your table test_config with a column username (or schemaname) to store a separate row for each user (schema), and adapt your code accordingly. Make it a bit simpler and safer while you are at it.

Code suggestions
For the "multiple users" scenario.
Major points
- Build one function per user setting. (Optional, but the rest of the advice applies in any case.)
- Build conditions into the trigger directly and call the trigger function only after relevant columns have been changed or on INSERT. (This way you do not have a reference to OLD in the trigger function, which would not work with the additional call ON INSERT.) The simple trigger only creates required functions and does not take care of orphaned versions.
- Parameter names inside the body of an SQL function require pg 9.2+.
- Use format() to inject (sanitized!) parameters into the CREATE FUNCTION statement, closing a possible security issue with SQL injection.
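A sketch of the format() approach inside a DO block (the schema name and format picture are illustrative; %I quotes identifiers and %L quotes literals safely):

```sql
DO
$do$
BEGIN
   EXECUTE format(
      $f$
      CREATE OR REPLACE FUNCTION %I.fnc_common_format_money(numeric)
        RETURNS text
        LANGUAGE sql AS
      $body$SELECT to_char($1, %L)$body$
      $f$
    , 'my_schema'            -- identifier, safely quoted by %I
    , 'FM999,999,990.00');   -- literal, safely quoted by %L
END
$do$;
```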