SQL Server – Best way to design key-value pairs in a relational database


I have a table which has a list of all attributes that any object can have:

attr_id    attr_name    attr_data_type
------      -----        --------------

I have a table [object_key] in which each object_key_id has a unique list, i.e. a set, of (attr_id, attr_value) pairs.

object_key_id   attr_id   attr_value
---------       -------   ----------
1               1          'value1'
1               2          'value2'
2               1          'value3'
2               2          'value4'
3               1          'value1'
3               2          'value2'
3               3          'value3'

In this case, what is the best way to get the object_key_id for a given set of (attr_id, attr_value) pairs?

Ex: Input: [{1, 'value1'}, {2, 'value2'}] Output: 1

What's the best way to build a unique index, i.e. to ensure that each set of (attr_id, attr_value) pairs has a unique object_key_id?

Best Answer

What you want is called (exact) Relational Division.

See also this article: Divided We Stand: The SQL of Relational Division

There are several ways to write a query that solves this problem. You can load the input data into a (temporary) table or use a CTE like this:

WITH input_data (attr_id, attr_value) AS
  ( SELECT *
    FROM 
      ( VALUES                              -- the input data. 
          (1, 'value1'), 
          (2, 'value2')
      ) AS i (attr_id, attr_value)
  ),
t AS                                        -- all the object_key_id values.
  ( SELECT DISTINCT object_key_id           -- if you have a separate table with those
    FROM tableX                             -- you can use it, instead. 
  )
SELECT object_key_id
FROM t
WHERE NOT EXISTS                            -- this subquery ensures
      ( SELECT attr_id, attr_value          -- that all input attribute/value pairs
        FROM input_data                     -- appear for an object_key
      EXCEPT
        SELECT attr_id, attr_value
        FROM tableX AS tt
        WHERE tt.object_key_id = t.object_key_id 
      ) 
  AND NOT EXISTS                            -- and this subquery does the reverse
       ( SELECT attr_id, attr_value         -- i.e. the exact part of the division
        FROM tableX AS tt
        WHERE tt.object_key_id = t.object_key_id 
      EXCEPT
        SELECT attr_id, attr_value
        FROM input_data
      ) ;

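To try the query without a SQL Server instance, here is a minimal sketch that runs the same NOT EXISTS ... EXCEPT logic from Python against an in-memory SQLite database (standard library sqlite3 module). SQLite is only a stand-in for SQL Server here. tableX is loaded with the sample rows from the question, and, to show the other option mentioned above, the input pairs go into a temporary table input_data instead of a VALUES list. The helper name exact_division is hypothetical.

```python
import sqlite3

# Sample rows from the question: (object_key_id, attr_id, attr_value).
ROWS = [
    (1, 1, "value1"), (1, 2, "value2"),
    (2, 1, "value3"), (2, 2, "value4"),
    (3, 1, "value1"), (3, 2, "value2"), (3, 3, "value3"),
]

# The exact-division query from above, with input_data read from a temp table.
QUERY = """
WITH t AS (SELECT DISTINCT object_key_id FROM tableX)
SELECT object_key_id
FROM t
WHERE NOT EXISTS (        -- every input pair appears for this object_key_id
        SELECT attr_id, attr_value FROM input_data
        EXCEPT
        SELECT attr_id, attr_value FROM tableX AS tt
        WHERE tt.object_key_id = t.object_key_id)
  AND NOT EXISTS (        -- and the object has no pair that is not in the input
        SELECT attr_id, attr_value FROM tableX AS tt
        WHERE tt.object_key_id = t.object_key_id
        EXCEPT
        SELECT attr_id, attr_value FROM input_data)
ORDER BY object_key_id
"""


def exact_division(pairs):
    """Hypothetical helper: ids whose (attr_id, attr_value) set equals `pairs` exactly."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (object_key_id INT, attr_id INT, attr_value TEXT)")
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?)", ROWS)
    con.execute("CREATE TEMP TABLE input_data (attr_id INT, attr_value TEXT)")
    con.executemany("INSERT INTO input_data VALUES (?, ?)", pairs)
    ids = [row[0] for row in con.execute(QUERY)]
    con.close()
    return ids


print(exact_division([(1, "value1"), (2, "value2")]))  # [1]
```

Passing [(1, "value1"), (2, "value2"), (3, "value3")] returns [3] instead, since object 3 is the only one whose attribute set matches exactly; object 1 is rejected by the first NOT EXISTS because it lacks the third pair.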

The Stack Overflow question "How to filter SQL results in a has-many-through relation" has a few more ways to solve it, plus benchmarks for Postgres. However, it covers only "division with remainder", not the exact variation. The queries are similar, but the exact variation needs an extra check/condition.
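To make the difference between "division with remainder" and exact division concrete, the sketch below uses a different, commonly seen formulation (a join plus GROUP BY ... HAVING COUNT) rather than the EXCEPT query above. It again runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, the function name divide is hypothetical, and it assumes neither table contains duplicate rows, since duplicates would inflate the counts. The only change needed for the exact variation is the extra HAVING condition.

```python
import sqlite3

# Sample rows from the question: (object_key_id, attr_id, attr_value).
ROWS = [
    (1, 1, "value1"), (1, 2, "value2"),
    (2, 1, "value3"), (2, 2, "value4"),
    (3, 1, "value1"), (3, 2, "value2"), (3, 3, "value3"),
]


def divide(pairs, exact):
    """Hypothetical helper: relational division by join + HAVING COUNT (no duplicate rows assumed)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (object_key_id INT, attr_id INT, attr_value TEXT)")
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?)", ROWS)
    con.execute("CREATE TEMP TABLE input_data (attr_id INT, attr_value TEXT)")
    con.executemany("INSERT INTO input_data VALUES (?, ?)", pairs)

    # Division with remainder: the object matches every input pair (it may have more).
    having = "COUNT(*) = (SELECT COUNT(*) FROM input_data)"
    if exact:
        # Exact division: additionally, the object has no rows beyond the input pairs.
        having += (" AND (SELECT COUNT(*) FROM tableX AS tt"
                   " WHERE tt.object_key_id = x.object_key_id)"
                   " = (SELECT COUNT(*) FROM input_data)")
    query = f"""
        SELECT x.object_key_id
        FROM tableX AS x
        JOIN input_data AS i
          ON i.attr_id = x.attr_id AND i.attr_value = x.attr_value
        GROUP BY x.object_key_id
        HAVING {having}
        ORDER BY x.object_key_id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


print(divide([(1, "value1"), (2, "value2")], exact=False))  # [1, 3]
print(divide([(1, "value1"), (2, "value2")], exact=True))   # [1]
```

With exact=False, object 3 also qualifies because it has both input pairs plus an extra one; the additional count comparison is what rules it out in the exact case.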

Related Solutions

SQL Server – Best way to put checksums on all pages

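In my view, the most comprehensive way is to encrypt and then decrypt the database with TDE (Transparent Data Encryption). This ensures that every single page is changed in memory and flushed to disk, and SQL Server computes a page checksum whenever a page is written to disk (provided the database's PAGE_VERIFY option is set to CHECKSUM).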

I've done this successfully on 'legacy' databases that were originally created in SQL 2000, after I discovered that several pages had no actual checksum on them (0x...200) when looking at the page header with DBCC PAGE.

If you try this, I recommend testing it first on a restored copy of the live database, in case there is undetected corruption that gets caught and stalls the encryption process. There are flags to deal with that, but it is better to play it safe.

Obviously, you'll want to back up the certificate used for the encryption, so that you are covered for any eventuality while the database is encrypted.

If anyone has a better idea for writing checksums on all pages, I'd love to hear it :-)

SQL Server – Grouping Rows by Two Columns Without Considering Order

Something like:

SELECT COUNT(*) AS C, V1, V2
FROM   (SELECT CASE WHEN Value1<Value2 THEN Value1 ELSE Value2 END AS V1
             , CASE WHEN Value1<Value2 THEN Value2 ELSE Value1 END AS V2
        FROM   input_table
       ) AS tbl
GROUP BY V1, V2

should do the trick, but it may not be terribly efficient. Any filtering clauses should go in the inner SELECT, not the outer one.

SELECT COUNT(*) AS C
     , CASE WHEN Value1<Value2 THEN Value1 ELSE Value2 END AS V1
     , CASE WHEN Value1<Value2 THEN Value2 ELSE Value1 END AS V2
FROM   input_table
GROUP BY CASE WHEN Value1<Value2 THEN Value1 ELSE Value2 END
       , CASE WHEN Value1<Value2 THEN Value2 ELSE Value1 END

may also work (and may be more efficient), but the code repetition between the GROUP BY and SELECT clauses may become a maintenance problem.
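A quick way to check that both queries above put the same rows into the same groups is to run them side by side. This sketch uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server; the input_table contents are made-up sample data and the helper run is hypothetical. ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Made-up sample data: (1, 2) and (2, 1) should land in the same group.
PAIRS = [(1, 2), (2, 1), (3, 4), (1, 2), (4, 3), (5, 5)]

# Derived-table version: put the smaller value first, then group.
DERIVED = """
SELECT COUNT(*) AS C, V1, V2
FROM (SELECT CASE WHEN Value1 < Value2 THEN Value1 ELSE Value2 END AS V1,
             CASE WHEN Value1 < Value2 THEN Value2 ELSE Value1 END AS V2
      FROM input_table) AS tbl
GROUP BY V1, V2
ORDER BY V1, V2
"""

# Single-level version: the same CASE expressions repeated in SELECT and GROUP BY.
REPEATED = """
SELECT COUNT(*) AS C,
       CASE WHEN Value1 < Value2 THEN Value1 ELSE Value2 END AS V1,
       CASE WHEN Value1 < Value2 THEN Value2 ELSE Value1 END AS V2
FROM input_table
GROUP BY CASE WHEN Value1 < Value2 THEN Value1 ELSE Value2 END,
         CASE WHEN Value1 < Value2 THEN Value2 ELSE Value1 END
ORDER BY V1, V2
"""


def run(query, pairs):
    """Hypothetical helper: load `pairs` into input_table and return the query's rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE input_table (Value1 INT, Value2 INT)")
    con.executemany("INSERT INTO input_table VALUES (?, ?)", pairs)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


print(run(DERIVED, PAIRS))  # [(3, 1, 2), (2, 3, 4), (1, 5, 5)]
```

Each row is (C, V1, V2). Rows where both values are equal, such as (5, 5), fall through to the ELSE branches, which still yields the same pair, so they group correctly too.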

Of course, if you can ensure that the data is always "the right way around" (and this doesn't break your model in other ways; we can't tell whether it might, as your question gives no detail on which to base a supposition either way), then

SELECT COUNT(*) AS C, Value1, Value2
FROM   input_table
GROUP BY Value1, Value2

is sufficient. You could enforce the two values being the right way around using INSTEAD OF triggers or in your business logic layer (updating existing data should be easy).
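Fixing up the existing data can be a single UPDATE that swaps the two columns wherever they are the wrong way around. The sketch below (again SQLite via Python's sqlite3 as a stand-in, with made-up sample data and a hypothetical helper name) relies on standard SQL semantics, where every expression in SET sees the row's pre-update values, so SET Value1 = Value2, Value2 = Value1 is a true swap. After that, the plain GROUP BY gives the same counts as the CASE-based queries. Enforcing the order for new rows (triggers or the business logic layer) is not shown.

```python
import sqlite3

# Made-up sample data: some pairs are stored "the wrong way around".
PAIRS = [(1, 2), (2, 1), (3, 4), (1, 2), (4, 3), (5, 5)]


def normalise_and_group(pairs):
    """Hypothetical helper: swap out-of-order rows once, then group with a plain GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE input_table (Value1 INT, Value2 INT)")
    con.executemany("INSERT INTO input_table VALUES (?, ?)", pairs)
    # One-off fix: both SET expressions read the pre-update values, so this swaps them.
    con.execute("UPDATE input_table SET Value1 = Value2, Value2 = Value1 "
                "WHERE Value1 > Value2")
    # With the data normalised, grouping on the raw columns is enough.
    rows = con.execute("SELECT COUNT(*) AS C, Value1, Value2 FROM input_table "
                       "GROUP BY Value1, Value2 ORDER BY Value1, Value2").fetchall()
    con.close()
    return rows


print(normalise_and_group(PAIRS))  # [(3, 1, 2), (2, 3, 4), (1, 5, 5)]
```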
