You'll probably end up with something looking a little like this:
This may be a little more involved than you first expected, but I'll take you through the whys and hows of it all ( doing this is generally much easier than explaining it, by the way ), starting with identifying objects. Using "Noun-Verb" analysis, which is a fairly common OOAD technique, we start picking apart your requirements:
Right away, it becomes evident that at least Employee, Position / Job and Skill are needed. Further along, a need for Department and Skill Group emerges, then finally Skill Level. From here, things like Employee Surname are noted as well.
Some liberties are then taken here ( which is not abnormal in any way, considering the quality of the list of requirements you'd normally be extracting from a client ), but the noun list to be used for the general structure, in no particular order, is as follows:
Department ( D )
Department Code ( DC )
Department Name ( DN )
Position ( P )
Position Code ( PC )
Position Title ( PT )
Skill Group ( G )
Skill Group Code ( GC )
Skill Group Name ( GN )
Skill ( S )
Skill Code ( SC )
Skill Name ( SN )
Employee ( E )
Employee Code ( EC )
Employee First Name ( EF )
Employee Surname ( EL )
Level ( L )
Level Code ( LC )
Level Name ( LN )
Using these nouns, on a relatively simple level, relationships between them can be established using "is-a / has-a," that is, "Noun A is a Noun B" or "Noun A has a Noun B." This is particularly useful for classifying our nouns into two categories, being an "attribute", an atomic entity, later called a column, or a "container," which is a collection, later called a table. During the first round of this "is-a / has-a," the goal is generally to simplify the list, effectively removing the attributes from the picture, grouping them to their containers.
The naming convention used above goes a fairly long way to making this process simple. For instance, an Employee ( E ) has an Employee First Name ( EF ) and an Employee Last Name ( EL ), forming one-to-one relationships between Employee ( E ) and the Employee First Name ( EF ) and Employee Last Name ( EL ) entities. In this way, Employee ( E ) can be classified as a container, whereas Employee First Name ( EF ) and Employee Last Name ( EL ) become attributes. Focusing on this reduction of nouns via one-to-one relationships, we can reduce the list above to the following, where the notation ( A: B, C )
means that entity A
is a "container" with "attributes" B
and C
. It follows that:
Department ( D: DC, DN )
Position ( P: PC, PT )
Skill Group ( G: GC, GN )
Skill ( S: SC, SN )
Employee ( E: EC, EF, EL )
Level ( L: LC, LN )
With this reduction, the one-to-many relationships can now be considered. A Department ( D ) may have any ( a pluralized has a ) number of Positions ( P ) available, implicitly indicating a one-to-many relationship. Extending the above notation, in addition to the attribute list, we can now add "compound" attributes. More simply, a compound attribute is basically a container treated as an attribute. This can be written as ( P: PC, PT, ( D: DC, DN ) )
, which begins to highlight the places where foreign keys will materialize. Similarly, Skill Group ( G ) may refer to a number of differing Skills ( S ).
( P: PC, PT, ( D: DC, DN ) )
( S: SC, SN, ( G: GC, GN ) )
The many-to-many relationships can then be addressed. Position ( P ) may require any number of Skills ( S ), each with it's own Skill Level ( L ). Conversely, a specific Skill ( S ) can be required for any number of Positions ( P ), at any particular Skill Level ( L ). These associations manifest as a ( is a ) Position Skill ( PS ), effectively a "surrogate" container, having a compound attribute of Skill Level ( L ). To use the phrasing above, each of these newly established Position Skills ( PS ) has a required Skill Level ( L ). To maintain this many-to-one relationship between Position Skill ( PS ) and Level ( L ), a unique key is to be placed on the foreign keys Position ( P ) and Skill ( S ).
( PS: ( { U1 }( P: PC, PT, ( D: DC, DN ) ), { U1 }( S: SC, SN, ( G: GC, GN ) ), ( L: LC, LN ) )
A quick note: The { Un } is used to indicate both attributes ( themselves containers ) are part of a single unique key. If the attributes were independently unique, we could use { U1 } and { U2 }.
Next, an Employee ( E ) can occupy any number of Positions ( P ), indicating that there is a Employee Position noun to be added to our list. This particular relationship manifests over some period of time, thus having ( has a ) Start Date ( T1 ) and End Date ( T2 ). However, there is no way to assert that a normally part-time Employee ( E ) could not be serving under two separate Positions ( P ) in a full-time manner simultaneously in the interim before a new hire, nor that a previous Employee ( E ) could be re-hired or otherwise return to a Position ( P ) previously held by that Employee ( E ). This dilemma results in a true many-to-many relationship, with Start Date ( T1 ) and End Date ( T2 ) as attributes. As Start Date ( T1 ) and End Date ( T2 ) are not functionally dependent on anything other than the relationship between Employee ( E ) and Position ( P ), this structure is not in violation of 3NF.
( EP: ( E: EC, EF, EL ), ( P: PC, PT, ( D: DC, DN ) ), T1, T2 )
The final piece comes in the form of the relationship between an Employee's ( E ) specific term of employment, Employee Position ( EP ), the relevant Position Skills ( PS ) and the quality at which those specific Position Skills ( PS ) are realized, or simply, the achieved Level ( L ). Much like the relationship established between Position ( P ) and Skill ( S ), there is a Employee Position Position Skill ( EPPS ) which has a compound attribute of Level ( L ) in a many-to-one relationship. This again requires a unique key, leaving our noun list ( now expanded ) in the following state:
( EPPS:
{ U2 }( EP:
( E: EC, EF, EL ),
( P: PC, PT, ( D:
DC, DN ) ),
T1, T2 ),
{ U2 }( PS:
( { U1 }( P:
PC, PT, ( D:
DC, DN ) ),
{ U1 }( S:
SC, SN, ( G:
GC, GN ) ),
( L:
LC, LN ) ),
( L:
LC, LN ) )
That, of course, is a bit on the crazy side to read, so we extract the definitions we've reduced the problem to in an actionable way. Here, I've added additional unique keys simply for the purpose of completeness. Note that there are no soon-to-be tables which contain more than one independently unique key, so { U1 } is used throughout, per table:
( D: { U1 }DC, DN )
( P: { U1 }D, { U1 }PC, PN )
( G: { U1 }GC, GN )
( S: { U1 }G, { U1 }SC, SN )
( L: { U1 }LC, LN )
( PS: { U1 }P, { U1 }S, L )
( E: { U1 }EC, EF, EL )
( EP: { U1 }E, { U1 }P, T1, T2 }
( EPPS: { U1 }EP, { U1 }PS, L }
Why?
The expressed requirement that artist
and title
must match in the same element of the JSON array is not reflected in your query, which finds all rows where at least one element matches the artist
and another (possibly a different one) matches the title
.
The example data for your first case was inconclusive, since the query cannot fail this way for a single array element. Your second example demonstrates the case well, though.
Solution in Postgres 9.3
There are various ways to fix this. One way would be to translate each json array element to an SQL array of composite type consisting of artist
and title
and match the whole type as one.
Another way would be to keep indexes like you have now (or even a single composite spanning both expressions: (json2arr(data, 'artist'), json2arr(data, 'title'))
. Your current query identifies possible matches. Unnest the json array data
for all identified candidates and check whether both artist
and title
match on the same element. May or may not be efficient enough.
Better use jsonb
in n Postgres 9.4
I am not going into detail for json
/ pg 9.3, because the release of jsonb
in Postgres 9.4 with advanced indexing capabilities mostly obsoleted the problem. This can be implemented much simpler and more efficient with a native GIN index on a jsonb
column. There are various options. To optimize for the presented case:
CREATE TABLE tracks (id serial, data jsonb);
INSERT INTO tracks (id, data) VALUES
(1, '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]')
, (2, '[{"artist": "Another Artist", "title": "Welcome to My Life"},
{"artist": "Simple Plan", "title": "Perfect"}]');
Index:
CREATE INDEX tracks_data_gin_idx ON tracks USING gin (data jsonb_path_ops);
Query:
SELECT * FROM tracks
WHERE data @> '[{"artist": "Simple Plan", "title": "Welcome to My Life"}]';
Sequence of attributes in the JSON value and insignificant whitespace don't matter for jsonb
. I added detailed information for jsonb to the referenced answer on SO:
Best Answer
I recommend not to store the data as JSON in the database, particularly if you want to query them with SQL. Storing the data will be easy that way, but your queries will be much more complicated and slower.
If you want to use several SQL statements in client code or a function in the database should mostly depend on the question if you want to keep your business logic in the application or in the database.
There shouldn't be too many round trips necessary if you use
RETURNING
as inYou can even insert several rows with a construct like