I partially agree with Aaron's comment - in the most general case, for storing 21 unrelated pieces of information, you'd probably use 21 bit columns; as a general solution, that may well be your best option. If you had multiple bitmasked varchar columns instead, that could translate to a row with over a hundred bit flags. FYI, 21 bit columns get stored in 3 bytes when you define them as NOT NULL, removing the need for space in the NULL bitmap. So even with individual bit columns, SQL Server ends up packing every 8 of them into a byte anyway.
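The 3-byte figure is just the ceiling of 21/8; a quick sketch of the arithmetic (plain Python, not SQL Server internals):

```python
# 21 one-bit flags need ceil(21 / 8) = 3 bytes of packed storage.
flags = 21
bytes_needed = (flags + 7) // 8
print(bytes_needed)  # 3

# A mask with all 21 flags set still fits comfortably in a 4-byte INT.
all_set = (1 << flags) - 1
print(all_set, all_set.bit_length())  # 2097151 21
```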
What SQL Server ends up doing with your multi-column queries is ultimately a bunch of bitmasking routines (yes, SQL Server uses bitmasks internally, so the concept per se can't be all bad!), but for average use cases it makes life easier for you.
If we had more information about what types of queries you run, we may be able to better advise, because ultimately the use cases dictate the design.
If you stick with the computed column, I would persist and index it if you haven't already. It helps some queries, such as
exact matches
WHERE computedInt = POWER(2, 6) -- only the 7th bit (1-based) is set
AND matching on the 15th bit, combined with OR matching on two other bits (the 10th and 7th)
WHERE computedInt >= POWER(2,14) AND computedInt < POWER(2,15) -- index-friendly range: 15th bit set, no higher bits
AND computedInt & (POWER(2,9) + POWER(2,6)) > 0
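As a sanity check, the predicates above can be mirrored with plain integer arithmetic. A hedged sketch in Python (`bit` is a helper introduced here; positions are 1-based as in the prose):

```python
def bit(n):
    """Value of the n-th bit, 1-based (bit(7) == POWER(2, 6))."""
    return 1 << (n - 1)

computed = bit(15) | bit(10)   # a row with the 15th and 10th bits set

# Exact match: computedInt = POWER(2, 6), i.e. only the 7th bit set.
print(computed == bit(7))      # False

# Range test for the 15th bit, plus OR-match on the 10th or 7th bit.
print(bit(15) <= computed < bit(16) and computed & (bit(10) | bit(7)) > 0)  # True
```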
But these are probably exotic samples, yet also real life in some cases. It's certainly not much worse than 21 individual bit columns, for which, yes, your statements could be easier to write - but remember that SQL Server has packed them into 3 bytes for storage and will be doing the bit-unmasking anyway! If bit-masking were all bad (without exception), SQL Server wouldn't be doing it, right?
EDIT
Re the scenario of
Four flags, HasHouse, HasCar, HasCat, HasDog; 0000 is has none, 1111 is has all.
it is more efficient and logically expedient to test all 4 bits at once and do a single integer based operation, e.g.
WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) = 0 -- has none
WHERE computedInt & (POWER(2,10)+POWER(2,5)+POWER(2,3)+POWER(2,1)) > 0 -- has one or more
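The has-none / has-any tests reduce to one mask comparison. A small sketch (bit positions 10, 5, 3, 1 are just the example positions used above, matching `POWER(2, n)`; the flag assignment is hypothetical):

```python
# Combined mask for the four flags, mirroring
# POWER(2,10) + POWER(2,5) + POWER(2,3) + POWER(2,1).
MASK = (1 << 10) | (1 << 5) | (1 << 3) | (1 << 1)

def has_none(computed):
    return computed & MASK == 0    # none of the four flags set

def has_any(computed):
    return computed & MASK > 0     # one or more of the four flags set

print(has_none(0))        # True
print(has_any(1 << 5))    # True - say, HasCar
```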
Hypothetically, if this were your most exercised query on the table, you might even group the four columns into another computed column and index it separately, making the bitmask unnecessary (just test the resultant int with = 0 and > 0). You might even go further and just precompute the answer... horses for courses.
You did not mention whether this was an OLTP or a Decision Support solution; let's assume OLTP.
Basically, you are asking about superclass (the set of common attributes) and subclass (the set of different attributes) by some classification.
In your example you have indicated that the sub-class can have many values. When the number of sub-classes is known and static, you could model it as follows:
The DDL for the User table is as follows:
CREATE TABLE User (
UserId INTEGER DEFAULT nextval('UserId_seq') NOT NULL,
UserName CHARACTER VARYING(40) NOT NULL,
UserLname CHARACTER VARYING(256),
UserType CHARACTER VARYING(10),
CONSTRAINT PK_User PRIMARY KEY (UserId));
The UserType attribute is new and is used to identify which subclass you are storing for a given user.
For your CS Subclass example
insert into user (UserName, UserLname, UserType)
values ('Ali', 'Alipour', 'CS');
-- Assume that the assigned UserId is 1 for the purposes of this example
insert into cs_user_projects (user_id, projectName)
values (1, 'p1');
insert into cs_user_projects (user_id, projectName)
values (1, 'p3');
insert into cs_user_projects (user_id, projectName)
values (1, 'p6');
...
Now the application would know that it had to go to the CS_USER tables to get the information for CS users as follows:
select UserName
, UserLname
, ProjectName
from user u
join cs_user_projects p
on u.UserId = p.user_id
where u.UserType = 'CS';
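As a runnable illustration, the whole first model can be sketched with SQLite. Two assumptions here: the `cs_user_projects` DDL is implied but not shown above, so its definition is a guess, and `nextval` is replaced by SQLite's AUTOINCREMENT:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user (
        UserId    INTEGER PRIMARY KEY AUTOINCREMENT,
        UserName  VARCHAR(40) NOT NULL,
        UserLname VARCHAR(256),
        UserType  VARCHAR(10)
    );
    -- Assumed sub-class table; one row per project assignment.
    CREATE TABLE cs_user_projects (
        user_id     INTEGER NOT NULL REFERENCES user(UserId),
        projectName VARCHAR(40) NOT NULL
    );
""")
con.execute("INSERT INTO user (UserName, UserLname, UserType) VALUES ('Ali', 'Alipour', 'CS')")
con.executemany("INSERT INTO cs_user_projects (user_id, projectName) VALUES (1, ?)",
                [("p1",), ("p3",), ("p6",)])

rows = con.execute("""
    SELECT u.UserName, u.UserLname, p.projectName
    FROM user u
    JOIN cs_user_projects p ON u.UserId = p.user_id
    WHERE u.UserType = 'CS'
    ORDER BY p.projectName
""").fetchall()
print(rows)  # [('Ali', 'Alipour', 'p1'), ('Ali', 'Alipour', 'p3'), ('Ali', 'Alipour', 'p6')]
```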
If you want multiple sub-class information (User Programming Skills), you have to join to that sub-class table as well.
This is a straightforward, simple design; however, a potential drawback is that it is static. If you wish to represent another sub-class (Engineers, Nurses, ...), then new sub-class tables must be added and the application changed to handle the new sub-class use cases.
In order to allow the model to take on new sub-classes without a change in the data model; you must think of the sub-class as a data object and model the relationship in the following way:
The User table (super-class) stays the same and you still have a classification described by UserType; but now you have an Attrib table to hold the values for each sub-class of user.
As an attribute can belong to many users and a user can have many attributes, the relationship between users and attributes is a many-to-many relationship. It is resolved by adding the UserAttrib table, which carries the primary keys of both parent tables; the AttribRole column provides the context of the relationship. Let's see how this would load for your CS example again:
-- identical to before
insert into user (UserName, UserLname, UserType)
values ('Ali', 'Alipour', 'CS');
-- Load up the different projects
insert into Attrib (AttribValue) values ('p1');
insert into Attrib (AttribValue) values ('p2');
insert into Attrib (AttribValue) values ('p3');
insert into Attrib (AttribValue) values ('p4');
insert into Attrib (AttribValue) values ('p5');
insert into Attrib (AttribValue) values ('p6');
-- Assign only the appropriate projects to Ali
insert into userattrib( AttribValue, Userid, AttribRole)
values ('p1',1,'PROJECT');
insert into userattrib( AttribValue, Userid, AttribRole)
values ('p3',1,'PROJECT');
insert into userattrib( AttribValue, Userid, AttribRole)
values ('p6',1,'PROJECT');
To query our little data model as before:
select UserName
, UserLname
, AttribValue Project
from user u
join UserAttrib a
on u.UserId = a.UserId
where a.AttribRole = 'PROJECT'
and u.UserType = 'CS';
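This model, too, can be sketched end to end with SQLite. The Attrib and UserAttrib DDL is an assumption (only the inserts appear above), and AUTOINCREMENT again stands in for a sequence:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user (
        UserId    INTEGER PRIMARY KEY AUTOINCREMENT,
        UserName  VARCHAR(40) NOT NULL,
        UserLname VARCHAR(256),
        UserType  VARCHAR(10)
    );
    -- Assumed attribute tables resolving the many-to-many relationship.
    CREATE TABLE Attrib (
        AttribValue VARCHAR(40) PRIMARY KEY
    );
    CREATE TABLE UserAttrib (
        AttribValue VARCHAR(40) NOT NULL REFERENCES Attrib(AttribValue),
        UserId      INTEGER NOT NULL REFERENCES user(UserId),
        AttribRole  VARCHAR(20) NOT NULL,
        PRIMARY KEY (AttribValue, UserId, AttribRole)
    );
""")
con.execute("INSERT INTO user (UserName, UserLname, UserType) VALUES ('Ali', 'Alipour', 'CS')")
con.executemany("INSERT INTO Attrib (AttribValue) VALUES (?)",
                [("p%d" % i,) for i in range(1, 7)])
con.executemany("INSERT INTO UserAttrib (AttribValue, UserId, AttribRole) VALUES (?, 1, 'PROJECT')",
                [("p1",), ("p3",), ("p6",)])

rows = con.execute("""
    SELECT u.UserName, u.UserLname, a.AttribValue AS Project
    FROM user u
    JOIN UserAttrib a ON u.UserId = a.UserId
    WHERE a.AttribRole = 'PROJECT' AND u.UserType = 'CS'
    ORDER BY a.AttribValue
""").fetchall()
print(rows)  # [('Ali', 'Alipour', 'p1'), ('Ali', 'Alipour', 'p3'), ('Ali', 'Alipour', 'p6')]
```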
Note that to return multiple attributes (e.g. to also return User Programming Skills), you would have to join to the UserAttrib table again. The benefit of this design is that if a new sub-class comes along (Engineers who specialize in Electrical, Chemical, Mechanical, ... sub-classifications), you would simply add data for these classifications and assignments as above, without having to change the data model to do so.
These are the classical, general solutions to super-class/sub-class design for ALL Databases that you mention in your question.
In the Postgres Database case, you should investigate the INHERITS clause of CREATE TABLE.
It allows a table (sub-class) to inherit the parent (super-class) table definition. In addition, you can alter the child (sub-class) tables to add additional attributes to achieve your desired effect of tables which would match on a sub-class by sub-class basis. Note that this implementation method is very similar to the first method we discussed (static, known sub-classes) and would not apply in all design criteria (e.g. dynamic and/or swiftly changing sub-classes).
How about this?
OK, it doesn't copy indexes and such like, but it gets you the basic structure, as requested.
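The behaviour described - columns copied, but no indexes "and such like" - matches a structure-only `CREATE TABLE ... AS SELECT` with an always-false predicate. This is an assumption about the approach; sketched here against SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE INDEX ix_t1_name ON t1(name)")

# WHERE 1 = 0 copies no rows; only the column structure comes across.
# Indexes, keys and constraints on t1 are NOT recreated on t2.
con.execute("CREATE TABLE t2 AS SELECT * FROM t1 WHERE 1 = 0")

cols = [r[1] for r in con.execute("PRAGMA table_info(t2)")]
print(cols)  # ['id', 'name']
```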
Given that the job of a Data Dictionary is to manage the structures of other tables, I'd say that's a pretty big hole in its functionality!