Sql-server – Using Text Fields As Booleans

application-designdatabase-designsql server

We are developing a product system and I was curious to what industry standards were for many Boolean statements.

In our system we have some, what you could call main, bit fields. E.G. isBackordered, isDiscontinued.

We also have a lot of internal processes and debated on using product tags to denote if this product should be processed that way or not. E.G. isManufacturable, isLaserable

I was wondering if it is better in the industry to just create everything as bit fields or acceptable in the industry to use tags for special circumstances like that.

Best Answer

If you have a system with a large number of tags it's probably more efficient management-wise to use a relational system between products and product-tags.

i.e.:

IF OBJECT_ID(N'dbo.ProductsTags', N'U') IS NOT NULL
DROP TABLE dbo.ProductsTags;
IF OBJECT_ID(N'dbo.Products', N'U') IS NOT NULL
DROP TABLE dbo.Products;
IF OBJECT_ID(N'dbo.Tags', N'U') IS NOT NULL
DROP TABLE dbo.Tags;
CREATE TABLE dbo.Products
(
    ProductID int NOT NULL
        CONSTRAINT PK_Products
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , ProductName varchar(100) NOT NULL
);

CREATE TABLE dbo.Tags
(
    TagID int NOT NULL
        CONSTRAINT PK_Tags
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , TagName varchar(100) NOT NULL
);

CREATE TABLE dbo.ProductsTags
(
    ProductID int NOT NULL
        CONSTRAINT FK_ProductsTags_ProductID
        FOREIGN KEY 
        REFERENCES dbo.Products(ProductID)
    , TagID int NOT NULL
        CONSTRAINT FK_ProductsTags_TagID
        FOREIGN KEY
        REFERENCES dbo.Tags(TagID)
    , CONSTRAINT PK_ProductsTags
        PRIMARY KEY CLUSTERED
        (ProductID, TagID)
);

Insert some data into the various tables:

INSERT INTO dbo.Products (ProductName)
VALUES ('wallet')
    , ('Paint');

INSERT INTO dbo.Tags (TagName)
VALUES ('IsLeather')
    , ('IsPlastic')
    , ('IsBlue');

INSERT INTO dbo.ProductsTags(ProductID, TagID)
VALUES (1, 1)
    , (2, 2)
    , (2, 3);

Query the tables:

SELECT p.ProductName
    , t.TagName
FROM dbo.ProductsTags pt
    INNER JOIN dbo.Products p ON pt.ProductID = p.ProductID
    INNER JOIN dbo.Tags t ON pt.TagID = t.TagID

╔═════════════╦═══════════╗
║ ProductName ║  TagName  ║
╠═════════════╬═══════════╣
║ wallet      ║ IsLeather ║
║ Paint       ║ IsPlastic ║
║ Paint       ║ IsBlue    ║
╚═════════════╩═══════════╝

This allows you to extend the tags at will without needing to redesign the database itself.

Using discrete bit columns in the products table is a strategy that will soon get very tedious. Take for example:

CREATE TABLE dbo.Products
(
    ProductID int NOT NULL
        CONSTRAINT PK_Products
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , ProductName varchar(100) NOT NULL
    , IsLeather bit NOT NULL
    , IsPlastic bit NOT NULL
    , IsBlue bit NOT NULL
);

SELECT p.ProductID
    , p.ProductName
    , p.IsLeather
    , p.IsPlastic
    , p.IsBlue
FROM dbo.Products p

INSERT INTO dbo.Products (ProductName, IsLeather, IsPlastic, IsBlue)
VALUES ('Wallet', 1, 0, 1);

Now, if you want to add products that might be red, you must alter the table via:

ALTER TABLE dbo.Products
ADD IsRed bit NOT NULL;

You must also modify any code that touches the dbo.Products table, to make that code aware of the new IsRed column.

Related Solutions

Sql-server – What can cause slow queries with low resource usage

This is most likely caused by the MERGE operation in SQL Server being single threaded. No matter how many resources you throw at it, it wont run faster which is exactly what you are observing.

The solution is to manually parallelise the query by running multiple copies of the MERGE statement at the same time, each operating on their own subset of the data.

The problem is very common in data warehouses running SQL Server. There are a series of good design pattern in this document: http://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx (DISCLAIMER: This is written by yours truly, so I am shamelessly plugging this)

SQL Server – Designing Database with Only Text Fields

Size is one consideration. An int can hold up to -2,147,483,648 in four bytes. A char will need 11 bytes to hold the same value.

There are built-in functions to manipulate the various data types. DATEADD() and DATEDIFF() are two examples. This will not be possible with date-stored-as-text. Constantly CASTing back and forth will not make for efficient processing, or legible code.

Automatic validation is another foregone benefit with the all-text approach. You may think that a column contains dates but there will be nothing to stop someone entering the value '2014-13-97'.

Sorting is unlikely to give the intended result with columns which are "really" numbers. For example, if a column contained integers '1' through '100', and the query sorted by this column, one would expect the result to be

1, 2 .. 10, 11 .. 20, 21 .. 100

However, the actual result is more likely to be

1, 10, 100, 11, 12, 2, 20, 21 ...

There may be similar concerns for dates, depending on the chosen character representation. Again type casting can cure this with the costs mentioned previously.

Occasionally you will find strings which contain digits only. Examples are national identity numbers, bank account numbers, phone numbers and such like. The common characteristic of such values is that it does not make sense to perform calculations on them. Other things being equal it may be OK to store these as char() columns, especially if they have optional embedded alpha characters, but the above considerations still apply.

Best Answer

Related Solutions

Sql-server – What can cause slow queries with low resource usage

SQL Server – Designing Database with Only Text Fields

Related Question