SQL Server Performance – Store in Normalized Table with JOIN or Single nvarchar Column with LIKE?

join;likeperformancequery-performancesql server

NOTE: The application is not really about websites or keywords, I am using that here only to provide reference points for discussion.

I have a data set that is really just a string of words. Think of at "SEO" keywords for a website. I have a table that contains data that relates to this data. For example, say a list of websites.

The key scenario for this data set is retrieval against the master data, in two-way fashion. That is, while I can get a website with all its associated keywords, I should also be able to input a keyword and get a list of websites attached to that keyword. There is no other metadata attached (or attachable) to the keyword itself.

Now, given that query performance while retrieving the website -> keyword and keyword -> website (significantly more use cases) is paramount, which of these design scenarios makes more sense?

  1. In the websites table, I have a single nvarchar column that contains a string of all the keywords for that website, possibly comma-separated. Retrieval in this case would be using a LIKE operator on that row.

  2. I create a separate keywords table, with two columns (Id, Keyword), put all the keywords in there, and then have a third table WebsiteKeywords that contain the mappings between the websites and keywords tables. To retrieve, we do a three-way join between websites, websitekeywords and keywords tables.

The retrieval is designed to happen via web service infrastructure, so there will be multiple look ups from the middle-tier layer before it thinks it has all the data. So a single "search" fired as a call will result in multiple such look ups, all results will be aggregated by that middle tier before being returned to the caller.

What are your suggestions?

Best Answer

Definitely you want to use option 2. Not only will your queries be faster (= is always faster than like) but you can also index on that keyword field for even faster queries AND your storage space will be significantly reduced since you are not storing the long keyword strings for each website.