NOTE: The application is not really about websites or keywords, I am using that here only to provide reference points for discussion.
I have a data set that is really just a string of words; think of "SEO" keywords for a website. I have a separate table that contains data related to these keywords, for example a list of websites.
The key scenario for this data set is retrieval against the master data in both directions. That is, while I can get a website with all its associated keywords, I should also be able to input a keyword and get a list of websites attached to that keyword. There is no other metadata attached (or attachable) to the keyword itself.
Now, given that query performance is paramount when retrieving both website -> keyword and keyword -> website (the latter covering significantly more use cases), which of these design scenarios makes more sense?
- In the websites table, I have a single nvarchar column that contains a string of all the keywords for that website, possibly comma-separated. Retrieval in this case would use a LIKE operator on that column.
- I create a separate keywords table with two columns (Id, Keyword), put all the keywords in there, and then have a third table, WebsiteKeywords, that contains the mappings between the websites and keywords tables. To retrieve, we do a three-way join across the websites, WebsiteKeywords, and keywords tables.
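For concreteness, here is a minimal sketch of the two options in T-SQL. Everything beyond the names given in the question (websites, keywords, WebsiteKeywords, Id, Keyword) is a hypothetical placeholder, in particular the Url column and the nvarchar lengths.

```sql
-- Option 1: keywords stored as a single comma-separated nvarchar column on the website row.
CREATE TABLE Websites
(
    Id       int IDENTITY(1,1) PRIMARY KEY,
    Url      nvarchar(400) NOT NULL,   -- hypothetical column, stands in for the real website data
    Keywords nvarchar(max) NULL        -- e.g. N'seo,keywords,analytics'
);

-- Option 2: keywords normalized into their own table, plus a mapping (junction) table.
-- (Under this option the Keywords column above would be dropped.)
CREATE TABLE Keywords
(
    Id      int IDENTITY(1,1) PRIMARY KEY,
    Keyword nvarchar(200) NOT NULL
);

CREATE TABLE WebsiteKeywords
(
    WebsiteId int NOT NULL REFERENCES Websites (Id),
    KeywordId int NOT NULL REFERENCES Keywords (Id),
    PRIMARY KEY (WebsiteId, KeywordId)
);
```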
The retrieval is designed to happen via web service infrastructure, so the middle-tier layer will perform multiple lookups before it has all the data. A single "search" call will therefore result in several such lookups, and the middle tier aggregates all the results before returning them to the caller.
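Under option 2, each of those middle-tier lookups could boil down to a single parameterized query in either direction. A minimal sketch, reusing the hypothetical names from the schema above (@keyword and @websiteId are placeholder parameters):

```sql
-- keyword -> websites (the more common case): three-way join, seeking on an exact keyword.
SELECT w.Id, w.Url
FROM Websites AS w
JOIN WebsiteKeywords AS wk ON wk.WebsiteId = w.Id
JOIN Keywords AS k         ON k.Id = wk.KeywordId
WHERE k.Keyword = @keyword;

-- website -> keywords: the same mapping table driven from the other side.
SELECT k.Keyword
FROM Keywords AS k
JOIN WebsiteKeywords AS wk ON wk.KeywordId = k.Id
WHERE wk.WebsiteId = @websiteId;
```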
What are your suggestions?
Best Answer
You definitely want option 2. Not only will your queries be faster (an equality predicate on an indexed column can use an index seek, while a wildcard LIKE forces a scan), but you can also index that keyword column for even faster lookups, and your storage requirements drop significantly since you are no longer storing the full keyword string on every website row.
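To make those equality seeks effective, indexes along these lines would typically be added. This is a sketch under the schema assumed earlier; the index names are illustrative.

```sql
-- A unique index on the keyword text turns WHERE Keyword = @keyword into an index seek
-- and also prevents duplicate keyword rows.
CREATE UNIQUE INDEX IX_Keywords_Keyword
    ON Keywords (Keyword);

-- The mapping table's primary key (WebsiteId, KeywordId) already serves website -> keyword
-- lookups; this index covers the keyword -> website direction.
CREATE INDEX IX_WebsiteKeywords_KeywordId
    ON WebsiteKeywords (KeywordId, WebsiteId);

-- By contrast, option 1's LIKE N'%keyword%' predicate cannot seek an index on the
-- comma-separated column and has to scan every website row.
```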