Designing for maximum performance for an anticipated large user-base, I would suggest ensuring bookmarks and tags are not duplicated unnecessarily. Using my suggested schema below, http://google.com/
would become a single bookmark row. Users who want a bookmark to Google would have a row in UserBookmarksTags
referencing the Bookmark row and the associated tag row.
I would tend to abstract the UserID away from the Bookmarks table, and use a cross-reference table to determine which users have bookmarks, and what they've tagged each bookmark with.
Using SQL Server, I would do it like this:
USE tempdb;
GO
CREATE TABLE Users
(
UserID INT NOT NULL
CONSTRAINT PK_Users
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, UserName varchar(255)
);
CREATE TABLE Bookmarks
(
BookmarkID INT NOT NULL
CONSTRAINT PK_Bookmarks
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, BookmarkName varchar(255)
, BookmarkURL varchar(255)
);
CREATE TABLE Tags
(
TagID INT NOT NULL
CONSTRAINT PK_Tags
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, TagName varchar(255)
);
If every entry in your system requires both a bookmark and a tag, you could use the following table for maximum performance:
CREATE TABLE UsersBookmarksTags
(
UserID INT NOT NULL
CONSTRAINT FK_UsersBookmarksTags_UserID
FOREIGN KEY REFERENCES Users(UserID)
, TagID INT NOT NULL
CONSTRAINT FK_UsersBookmarksTags_TagID
FOREIGN KEY REFERENCES Tags(TagID)
, BookmarkID INT NOT NULL
CONSTRAINT FK_UsersBooksmarksTags_BookmarkID
FOREIGN KEY REFERENCES Bookmarks(BookmarkID)
, CONSTRAINT PK_UsersBookmarksTags
PRIMARY KEY CLUSTERED
(UserID, TagID, BookmarkID)
);
The above version of the table UsersBookmarksTags
allows you to quickly return a list of bookmarks for a specific user that matches any given tag, or all bookmarks for that user.
Alternately, you could create the cross reference table that allows null entries for both Tags and Bookmarks, as in:
CREATE TABLE UsersBookmarksTags
(
UsersTagsBookmarksID INT NOT NULL
CONSTRAINT PK_UsersBookmarksTags
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, UserID INT NOT NULL
CONSTRAINT FK_UserBookmarks_UserID
FOREIGN KEY REFERENCES Users(UserID)
, TagID INT NULL
CONSTRAINT FK_UserBookmarks_TagID
FOREIGN KEY REFERENCES Tags(TagID)
, BookmarkID INT NULL
CONSTRAINT FK_UserBooksmarks_BookmarkID
FOREIGN KEY REFERENCES Bookmarks(BookmarkID)
);
This allows rows that have a bookmark without a tag, and also allows tags to be associated to a user without a bookmark.
If there is a requirement to know all the tags a user has added, without those tags necessarily having an associated bookmark, I would consider adding another table for that purpose, as:
CREATE TABLE UserTags
(
UserID INT NOT NULL
CONSTRAINT FK_UserTags_UserID
FOREIGN KEY REFERENCES Users(UserID)
, TagID INT NOT NULL
CONSTRAINT FK_UserTags_TagID
FOREIGN KEY REFERENCES Tags(TagID)
, CONSTRAINT PK_UserTags
PRIMARY KEY CLUSTERED
(UserID, TagID)
);
If you have 100-150 columns, it is likely some of those are repeating groups. Those should be normalized into their own child tables (PK: user_id, sqn). Review your design. You should normalize to at least 3rd normal form.
Joins are usually very efficient, and in many cases only rows required for the result are joined. With that many columns, it is unlikely you would ever have to return all the columns, so you would likely be joining a subset of the tables.
You should be concerned with the cost of searching for matches on columns that aren't indexed. The higher the column number, the more expensive it is to find the column to compare it. This cost is likely to far outweigh the cost of joining.
The main reasons I use a one-to-one tables are:
- Separate large tombstone data from volatile information.
- Separate data with higher security requirements, from more public data.
- Store sub-type data (one to zero-or-one).
Best Answer
Basically you can only use
With the bridge table, if one tags is set for many questions.
It wuld also save some space, if you have a lot of questions.
The sencond option has the problem, that if you wqant all tags t be displayed you need five INNER JOINs, for every Tag one, while you can make the same query with two joins, with the first option..
Overall, the first option with bridge table, is the more flexible and space consuming option.