How to design a schema to store search keywords

I am designing a database that stores the keywords entered in a search form, so that I can generate reports of the most popular search keywords by day, month and year. I have made a simple schema:

CREATE TABLE SearchTerm (
 ID             INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 SearchTerm     VARCHAR( 255 )  NOT NULL,
 Date           DATE            NOT NULL
);

With this design, the same search term is stored again and again, which makes the database grow faster. I want to avoid this redundancy.

I am also not sure whether storing the date as a DATE column or as a Unix timestamp makes any difference to how fast the data can be retrieved.

So, should I create a separate table for the dates and link it to the master table by ID? For example:

CREATE TABLE SearchTerm (
 ID             INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 SearchTerm     VARCHAR( 255 )  NOT NULL,
 Hit            INTEGER         NOT NULL,
 DateID         INTEGER         NOT NULL
);

CREATE TABLE DateofSearchTerm (
 DateID         INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 Date           DATE            NOT NULL
);

My concern is performance as the database grows. If the database is not designed properly, it will lead to performance bottlenecks.

One of my friends built an analytics site, and it is not going well. I found that the table holds 3 million records and his query takes more than 3 minutes to return results.

He created only one table, which stores columns such as “Event Name”, “Start date” and “End date”. When he generates a monthly report, it cannot be displayed because of timeouts (server timeout, query timeout, jQuery response timeout).

His SQL query is too expensive. His stored procedure uses GROUP BY on the date to count users per day and per month, then passes the result to a Flash-based report. I got the report to display by increasing the command timeout, but I don't think that is a good fix: the data keeps growing and will reach about 4 million records by the end of this year, so sooner or later a longer timeout will not be enough. I believe the problem comes from weak database design, and I don't want to repeat it.

Please share your advice and experience on how to avoid these issues. How should I design the schema for storing search terms?

Best Answer

First, let me address your concern about data redundancy. I agree that the second schema is more likely to reduce redundancy, and it is probably closer to what I would go with. One thing to be aware of, though, is that it stores each distinct search term only once. Unless you have some way of normalizing the input, misspellings will get through and be counted as different search terms.
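As a partial mitigation, terms can be normalized before they are stored. The sketch below assumes SQLite, driven through Python's built-in sqlite3 module so it runs as-is. `normalize_term` and `record_search` are hypothetical helpers, and the UNIQUE constraint on `SearchTerm` is an assumption added here (in SQLite it also creates an index on that column). It only folds case and whitespace variants together; real misspellings would need something like spell-checking or fuzzy matching on top.

```python
import re
import sqlite3


def normalize_term(raw: str) -> str:
    """Trim, collapse runs of whitespace, and lower-case a search term.

    This only merges case and spacing variants; genuine misspellings
    still come through as separate terms.
    """
    return re.sub(r"\s+", " ", raw.strip()).lower()


def record_search(conn: sqlite3.Connection, raw: str, day: str) -> int:
    """Hypothetical helper: log one search and return its SearchTerm ID.

    Reuses the existing SearchTerm row (bumping Hit) when the normalized
    term is already stored, then adds one DateofSearchTerm row.
    """
    term = normalize_term(raw)
    row = conn.execute(
        "SELECT ID FROM SearchTerm WHERE SearchTerm = ?", (term,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO SearchTerm (SearchTerm, Hit) VALUES (?, 1)", (term,)
        )
        search_id = cur.lastrowid
    else:
        search_id = row[0]
        conn.execute(
            "UPDATE SearchTerm SET Hit = Hit + 1 WHERE ID = ?", (search_id,)
        )
    conn.execute(
        "INSERT INTO DateofSearchTerm (searchID, Date) VALUES (?, ?)",
        (search_id, day),
    )
    return search_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SearchTerm (
  ID         INTEGER      PRIMARY KEY AUTOINCREMENT NOT NULL,
  SearchTerm VARCHAR(255) NOT NULL UNIQUE,  -- UNIQUE is an assumption added here
  Hit        INTEGER      NOT NULL
);
CREATE TABLE DateofSearchTerm (
  DateID   INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  searchID INTEGER NOT NULL,
  Date     DATE    NOT NULL
);
""")

first = record_search(conn, "SQL Index", "2024-01-05")
second = record_search(conn, "  sql   INDEX ", "2024-01-06")
print(conn.execute("SELECT SearchTerm, Hit FROM SearchTerm").fetchall())
# [('sql index', 2)]
```

Both calls land on the same `SearchTerm` row, while each search still gets its own small dated row.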

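Also, I would reverse the direction of the link between the search term and the date searched, so that DateofSearchTerm references the search term (not the other way around). One term can be searched on many dates, so the "many" side should hold the reference; that way each search adds only a small (searchID, Date) row instead of repeating the term.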

CREATE TABLE SearchTerm (
 ID             INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 SearchTerm     VARCHAR( 255 )  NOT NULL,
 Hit            INTEGER         NOT NULL
);


CREATE TABLE DateofSearchTerm (
 DateID         INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 searchID       INTEGER         NOT NULL,
 Date           DATE            NOT NULL
);

It would also be a good idea to make searchID a foreign key that references SearchTerm.ID, to ensure there are no orphan rows (rows in DateofSearchTerm that do not have a matching parent in SearchTerm).
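A minimal sketch of that foreign key, assuming SQLite through Python's built-in sqlite3 module. One SQLite-specific detail: foreign keys are only enforced when `PRAGMA foreign_keys = ON` is set on the connection. SQL Server enforces a declared foreign key without any such setting, though its DDL differs (for example IDENTITY instead of AUTOINCREMENT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SearchTerm (
  ID         INTEGER      PRIMARY KEY AUTOINCREMENT NOT NULL,
  SearchTerm VARCHAR(255) NOT NULL,
  Hit        INTEGER      NOT NULL
);
CREATE TABLE DateofSearchTerm (
  DateID   INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  searchID INTEGER NOT NULL REFERENCES SearchTerm (ID),
  Date     DATE    NOT NULL
);
""")

conn.execute("INSERT INTO SearchTerm (SearchTerm, Hit) VALUES ('sql index', 1)")
conn.execute("INSERT INTO DateofSearchTerm (searchID, Date) VALUES (1, '2024-01-05')")

try:
    # No SearchTerm row has ID 999, so this row would be an orphan.
    conn.execute("INSERT INTO DateofSearchTerm (searchID, Date) VALUES (999, '2024-01-06')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```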

As for your friend's site, the biggest thing I see missing from your designs is indexes. A table with 3 million rows should have no problem running quickly with proper indexes.

With your design, you would want an index on the SearchTerm.SearchTerm column and on DateofSearchTerm.Date.
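A minimal sketch of those two indexes, again assuming SQLite through Python's built-in sqlite3 module; the index names are hypothetical. `EXPLAIN QUERY PLAN` is SQLite's way of showing whether a query will use an index (SQL Server has its own execution-plan tools). The exact plan wording varies between SQLite versions, so the code just prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SearchTerm (
  ID         INTEGER      PRIMARY KEY AUTOINCREMENT NOT NULL,
  SearchTerm VARCHAR(255) NOT NULL,
  Hit        INTEGER      NOT NULL
);
CREATE TABLE DateofSearchTerm (
  DateID   INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  searchID INTEGER NOT NULL,
  Date     DATE    NOT NULL
);
-- Hypothetical index names on the two columns suggested above.
CREATE INDEX IX_SearchTerm_SearchTerm ON SearchTerm (SearchTerm);
CREATE INDEX IX_DateofSearchTerm_Date ON DateofSearchTerm (Date);
""")


def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Look up one term, and fetch one month of searches by date range.
term_plan = plan("SELECT ID FROM SearchTerm WHERE SearchTerm = ?", ("sql index",))
date_plan = plan(
    "SELECT searchID FROM DateofSearchTerm WHERE Date >= ? AND Date < ?",
    ("2024-01-01", "2024-02-01"),
)
print(term_plan)
print(date_plan)
```

If a plan reports a full scan of the table instead of naming an index, that query will slow down as the table grows.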

You might expect that making searchID a foreign key would also give it an index, but that is not guaranteed: SQL Server, for example, does not automatically create an index on a foreign key column. Make sure searchID is indexed, and create the index explicitly if your database does not.
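To tie this back to the original goal, here is a hedged sketch of a monthly "most popular terms" report over this schema, assuming SQLite through Python's sqlite3 module. The index names and sample rows are made up for illustration, and `strftime` is SQLite-specific; in SQL Server you would group with functions such as YEAR() and MONTH() instead. The point is that the report joins on an indexed searchID and filters on an indexed Date, so the database has indexes available for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SearchTerm (
  ID         INTEGER      PRIMARY KEY AUTOINCREMENT NOT NULL,
  SearchTerm VARCHAR(255) NOT NULL,
  Hit        INTEGER      NOT NULL
);
CREATE TABLE DateofSearchTerm (
  DateID   INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  searchID INTEGER NOT NULL REFERENCES SearchTerm (ID),
  Date     DATE    NOT NULL
);
-- The foreign key alone does not create this index, so add it explicitly.
CREATE INDEX IX_DateofSearchTerm_searchID ON DateofSearchTerm (searchID);
CREATE INDEX IX_DateofSearchTerm_Date ON DateofSearchTerm (Date);
""")

# Made-up sample data for illustration.
conn.executemany(
    "INSERT INTO SearchTerm (ID, SearchTerm, Hit) VALUES (?, ?, ?)",
    [(1, "sql index", 3), (2, "foreign key", 1)],
)
conn.executemany(
    "INSERT INTO DateofSearchTerm (searchID, Date) VALUES (?, ?)",
    [(1, "2024-01-05"), (1, "2024-01-20"), (2, "2024-01-21"), (1, "2024-02-02")],
)

# Most popular terms per month within one year: group by 'YYYY-MM' and term.
report = conn.execute("""
    SELECT strftime('%Y-%m', d.Date) AS Month, s.SearchTerm, COUNT(*) AS Searches
    FROM DateofSearchTerm AS d
    JOIN SearchTerm AS s ON s.ID = d.searchID
    WHERE d.Date >= ? AND d.Date < ?
    GROUP BY Month, s.SearchTerm
    ORDER BY Month, Searches DESC
""", ("2024-01-01", "2025-01-01")).fetchall()

for row in report:
    print(row)
# ('2024-01', 'sql index', 2)
# ('2024-01', 'foreign key', 1)
# ('2024-02', 'sql index', 1)
```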