My purpose is to monitor which tags increase/decrease rapidly last week on different bulletin boards. Or which tag do new appear. The current table schema of articles is as follows:
article_id e.g. 123
tags e.g. apple,orange
article_created_at e.g. 2021/5/7 01:23
borad_name e.g. database
author e.g. some_person
My current idea is to calculate the amount of each tag by board weekly. But I want to survey other solutions to reduce the development time. Is it ok to choose a time-series database engine or other possible solution? Thanks for your time.
Best Answer
Having a comma-separated list of tags would save on the number of rows written to a database, but would make querying far more computationally expensive. Typically this sort of table would have one record per tag, as this would allow for simpler queries to be written. A structure similar to something you'd find in a blogging database would work. For example:
Then you would have records like this
article_id
tag
flat
posted_at
board_name
author
The
flat
field would contain a lower-casetag
value with all accents and other non-alphanumeric components removed so that values likepokémon
,PokEmon
, andPokemon
do not appear as three separate values.With this sort of design, you could then begin writing queries that could rank tag usage over a period of time. If you were interested in the five most popular tags from last week, you could write a query like this:
Alternatively you could do it by board, or by author, or across a period of weeks or months. The sky is the limit and the queries are simpler to write. If there is a regular means of presentation, this data could be collated every week and written to a summary table that would save you the hassle of looking up and calculating tag usage over time.
For example:
Then if you wanted to present a chart showing the popularity of the
pokemon
tag for all of 2020 (similar to Google's Ngram Viewer), you could write a query like this:This data could then be used to populate a simple line chart:
Hmm ... who knew that Pokemon was a thing before WWI? I sure as heck didn't.
This isn't the only way to accomplish your goal, of course. Experience has shown that it is one of the simpler ways, though, particularly when working with data that is supplied by people.