Postgresql – Database modelling : Creating a model for restaurant application which gives statistics

data-warehousedatabase-designpostgresqlstatistics

I am working on a server-side application in which users will be able to give ratings and comments to different factors of a 'Restaurant'. What the webapp does is takes all the ratings and creates statistics out of it. The statistics are based upon day, week, month and year. As well as other things like gender ratio, age ratio, etc.

I am free to design an optimal DB model for this, and I have done some of it, but would like to ask if this is optimal, and if not, how can optimize it so its easier to maintain, understand, error-free.

The model I have right now for statistics seems a bit complicated to calculate, maintain. I am looking for some easier options.

Here is the information the visitor will give us :

CREATE TABLE rating
(
  rating_id integer NOT NULL, // primary key
  age_range integer DEFAULT 0,
  comment_text character varying,
  friendly_stars numeric DEFAULT 0,
  gender character varying,
  other_stars numeric DEFAULT 0,
  overall_feedback boolean DEFAULT false,
  relative boolean DEFAULT false,
  quality_stars numeric DEFAULT 0,
  save_time timestamp without time zone
)

Here is the Statistics and monthly statistics I have. Monthly statistics will actually save daily computed statistics with month code, so it's easier to isolate months also from that was something I thought :

CREATE TABLE monthly_statistic
(
  mstat_id integer NOT NULL, // Primary key
  daily_value timestamp without time zone,
  female_ratio character varying,
  friendly_quality numeric DEFAULT 0,
  high_age_ratio character varying,
  less_age_ratio character varying,
  man_friendly numeric DEFAULT 0,
  man_ratio character varying,
  mid_age_ratio character varying,
  other_quality numeric DEFAULT 0,
  service_quality numeric DEFAULT 0,
  woman_friendly numeric DEFAULT 0,
  statistics_id integer NOT NULL, // assocication with Statistics
)

And finally, here is Statistics :

CREATE TABLE statistics
(
  statistics_id integer NOT NULL,
  less_age_month character varying, // Less age means 18-24
  less_age_week character varying,
  less_age_year character varying,
  mid_age_month character varying, // Mid_age means 24-60
  mid_age_year character varying,
  mid_age_week character varying,

 more_age_month character varying, // More age means 60+
  more_age_week character varying,
  moreageyear character varying,
  month_friendly numeric DEFAULT 0,
  month_man character varying,
  month_other numeric DEFAULT 0,
  month_quality numeric DEFAULT 0,
  month_woman character varying,

  week_friendly numeric DEFAULT 0,
  week_man character varying,
  week_other numeric DEFAULT 0,
  week_quality numeric DEFAULT 0,
  week_woman character varying,
  year_friendly numeric DEFAULT 0,
  year_man character varying,
  year_other numeric DEFAULT 0,
  year_quality numeric DEFAULT 0,
  year_woman character varying,
  restaurant_branch_id integer NOT NULL, // Association with resto branch
)

So as you can see I have created values for week, month and year. But if the user wishes to see last week or this weeks statistics, then I need something more fine-grained, that's why I created monthly-statistics, with which I can save information based on per day.

I have to also show information like, within 1 week or 1 month or 1 year, how many of them were males or females, how many were with age 18-24, etc. I hope you guys get the idea. Kindly let me know if this data-model sounds suitable or if you guys have any idea for a better one.

Please note, that statistics are calculated only in night, not real-time and I am using PostgreSQL.. Have a good day.

Best Answer

Save all of the raw ratings (i.e. individual user inputs) with the date on which the rating was recorded.

Then you can calculate all of your weekly, monthly and yearly statistics based on the raw data each night. These rolled up statistics should also be dated (with a date range). In this way you can also keep multiple weeks, months and years so that you can look at trends if you like.

This is often called a data warehouse. It is a common way of pre-calculating rolled-up statistics for quick and easy reporting.

When you get past the longest of your roll-ups (e.g. 1 year) you can start to throw away your raw transactional data if you're worried about how much space it takes up.