PostgreSQL – ListServ Database: Statistics Table Design Question

database-design postgresql

I've got a bit of a design question here. I have my own ListServ implementation written in Python that works with a Postfix+Dovecot system for handling mail, with a PostgreSQL database backend providing most of the functionality for determining the following:

  • What lists exist on the listserv (the lists table in the diagram below)
  • Who has ever been seen on the listserv as a member of any list. (the members table in the diagram below)
  • Who is a current member of which lists, and: (the listserv_membership table in the diagram below, linking individual 'members' to the individual 'list' they are a member of by listserv ID and member ID)
    • Can they send to the list? (hidden field in the table)
    • Can they receive messages sent to the list? (hidden field in the table)
  • ListServ statistics – Message counts for each day, based on the current date's timestamp, for each individual list that has had activity. (the listserv_stats table below)

Currently, the structure of the DB looks like this. (Note that I have blanked out all the non-relevant fields in the tables I'm not focusing on and left the primary keys in place; the relevant table is marked with three white asterisks next to it, and the diagram was created with DataGrip.)

Current ERD
Please don't call me out about the diagonal arrows in the diagram – this was done as a quick-and-dirty diagram; I have a proper ERD done by hand in Visio, which unfortunately isn't available on the computer I'm on now.

The specific table I'm going to ask about is the listserv_stats table, not the rest of the database design.

Initially, I was only worried about tracking statistics on a day-to-day basis, plus a very rough 'all-time' daily average based on how many days the listserv has had activity. I'm now much more interested in other statistics, such as monthly averages, month-to-date averages, year-to-date averages, and yearly averages, so I'm considering a redesign to better accommodate such queries.

The current table CREATE statement is this:

CREATE TABLE listserv_stats
(
    lsid INTEGER NOT NULL,
    datestamp DATE DEFAULT now() NOT NULL,
    msg_count INTEGER NOT NULL,
    CONSTRAINT listserv_stats_lsid_date_pk PRIMARY KEY (lsid, datestamp),
    CONSTRAINT listserv_stats_lists_lsid_fk FOREIGN KEY (lsid) REFERENCES lists (lsid)
);
COMMENT ON COLUMN listserv_stats.lsid IS 'ListServ ID';
COMMENT ON COLUMN listserv_stats.datestamp IS 'DateStamp';
COMMENT ON COLUMN listserv_stats.msg_count IS 'Message Count';

What I'm thinking of doing is expanding the table to replace the datestamp column as follows. In my opinion, this would let me better handle month-to-date, monthly historic, year-to-date, and yearly historic values in my Python code (which can reconstruct the individual values into datestamps):

CREATE TABLE listserv_stats
(
    lsid INTEGER NOT NULL,
    year INTEGER NOT NULL,
    month INTEGER NOT NULL,
    day INTEGER NOT NULL,
    msg_count INTEGER NOT NULL,
    CONSTRAINT listserv_stats_lsid_date_pk PRIMARY KEY (lsid, year, month, day),
    CONSTRAINT listserv_stats_lists_lsid_fk FOREIGN KEY (lsid) REFERENCES lists (lsid)
);
COMMENT ON COLUMN listserv_stats.lsid IS 'ListServ ID';
COMMENT ON COLUMN listserv_stats.year IS 'Date: Year';
COMMENT ON COLUMN listserv_stats.month IS 'Date: Month';
COMMENT ON COLUMN listserv_stats.day IS 'Date: Day';
COMMENT ON COLUMN listserv_stats.msg_count IS 'Message Count';

From this, I can restructure the existing predefined routines in the DB to allow my Python code to call "New Date Record" and "Update Date Record" functions by passing the entire date as values.
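
In practice those two routines would probably collapse into a single upsert; something along these lines is what I have in mind (the function name is just for illustration, and ON CONFLICT needs PostgreSQL 9.5+):

-- Sketch of a combined "new/update date record" routine for the proposed layout;
-- bump_list_stat() is an illustrative name, not part of the existing schema.
CREATE OR REPLACE FUNCTION bump_list_stat(p_lsid integer, p_date date, p_count integer)
RETURNS void AS $$
    INSERT INTO listserv_stats (lsid, year, month, day, msg_count)
    VALUES (p_lsid,
            EXTRACT(YEAR  FROM p_date)::integer,
            EXTRACT(MONTH FROM p_date)::integer,
            EXTRACT(DAY   FROM p_date)::integer,
            p_count)
    ON CONFLICT (lsid, year, month, day)
    DO UPDATE SET msg_count = listserv_stats.msg_count + EXCLUDED.msg_count;
$$ LANGUAGE sql;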

My question is: is this design change a good idea, or can I achieve everything with the original datestamp column I already have in place? If the original design is kept, I will have to figure out how to structure the routines to provide the data for those specific date ranges from the system (I'll most likely post on SO for that).

Best Answer

So, in the chat room for the site, I've been told to leave my table structure alone and to use views or specific queries to pull the data I need from just the datestamp.

Given this, I've decided to leave the table alone and have started looking into the date functions in PostgreSQL (and MSSQL, where another group I trust would like to use the functions), and I've begun incorporating such date-range handling into the predefined routines I've designed for this system.
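
For example, the monthly and year-to-date figures fall straight out of the existing datestamp column with date_trunc(); rough sketches of the kind of queries I'm wrapping up in those routines:

-- Per-month totals and per-active-day averages for every list (sketch)
SELECT lsid,
       date_trunc('month', datestamp)::date AS month,
       sum(msg_count)                        AS total_msgs,
       round(avg(msg_count), 2)              AS avg_per_active_day
FROM   listserv_stats
GROUP  BY lsid, date_trunc('month', datestamp)
ORDER  BY lsid, month;

-- Year-to-date daily average for a single list (lsid = 1 is just an example)
SELECT avg(msg_count) AS ytd_avg
FROM   listserv_stats
WHERE  lsid = 1
  AND  datestamp >= date_trunc('year', CURRENT_DATE);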

The only thing I left out of my question was that my Python code base only interacts with the database via the predefined routines and functions I've written, so views are less necessary if I can get the database functions configured to correctly process the specific date ranges on my radar.
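
As a first cut, one of those database-side functions looks something like this – the name and exact statistic are still in flux, so treat it as a sketch:

-- Illustrative routine: daily message average for one list over an arbitrary range
CREATE OR REPLACE FUNCTION get_avg_msgs(p_lsid integer, p_from date, p_to date)
RETURNS numeric AS $$
    SELECT coalesce(avg(msg_count), 0)
    FROM   listserv_stats
    WHERE  lsid = p_lsid
      AND  datestamp BETWEEN p_from AND p_to;
$$ LANGUAGE sql STABLE;

-- Month-to-date average for list 1 would then be:
-- SELECT get_avg_msgs(1, date_trunc('month', CURRENT_DATE)::date, CURRENT_DATE);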

Thanks to all who commented in chat about this directly; it's helped me restructure my initial design for the functions and push a lot of the work to the database side instead of the Python side.