Postgresql – Find most frequent values for a given column

aggregatedatabase-designdatepostgresql

I have a table that I would like as a leader-board for invitations as described below. I would like to create a query that counts the number of duplicate rows in a given month and order in a descending fashion.

Reading through some questions, this query seems to work:

SELECT COUNT(invite_code) AS counted
FROM invite_table
GROUP BY invite_code
ORDER BY counted DESC
LIMIT 10;

But it doesn't consider the month. What I am looking for is to get the most frequently appearing user_code where the month is specified. Also any criticism about the table design is welcome as I have deliberately designed it such that there are repeating rows with duplicate values. I am trying to track users whose invite code is used the most in a given month, I also have codes that indicate which channel a user comes from (maybe seeing an ad in FB for example), is this a valid table design?

Table "public.invite_table"
   Column    |         Type         | Collation | Nullable | Default
-------------+----------------------+-----------+----------+---------
 user_code   | character varying    |           | not null |
 invite_code | character varying    |           |          |
 month       | character varying(3) |           | not null |
 points      | integer              |           | not null |
Indexes:
    "invite_table_pkey" PRIMARY KEY, btree (user_code)
Foreign-key constraints:
    "invite_table_user_code_fkey" FOREIGN KEY (user_code) REFERENCES user_table(user_code)

Best Answer

get the most frequently appearing user_code where the month is specified

Since user_code is the primary key, that question would be nonsense. There can never be more than one. I assume you meant invite_code?

Just add a WHERE clause. And since the column can be NULL, also consider excluding NULL values:

SELECT invite_code, COUNT(*) AS counted
FROM   invite_table
WHERE  month = 'May'  -- or whatever is stored in your varchar(3) column
AND    invite_code IS NOT NULL -- exclude NULL
GROUP  BY invite_code
ORDER  BY counted DESC, invite_code  -- to break ties in deterministic fashion
LIMIT  10;

Month, date, timestamp?

A month column as varchar(3) doesn't seem very useful if there can be data for more than a single year. I would use data type date for it. You can format that with to_char() any way you like for presentation. Like:

SELECT to_char(date '2017-12-01', 'Mon');  -- 'Dec'

The column could look like this (also addressing your comment):

...
, inserted_at date DEFAULT CURRENT_DATE
...

The default value is entered when the column is omitted in an INSERT statement.

Or, if really only the month is relevant:

... DEFAULT date_trunc('month', now())::date

Or store the complete timestamptz (8 bytes, that's what I would probably do):

...
, inserted_at timestamptz DEFAULT now()
...

Read the manual here and here.

And be aware that date and timestamp depend on your current time zone setting. Details: