PostgreSQL – Performing Multiple Counts Within a Join

postgresql

I am modeling tokens extracted from tweets and have the below table structure:

CREATE TABLE process (
        id SERIAL PRIMARY KEY,
        ...
);

CREATE TABLE tweet (
        id SERIAL PRIMARY KEY,
        process_id INTEGER NOT NULL REFERENCES process(id) ON DELETE CASCADE ON UPDATE CASCADE,
        cleaned_text TEXT,
        process_tweet_id INTEGER NOT NULL,
        constraint tbl_statusid_processid unique(status_id, process_id)
        ...
);

CREATE TABLE token (
    id SERIAL PRIMARY KEY,
    tweet_id INTEGER NOT NULL REFERENCES tweet(id) ON DELETE CASCADE ON UPDATE CASCADE,
    type TEXT,
    value TEXT,
    ...
);

The token table has a foreign key for the tweet table and the tweet table has a foreign key for the process table.

The 'type' field in the token table can hold one of 7 or 8 possible values (e.g. 'LOCATION', 'PERSON', etc.). The possible types are static and enforced elsewhere within the application, so there is no chance of any deviation from that set.

I am trying to create a query that returns all of the tweets referencing a given process_id, the total number of tokens referencing each tweet, and the number of tokens of each TYPE referencing each tweet (each type having its own column in the result set).

  id   |     status_id      | tokencount | loctoken | persontoken 
-------+--------------------+------------+----------+-------------
  6158 | 672636476602122241 |          3 |        2 |           1
 10295 | 674183390430552064 |          4 |        2 |           2

The above result set is basically what I am hoping to get back. So, tweet ID# 6158 has a total of 3 tokens, 2 of which are location tokens and 1 is a person token. Tweet ID# 10295 has a total of 4 tokens, 2 locations and 2 persons.

What I have so far is a simple JOIN, but I am unsure how to include the count for each type of token per tweet…

SELECT      tweet.id, tweet.status_id, count(token.id) AS tokenCount    
FROM        tweet     
INNER JOIN  token on token.tweet_id = tweet.id     
WHERE       tweet.process_id = 5     
GROUP BY    tweet.id;

I thought I could simply add count(token.type = 'LOCATION') like the below, but that simply repeated the count(token.id) column.

SELECT      tweet.id, tweet.status_id, count(token.id) AS tokenCount, count(token.type = 'LOCATION') AS locToken, count(token.type = 'PERSON') AS personToken    
FROM        tweet     
INNER JOIN  token on token.tweet_id = tweet.id     
WHERE       tweet.process_id = 5     
GROUP BY    tweet.id;

Can anyone provide the query that would return what I am looking for? Hopefully with a good explanation of the solution?

I would also really like to know why count(token.type = 'LOCATION') did not work and simply returned the same number as count(token.id). I tried it on a whim and really thought I was just going to get an error.

Best Answer

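You were nearly there. You can do this by summing a case when ... end expression, which yields 1 for each token of the matching type and 0 for every other token: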

SELECT      tweet.id, tweet.status_id, count(token.id) AS tokenCount,
            sum(case when token.type = 'LOCATION' then 1 else 0 end) AS locToken,
            sum(case when token.type = 'PERSON' then 1 else 0 end) AS personToken
FROM        tweet     
INNER JOIN  token on token.tweet_id = tweet.id     
WHERE       tweet.process_id = 5     
GROUP BY    tweet.id;
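
As for why count(token.type = 'LOCATION') returned the same number as count(token.id): count(expression) counts every row where the expression is not NULL, and a comparison that evaluates to false is still not NULL, so every joined row is counted. The sketch below shows this side by side with the aggregate FILTER clause, which is another way to write the per-type counts on PostgreSQL 9.4 or later. The three sample rows in the CTE are hypothetical stand-ins for the real token table, chosen to mirror tweet 6158 from the question.

```sql
-- Hypothetical sample rows standing in for the token table (not real data).
WITH token (tweet_id, type) AS (
    VALUES (6158, 'LOCATION'),
           (6158, 'LOCATION'),
           (6158, 'PERSON')
)
SELECT tweet_id,
       count(*)                                  AS tokencount,
       -- The comparison is true or false, never NULL here,
       -- so count() counts all three rows.
       count(type = 'LOCATION')                  AS count_of_boolean,
       -- FILTER restricts which rows reach the aggregate.
       count(*) FILTER (WHERE type = 'LOCATION') AS loctoken,
       count(*) FILTER (WHERE type = 'PERSON')   AS persontoken
FROM   token
GROUP  BY tweet_id;
```

With these sample rows, count_of_boolean equals tokencount (3), while loctoken is 2 and persontoken is 1. In the original query the same idea would read count(token.id) FILTER (WHERE token.type = 'LOCATION') AS locToken, assuming the server version supports FILTER; the sum(case ...) form above works on older versions as well.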