First off, your PRIMARY KEY
spanning two varchar(2000)
columns seems extremely expensive. If you use your PK for anything else I suggest a surrogate PK (use a serial
column) and add a UNIQUE
constraint to enforce uniqueness on (cid, name, synonym)
.
If one of your varchar
columns actually uses the maximum length you would exceed the maximum size for an index entry. See:
I guess what you want is this, because it would make sense:
SELECT DISTINCT ON (c.name)
c.name, min(s.synonym) AS min_synonym, s.cid
, ts_rank(s.tsv_syns, c.lexemes, 16) AS rnk
, count(*) AS ct
FROM synonyms_all_gin_tsvcolumn s
JOIN cmap5 c ON c.lexemes @@ s.tsv_syns
GROUP BY c.name, rnk, s.cid
ORDER BY c.name, rnk DESC, ct DESC;
I use explicit [INNER] JOIN
with attached join condition replacing your CROSS JOIN
plus WHERE
clause. It's generally considered superior (easier to read and debug). I also use rnk
as column name to avoid the basic function name rank
as identifier.
Group the results per c.name
that have the same rnk
and s.cid
, take min(s.synonym)
(for lack of definition in the question), and count(*)
the peers per group.
Reduce to one row per c.name
with DISTINCT ON
(Postgres specific extension of SQL standard DISTINCT
), taking the highest rank first and, within same rank, the highest peer count. See:
Select first row in each GROUP BY group?
Combining GROUP BY
and DISTINCT ON
this way in one query level is possible since DISTINCT
or DISTINCT ON
are applied after GROUP BY
.
The simplest structure would be a PERMISSIONS
table like this:
create table PERMISSIONS
( Site_ID int
, User_ID int
, Role char(1)
, primary key (Site_ID, User_ID, Role) -- scenario 1
-- OR:
, primary key (Site_ID, User_ID) -- scenario 2
)
Use values for Role
such as:
- A = Administrator
- P = Publisher
- E = Editor
How many rows you need depends on how your code interprets the data. You could have very simple code which interprets each permission strictly at face value. To do an administrative task, the user would need an 'A' permission. To do a publishing task, they would need a 'P' permission, etc. In this case an administrator would need three records per site (one each of A, P, E). Similarly publishers would need two records per site and editors only one. There is a risk in this approach that data consistency errors could creep in, like someone could be a publisher, but not an editor (by virtue of the 'E' record being missing).
Alternatively, you could make your code more complex, in which case each user only needs one permission record per site. In this scenario, administrative tasks require an 'A' permission, publishing requires an 'A' or a 'P' and editing requires 'A', 'P', or 'E'. The advantage of this approach is that there is less data to maintain and you won't have inconsistencies.
Note too that you do not need to have the Users.Parent
column. If an administrator has an 'A' permission for a site, that automatically gives them control of all of the users for that site.
Best Answer
This should come close to meeting your need.
This assumes you want the
rel_user_id
from the most recently added row for a givenuser_id
andparent_id
, and that the row with the highestid
should be the one most recently entered. If you want the rel_user_id with the highest value instead, .you can basically just use theagg
subquery (changingMAX(ID)
toMAX(rel_user_id)
.Also, I didn't worrying about the join to the
posts
table - I'm guessing thatparent_id
is the foreign key column, and you've got that; you should be able to get there from here.Here's the query:
The
agg
subquery gets us one row for eachuser_id
andparent_id
; we'll put theWHERE
clause specifying whichuser_id
you want here, to limit how many records we're grouping. This gets us the count of rows for theuser_id
andparent_id
combo, and gets use the "most recent"id
.We then tie back to
notifications
with anINNER JOIN
to get therel_user_id
from the indicatedid
row.Here's a db-fiddle.com link showing the query in action, with the sample data you provided.