PostgreSQL – Multi-column index on nested data

database-design, performance, postgresql, postgresql-performance

I have a table with a user field and a group field (among others). Each user is in exactly one group. Whenever I query for a user, I also know that user's group. Would it be faster to have a multi-column index and query for both user and group, instead of using a single-column index and querying only for the user? If so, which index type should I choose? My idea was that searching for the group first, and then searching for the user only within those results, would be fastest, but I'm not sure whether any of the index types can actually do that.

I also thought about separating users and groups by putting them in different tables but as both numbers can grow indefinitely, I couldn't think of a way to do that without having an indefinite number of tables.

Best Answer

I'm mostly familiar with MySQL and MS SQL, so I can't speak to the performance of PostgreSQL on multi-column indexes. However, my guess is that as long as the number of users is small (< 100,000), a single-column index on user would work well.
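If you want to check rather than guess, you can build each candidate index on a test table and ask the planner which one it uses. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (not PostgreSQL, whose planner may decide differently), and the table `items`, its columns, and the index names are hypothetical. On PostgreSQL the equivalent check would be to run EXPLAIN on your real queries.

```python
import sqlite3

def plan_for(conn, sql, params=()):
    """Return the planner's human-readable plan lines (SQLite-specific)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

# Assumption: SQLite in memory stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER, payload TEXT)"
)
# Hypothetical data: every user belongs to exactly one group (user_id % 10).
conn.executemany(
    "INSERT INTO items (user_id, group_id, payload) VALUES (?, ?, ?)",
    [(u, u % 10, f"payload-{u}") for u in range(1000)],
)

BY_USER = "SELECT payload FROM items WHERE user_id = ?"
BY_GROUP_AND_USER = "SELECT payload FROM items WHERE group_id = ? AND user_id = ?"

# Candidate 1: single-column index on the user.
conn.execute("CREATE INDEX idx_items_user ON items (user_id)")
single_plan = plan_for(conn, BY_USER, (42,))

# Candidate 2: composite index with the group as the leading column.
conn.execute("DROP INDEX idx_items_user")
conn.execute("CREATE INDEX idx_items_group_user ON items (group_id, user_id)")
composite_plan = plan_for(conn, BY_GROUP_AND_USER, (2, 42))

# Both queries return the same rows, because the user determines the group.
rows_by_user = conn.execute(BY_USER, (42,)).fetchall()
rows_by_both = conn.execute(BY_GROUP_AND_USER, (2, 42)).fetchall()

print(single_plan)
print(composite_plan)
```

Since each user belongs to exactly one group, adding the group to the WHERE clause does not narrow the result any further; the plans only show which index gets picked. Actual timings would have to be measured on realistic data in your own database.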

Regarding the second portion of your question:

"I also thought about separating users and groups by putting them in different tables but as both numbers can grow indefinitely, I couldn't think of a way to do that without having an indefinite number of tables."

It sounds like the user and group values are repeated in your table. If this is true, the usual thing to do would be to 'normalize' the data. In this case, you could move the relationship between users and groups to a different table.

You haven't specified whether the user or the group is more closely tied to each row in your current table, though I'll guess it's probably the user.

A table layout with users, groups, and data would usually look like:

data_table
data_id    data_col_1    data_col_2      user_id
1          1.2374        pancakes        2
2          8.2978        strawberries    1
3          78.001        hashbrowns      1
...

user_table
user_id    user_name     group_id
1          Travis        1
2          Melanie       1
3          John          3
...

group_table
group_id   group_name
1          sales
2          marketing
3          engineering
...

This assumes the property that you mentioned, where each user is a member of exactly one group. If that property changed to allow users to join more than one group, you would drop the group_id column from user_table and create a new 'mapping' table with two columns: user_id and group_id.

If your data is organized this way, then user names and group membership can change without modifying all matching rows in the data table.
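As a rough illustration of that point, here is a minimal sketch of the three tables above. It makes some assumptions: it uses Python's built-in sqlite3 module and an in-memory SQLite database purely so it can be run anywhere (the same layout applies to PostgreSQL), the column types are my guesses, and the new group name 'field sales' is made up for the example.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_table (
    group_id   INTEGER PRIMARY KEY,
    group_name TEXT NOT NULL
);
CREATE TABLE user_table (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    group_id  INTEGER NOT NULL REFERENCES group_table (group_id)
);
CREATE TABLE data_table (
    data_id    INTEGER PRIMARY KEY,
    data_col_1 REAL,
    data_col_2 TEXT,
    user_id    INTEGER NOT NULL REFERENCES user_table (user_id)
);
INSERT INTO group_table VALUES (1, 'sales'), (2, 'marketing'), (3, 'engineering');
INSERT INTO user_table VALUES (1, 'Travis', 1), (2, 'Melanie', 1), (3, 'John', 3);
INSERT INTO data_table VALUES
    (1, 1.2374, 'pancakes', 2),
    (2, 8.2978, 'strawberries', 1),
    (3, 78.001, 'hashbrowns', 1);
""")

# Each data row reaches its group through the user, so the group is stored once.
QUERY = """
SELECT d.data_col_2, u.user_name, g.group_name
FROM data_table AS d
JOIN user_table AS u ON u.user_id = d.user_id
JOIN group_table AS g ON g.group_id = u.group_id
ORDER BY d.data_id
"""

before = conn.execute(QUERY).fetchall()

# Renaming a group updates one row in group_table; data_table is not touched.
conn.execute("UPDATE group_table SET group_name = 'field sales' WHERE group_id = 1")
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

Every data row belonging to Travis or Melanie picks up the new group name through the joins, even though only one row was updated.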

Here is a more in-depth explanation of DB normalization: https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/

Related Solutions

MySQL – Is this database structure correct

For the Users, Groups and userGroupLink tables you have the relationships correct.

I would suggest you change groups.user_id to owner_user_id just to make it a little bit clearer, but that's a minor point of preference. Since userGroupLink has no attributes of its own, nor is it the parent in any relationship, you could remove the ID column from this table without any loss of meaning or utility, using the pair (user_id, group_id) as the key instead. Indeed, you will get a small performance boost by storing more rows per page.

To find all the groups of which a user is a member you would join the tables together:

select
    u.id
    ,u.username
    ,g.group_name
from mydb.users as u
inner join mydb.userGroupLink as l
    on l.user_id = u.id
inner join mydb.groups as g
    on g.id = l.group_id
where u.id = <the user id you are looking for>

If you are looking for the groups which a user owns that would follow the other relationship.
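To make the two relationships concrete, here is a minimal sketch of both lookups. Treat it as an illustration under assumptions: it runs against an in-memory SQLite database through Python's built-in sqlite3 module instead of MySQL, the owner column is called owner_user_id as suggested above, the link table uses (user_id, group_id) as its primary key in place of a separate ID column, and the sample rows and helper function names are made up.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
# "groups" is quoted because GROUPS is a keyword in some SQL dialects.
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE "groups" (
    id            INTEGER PRIMARY KEY,
    group_name    TEXT NOT NULL,
    owner_user_id INTEGER NOT NULL REFERENCES users (id)
);
CREATE TABLE userGroupLink (
    user_id  INTEGER NOT NULL REFERENCES users (id),
    group_id INTEGER NOT NULL REFERENCES "groups" (id),
    PRIMARY KEY (user_id, group_id)  -- no separate ID column needed
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO "groups" VALUES (10, 'admins', 1), (20, 'editors', 1), (30, 'guests', 2);
INSERT INTO userGroupLink VALUES (1, 10), (2, 10), (2, 30);
""")

def groups_joined_by(user_id):
    """Membership: goes through the link table."""
    rows = conn.execute(
        'SELECT g.group_name FROM userGroupLink AS l '
        'JOIN "groups" AS g ON g.id = l.group_id '
        'WHERE l.user_id = ? ORDER BY g.id',
        (user_id,),
    )
    return [name for (name,) in rows]

def groups_owned_by(user_id):
    """Ownership: follows the owner column on the groups table directly."""
    rows = conn.execute(
        'SELECT g.group_name FROM "groups" AS g '
        'WHERE g.owner_user_id = ? ORDER BY g.id',
        (user_id,),
    )
    return [name for (name,) in rows]

print(groups_joined_by(2))
print(groups_owned_by(1))
```

With the composite primary key, the link table also rejects a duplicate (user_id, group_id) pair, which a surrogate ID column alone would not do.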

MySQL – Multi-Column Full Text Search Going Very Slow

PROBLEM

From the posts in your question, I see 3 FULLTEXT indexes. There is one for each column.

Why did the query work at all? MySQL worked with whatever it had. In your case, it fell back to a full table scan, because that is what the MySQL query optimizer decided on.

SOLUTION

What you really need is a single FULLTEXT index covering all 3 columns:

ALTER TABLE articles ADD FULLTEXT content_title_keywords_ndx (content,title,keywords);

Only then can you say

SELECT * FROM articles WHERE MATCH(content,title,keywords) AGAINST ('cats' IN BOOLEAN MODE);

I have suggested making compound FULLTEXT indexes before:

Mar 16, 2012 : Speed up search across multiple columns
Oct 13, 2012 : Can underscore be forced as a word splitter without a full-text parser plugin?