Postgresql – Hierarchical structure with limited number of records per user

database-designhierarchypostgresql

I am planning schema for an affiliate network. For all hierarchical queries I am using Postgres tablefunc extension but here is another problem.
Any user can refer only 3 other users at max.
So for example I have this relation of users:

CREATE TABLE users
(
  id serial NOT NULL,
  referred_by integer NOT NULL,
  created_at time without time zone NOT NULL,
  updated_at time without time zone NOT NULL,
  CONSTRAINT primary_key PRIMARY KEY (id),
  CONSTRAINT referred_foreign FOREIGN KEY (referred_by)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)

I can use connectby function to query any hierarchy. But when it comes to inserting related models I have to check if a user has not already referred 3 other users before inserting any new user under him. If he already have 3 users then it need to cascade down the tree and put it as a leaf node of tree which is further a heavy query to run on larger hierarchies.

For cascading new users to leaf node Is it possible to determine the nearest node where new user can be put ? for example If this is the data in table:

                    ____________ A ___________
                   /             |            \
                __ B __       __ C __     ____ D ____
               /   |   \     /   |   \   /     |     \
              E    F    G    H   I   J   K     L

And A refers one more user, Now, since A's downline is fully saturated any new user under him should cascade down the tree, and there are 9 places where the new user can be put.

E    F    G    H    I    J    K    L    D

out of which the nearest node is D, so incoming node should go under D instead of E,F..K,L

I have 3 questions:

  1. Most important, Is it possible to restrict the number of records under each user ? I dont want to rely on triggers to check for number of records and cascade down if required. I can change the design if necessary.
  2. How can I 'efficiently' fetch the nearest leaf node of any tree after determining that the root node is saturated ? By efficiently I mean that the same query might need to run a couple hundred times per minute and the data set has no depth limit. Again, I am open for design suggestions if any.
  3. Is it possible to create partial index for referred_foreign so that the first user who will not be referred by anyone can still satisfy the foreign integrity ?

Is there any better design or something that I am missing ? (I know I am missing a lot just dont know what it is)

Best Answer

The following requirement can be easily implemented with a constraint: any user can refer only 3 other users at max.

    CREATE TABLE users
    (
      id serial NOT NULL,
      reference_number SMALLINT NOT NULL,
      CONSTRAINT CHK_reference_number CHECK(reference_number BETWEEN 1 AND 3),
      CONSTRAINT UNQ_reference_number UNIQUE(id, reference_number),
      referred_by integer NOT NULL,
(snip)

Note that your design does not prevent cycles. We can easily enforce that via constraints as well. I can elaborate if you are interested.

Regarding the efficiency of finding qualifying descendants, we can add some redundant data and get much better speed. For example, E is a descendant of A, but not a direct one - B is between them. We can store the following rows:

Ancestor: A
AncestorLevel: 1
Descendant: B
DescendantLevel: 2

Ancestor: B
AncestorLevel: 2
Descendant: E
DescendantLevel: 3

Ancestor: A
AncestorLevel: 1
Descendant: E
DescendantLevel: 3

Once we have this redundant data, finding descendants is easy and fast - just one simple query without recursion. Naturally, with this approach we need substantially more storage.

Of course, with redundant data there is always the risk that it is inconsistent. We can use constraints to enforce the integrity of redundant data. This is complex but doable.