How should I model a binary-tree like data using Adjacency List

database-designperformance

I'm making a system where a user can recruit others, resulting in a binary tree model of users. I have decided to use Adjacency List to model the data but I have some doubts about the number of fields in a table.

I made the users table, there are 24 fields including the node fields of a binary tree:

parent_id, side(left child or right child of its parent), indicated_id(the guy who recruited the user may be different of its parent in the binary tree) + 21 fields(email, name, password, tokens, phone, mobile, etc)

So my doubts are:

1)Should I make a "nodes" table to wrap the user attributes that are not related to the binary-tree like this:

parent_id, indicated_id, side, user_id(pointing to the users table)

Or put everything in the users table? I think if I put everything in the users table I will avoid JOINS and the performance would be better. But if I wrap in a nodes table I will be able to query for a sub-tree without read unrelated fields and Join only the resulting query, so the perfomance lost would be negligible?

Besides that, is there any problem to have a table with 20-30 fields that justify the use of another table for a 1-1 relationship?

2) Is there any better way to model an Adjacency List?

Best Answer

If you set up an index over the binary-tree related fields, leaving the fields in the table should have more or less the same performance as if you had them into their own table with a full covering index (as PostgreSQL supports index-only scans as of v9.2). It probably isn't a bad idea to set up some tables with filler data and do some test cases, though.

In regards to 2), there is a slightly different way you can represent this kind of data, and it really depends on the way you expect to be querying it. This might not be useful, but might give you some food for thought:

For my organization I had to come up with a way to represent organization structure in such a way that it facilitated very fast queries of the kind "give me every person who reports up to X but has direct reports", or "give me the list of persons who are within Z reporting levels to this person". The solution is a slightly modified adjacency table of the form:

 h_ID, emp_ID, m_ID, lvlsAbv  

where h_ID is an autogenerated key, emp_ID is the employeeID, m_ID is the managerID, and lvlsAbove is the # of reporting lvls difference between the 2 people. This means that each employee has multiple rows (1 for each manager above them).

Example:

h_ID    emp_ID    m_ID   lvlsAbv 
42530   211432  254192  1
42531   211432  197829  2
42532   211432  256373  3
42533   211432  255628  4
42534   211432  256978  5
42535   211432  3735    6

The result is a slightly larger table, but is still small enough (size wise) to easily justify a covering index over the whole thing.

The advantage of this kind of structure is the ability to write very simple queries against relational properties of the tree (ex: "select everybody that is downtree of person X"). The downside is that it requires more work to construct and maintain (a lot more).