Accumulo table design methodology

database-designnosql

I am just getting started with Accumulo and NoSQL databases and I am looking for some discussion on table design. I get the key value structure that is seen in the manual. However, if I am trying to recreate a relational database, I am not sure how relationships work. Can someone explain to some degree how to setup and "Hello World" database (i.e., manager-employee database). I want to use key-value implementation.

Best Answer

The first thing you should realize is that Accumulo is not a relational database. If you want to make it work like one, you need to create that functionality on top.

One way to do this would be to create rows that look like rows in a relational table, complete with foreign keys that you link together in your application's (an Accumulo client) code. Your client code must enforce the cascading updates/deletes and the other maintenance aspects of relations you may have come to expect from a relational database. Accumulo just stores and sorts Key/Value pairs for you. You decide the semantics of these Key/Value pairs.

Examples:

Note: For illustration purposes, Accumulo Key/Value pairs will be represented as:

RowID:ColumnFamily:ColumnQualifier -> Value

Employee table:

EmployeeID1:Data:Name -> Chuck
EmployeeID1:Data:Position -> Owner

EmployeeID2:Data:Name -> Barbara
EmployeeID2:Data:Position -> StoreManager
EmployeeID2:Data:Manager -> ManagerID1

EmployeeID3:Data:Name -> Tom
EmployeeID3:Data:Position -> SalesManager
EmployeeID3:Data:Manager -> ManagerID2

EmployeeID4:Data:Name -> Sally
EmployeeID4:Data:Manager -> ManagerID3

EmployeeID5:Data:Name -> Rick
EmployeeID5:Data:Manager -> ManagerID3

Manager table:

ManagerID1:Info:EmployeeID -> EmployeeID1
ManagerID1:Employees:1 -> EmployeeID2

ManagerID2:Info:EmployeeID -> EmployeeID2
ManagerID2:Employees:1 -> EmployeeID3

ManagerID3:Info:EmployeeID -> EmployeeID3
ManagerID3:Employees:1 -> EmployeeID4
ManagerID3:Employees:2 -> EmployeeID5

See how I've established foreign keys in both directions? Scanning the Manager table, you can quickly see which employees are managed by whom, but you'll have to look up each EmployeeID in the Employee table to see their information. Scanning the Employee table, you can quickly see information about a particular employee, including the ID of their manager, but you'll have to look up the manager's EmployeeID in the Manager table, and then look that up in the Employee table if you want the name of their manager.


This could be drastically simplified by combining the two tables into a single table and avoiding a separate ManagerID by using EmployeeID for both purposes. Something like:

One Employee table:

...
EmployeeID3:Data:Name -> Tom
EmployeeID3:Data:Position -> SalesManager
EmployeeID3:IsManagedBy:EmployeeID2 -> Since 2001
EmployeeID3:Manages:EmployeeID4 -> Since 2003
EmployeeID3:Manages:EmployeeID5 -> Since 2001
...

However, depending on your application, you may find that in a NoSQL database, you are better off flattening and duplicating your data, rather than normalizing with relations.

Flattened Employee table:

...
Tom:Data:Position -> SalesManager
Tom:IsManagedBy:Barbara -> Since 2001
Tom:Manages:Rick -> Since 2003
Tom:Manages:Sally -> Since 2001
...