How to model this problem in a graph database

database-design graph graph-dbms neo4j

I have a project that I'm working on where I extract data from PDFs and map/visualize the relationships between the extracted pieces.

Here's an example of my problem:

file: 11425646.pdf
  author: bob
  company: abc co
  date: 1/1/2011
  mentioned_users: [alice,sue,mike,sally]
  images: [1958.jpg,535.jpg,35735.jpg]

file: 15421484.pdf
  author: betty
  company: ionga
  date: 2/15/2011
  mentioned_users: [john,alex,george]
  images: [819.jpg,9841.jpg,78.jpg]

file: 11975748.pdf
  author: micah
  company: zoobi
  date: 9/26/2011
  mentioned_users: [alice,chris,joe]
  images: [526.jpg,5835.jpg,355.jpg]

How can I model this in a graph database like Neo4j?

I would like to start from one piece of data (such as a person's name) and find everything related to it (images, co-mentions, authors, etc.) up to a depth of 10. Here's the structure I have in mind, though I'm not sure it's a good approach (this isn't actual syntax):

[file: 11425646.pdf date:1/1/2011] -written_by-> bob
[file: 11425646.pdf date:1/1/2011] -from_company-> abc co
[file: 11425646.pdf date:1/1/2011] -mentions-> alice
[file: 11425646.pdf date:1/1/2011] -mentions-> sue
[file: 11425646.pdf date:1/1/2011] -mentions-> mike
[file: 11425646.pdf date:1/1/2011] -mentions-> sally
[file: 11425646.pdf date:1/1/2011] -has_image-> 1958.jpg
[file: 11425646.pdf date:1/1/2011] -has_image-> 535.jpg
[file: 11425646.pdf date:1/1/2011] -has_image-> 35735.jpg

Is this the right way to structure this data in a graph database?

Best Answer

How can I model this in a graph database like Neo4j?

Modeling a Neo4j graph from relational data is quite simple:

  1. Decide which items are vertices (nodes, objects) and which are edges (relationships).
  2. Convert the relational data to Cypher, declaring every item and every relationship explicitly.

Note: The mapping from relational to graph may cover only selected entities of the relational model, and a single table row can expand into multiple vertices and multiple edges.
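As a rough illustration of step 2, here is a minimal Python sketch that turns one extracted record into Cypher MERGE statements. The function names (`quote`, `record_to_cypher`) and the dictionary layout are assumptions made for this example, not part of any Neo4j tooling. It only builds text; in a real application, passing values as query parameters through a driver would probably be safer than string formatting.

```python
def quote(value):
    """Render a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def record_to_cypher(record):
    """Build the MERGE statements for one extracted PDF record (hypothetical helper)."""
    nodes = [
        f"MERGE (f :File {{name:{quote(record['file'])}, date:{quote(record['date'])}}})",
        f"MERGE (a :Author {{name:{quote(record['author'])}}})",
        f"MERGE (c :Company {{name:{quote(record['company'])}}})",
    ]
    edges = ["MERGE (f)-[:WRITTEN_BY]->(a)", "MERGE (f)-[:FROM_COMPANY]->(c)"]
    # Each list element becomes its own node plus one edge from the file,
    # which is how a single record expands into many vertices and edges.
    for i, user in enumerate(record["mentioned_users"], start=1):
        nodes.append(f"MERGE (u{i} :User {{name:{quote(user)}}})")
        edges.append(f"MERGE (f)-[:MENTIONS]->(u{i})")
    for i, image in enumerate(record["images"], start=1):
        nodes.append(f"MERGE (i{i} :Image {{name:{quote(image)}}})")
        edges.append(f"MERGE (f)-[:HAS_IMAGE]->(i{i})")
    return nodes + edges


record = {
    "file": "11425646.pdf",
    "author": "bob",
    "company": "abc co",
    "date": "1/1/2011",
    "mentioned_users": ["alice", "sue", "mike", "sally"],
    "images": ["1958.jpg", "535.jpg", "35735.jpg"],
}
print("\n".join(record_to_cypher(record)))
```

Because every statement is a MERGE, running the output for several records should reuse a node that already exists with the same label and name (for example, a `:User` named alice mentioned in two files) rather than create a duplicate.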

Is this the right way to structure this data in a graph database?

Yes, it looks OK. Assuming that file, author, company, user, and image are nodes, and date is only an attribute, this

file: 11425646.pdf
  author: bob
  company: abc co
  date: 1/1/2011
  mentioned_users: [alice,sue,mike,sally]
  images: [1958.jpg,535.jpg,35735.jpg]

should convert to this

// Nodes: one per file, author, company, mentioned user, and image.
// MERGE creates a node only if no node with this label and these properties exists yet.
MERGE (f :File {name:'11425646.pdf', date:'1/1/2011'})
MERGE (a :Author {name:'bob'})
MERGE (c :Company {name:'abc co'})
MERGE (u1 :User {name:'alice'})
MERGE (u2 :User {name:'sue'})
MERGE (u3 :User {name:'mike'})
MERGE (u4 :User {name:'sally'})
MERGE (i1 :Image {name:'1958.jpg'})
MERGE (i2 :Image {name:'535.jpg'})
MERGE (i3 :Image {name:'35735.jpg'})
// Relationships: every edge starts at the file node.
MERGE (f)-[:WRITTEN_BY]->(a)
MERGE (f)-[:FROM_COMPANY]->(c)
MERGE (f)-[:MENTIONS]->(u1)
MERGE (f)-[:MENTIONS]->(u2)
MERGE (f)-[:MENTIONS]->(u3)
MERGE (f)-[:MENTIONS]->(u4)
MERGE (f)-[:HAS_IMAGE]->(i1)
MERGE (f)-[:HAS_IMAGE]->(i2)
MERGE (f)-[:HAS_IMAGE]->(i3)
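
To sanity-check that this shape supports the depth-limited lookup described in the question, the following Python sketch mirrors the model in memory and walks it breadth-first. It is only a stand-in for a real Neo4j traversal: `build_graph` and `related` are hypothetical helpers, edges are treated as undirected, and nodes are keyed by bare name, so an author and a mentioned user with the same name would collapse into one node (unlike the separate :Author and :User labels above).

```python
from collections import deque

# The three example files from the question, keyed by file name.
RECORDS = {
    "11425646.pdf": {"author": "bob", "company": "abc co",
                     "mentioned_users": ["alice", "sue", "mike", "sally"],
                     "images": ["1958.jpg", "535.jpg", "35735.jpg"]},
    "15421484.pdf": {"author": "betty", "company": "ionga",
                     "mentioned_users": ["john", "alex", "george"],
                     "images": ["819.jpg", "9841.jpg", "78.jpg"]},
    "11975748.pdf": {"author": "micah", "company": "zoobi",
                     "mentioned_users": ["alice", "chris", "joe"],
                     "images": ["526.jpg", "5835.jpg", "355.jpg"]},
}


def build_graph(records):
    """Undirected adjacency map: each file is linked to every value it carries."""
    graph = {}
    for file_name, fields in records.items():
        neighbours = [fields["author"], fields["company"],
                      *fields["mentioned_users"], *fields["images"]]
        for other in neighbours:
            graph.setdefault(file_name, set()).add(other)
            graph.setdefault(other, set()).add(file_name)
    return graph


def related(graph, start, max_depth=10):
    """Breadth-first search mapping each reachable node to its hop count from start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]  # the start node is not "related" to itself
    return depth


graph = build_graph(RECORDS)
for name, hops in sorted(related(graph, "alice").items(), key=lambda kv: (kv[1], kv[0])):
    print(hops, name)
```

Starting from alice, the two files that mention her are one hop away, and their authors, companies, images, and co-mentioned users are two hops away. The file written by betty shares no node with them, so it is never reached.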

Useful links: data modeling guide and Cypher reference