How can I model this in a graph database like Neo4j?
Modeling Neo4j graphs from relational data is fairly straightforward:
- Decide on your vertices (nodes, objects) and edges (relationships).
- Convert the relational data to Cypher, declaring every item and every relationship explicitly.
Note: A mapping from relational to graph may take only selected entities from the relational model, and a single table row can explode into multiple vertices and multiple edges.
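As a rough illustration of how one row can explode into several vertices and edges, here is a minimal Python sketch. The helper name explode_row and the dict layout are assumptions made for this example; the labels and relationship types follow the sample record discussed later in this answer.

```python
# Hypothetical helper: split one relational row into graph nodes and edges.
# The function name and the dict keys are illustrative assumptions.

def explode_row(row):
    """Return (nodes, edges) for a single row given as a dict."""
    file_node = ("File", row["file"])
    nodes = [file_node]
    edges = []

    # Single-valued columns: one node and one edge each.
    for column, label, rel_type in [
        ("author", "Author", "WRITTEN_BY"),
        ("company", "Company", "FROM_COMPANY"),
    ]:
        node = (label, row[column])
        nodes.append(node)
        edges.append((file_node, rel_type, node))

    # Multi-valued columns: one node and one edge per list element.
    for column, label, rel_type in [
        ("mentioned_users", "User", "MENTIONS"),
        ("images", "Image", "HAS_IMAGE"),
    ]:
        for value in row[column]:
            node = (label, value)
            nodes.append(node)
            edges.append((file_node, rel_type, node))

    # The date column is deliberately not turned into a node;
    # it would stay a property on the File node.
    return nodes, edges


row = {
    "file": "11425646.pdf",
    "author": "bob",
    "company": "abc co",
    "date": "1/1/2011",
    "mentioned_users": ["alice", "sue", "mike", "sally"],
    "images": ["1958.jpg", "535.jpg", "35735.jpg"],
}
nodes, edges = explode_row(row)
print(len(nodes), "nodes,", len(edges), "edges")
```

Here one row yields ten nodes and nine edges, and the date column is skipped as a node, which is the "selected entities" point from the note.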
Is this the right way to structure this data in a graph database?
Yes, it looks OK. Assuming that file, author, company, user, and image are nodes, and date is only an attribute, this
file: 11425646.pdf
author: bob
company: abc co
date: 1/1/2011
mentioned_users: [alice,sue,mike,sally]
images: [1958.jpg,535.jpg,35735.jpg]
should convert to this
MERGE (f :File {name:'11425646.pdf', date:'1/1/2011'})
MERGE (a :Author {name:'bob'})
MERGE (c :Company {name:'abc co'})
MERGE (u1 :User {name:'alice'})
MERGE (u2 :User {name:'sue'})
MERGE (u3 :User {name:'mike'})
MERGE (u4 :User {name:'sally'})
MERGE (i1 :Image {name:'1958.jpg'})
MERGE (i2 :Image {name:'535.jpg'})
MERGE (i3 :Image {name:'35735.jpg'})
MERGE (f)-[:WRITTEN_BY]->(a)
MERGE (f)-[:FROM_COMPANY]->(c)
MERGE (f)-[:MENTIONS]->(u1)
MERGE (f)-[:MENTIONS]->(u2)
MERGE (f)-[:MENTIONS]->(u3)
MERGE (f)-[:MENTIONS]->(u4)
MERGE (f)-[:HAS_IMAGE]->(i1)
MERGE (f)-[:HAS_IMAGE]->(i2)
MERGE (f)-[:HAS_IMAGE]->(i3)
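If the statements are generated from code rather than typed by hand, it may be safer to pass values as query parameters instead of pasting them into the Cypher text. The following is only a sketch: build_statements is a hypothetical helper, and it sets the date with SET rather than inside the MERGE pattern so that a file is matched by name alone, which is an assumption about how you want duplicates handled.

```python
# Sketch: the same graph expressed as parameterized Cypher statements.
# build_statements is a hypothetical helper, not part of any Neo4j library.

def build_statements(record):
    """Return a list of (cypher, params) pairs for one record."""
    file_name = record["file"]
    statements = [
        (
            "MERGE (f:File {name: $file}) SET f.date = $date",
            {"file": file_name, "date": record["date"]},
        )
    ]

    def link(label, rel_type, name):
        # Labels and relationship types are not passed as ordinary
        # $parameters here, so they are formatted in from fixed values.
        cypher = (
            "MERGE (f:File {name: $file}) "
            f"MERGE (n:{label} {{name: $name}}) "
            f"MERGE (f)-[:{rel_type}]->(n)"
        )
        statements.append((cypher, {"file": file_name, "name": name}))

    link("Author", "WRITTEN_BY", record["author"])
    link("Company", "FROM_COMPANY", record["company"])
    for user in record["mentioned_users"]:
        link("User", "MENTIONS", user)
    for image in record["images"]:
        link("Image", "HAS_IMAGE", image)
    return statements


record = {
    "file": "11425646.pdf",
    "author": "bob",
    "company": "abc co",
    "date": "1/1/2011",
    "mentioned_users": ["alice", "sue", "mike", "sally"],
    "images": ["1958.jpg", "535.jpg", "35735.jpg"],
}
statements = build_statements(record)
for cypher, params in statements:
    print(cypher, params)
```

Each pair could then be handed to whatever driver you use, for example a call along the lines of session.run(cypher, params) in the official Python driver.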
Useful links: data modeling guide and Cypher reference
On the surface, this sounds like a graph database problem. If you are going to be walking the edges between users, Neo4j or something similar may be the right choice.
You might be able to do more generic processing using a document DB where every user has an _id of user_id and an array of follower _ids.
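To make that layout concrete, here is a small Python sketch that uses plain dicts in place of a real document store. The sample users and the helper followers_of_followers are invented for illustration; only the _id and followers fields come from the description above.

```python
# Sketch of the document layout: one document per user, keyed by _id,
# with an array of follower _ids. The sample data is made up.

users = {
    "alice": {"_id": "alice", "followers": ["bob", "carol"]},
    "bob": {"_id": "bob", "followers": ["carol"]},
    "carol": {"_id": "carol", "followers": []},
    "dave": {"_id": "dave", "followers": ["alice"]},
}


def followers_of_followers(users, user_id):
    """Second-degree followers: each hop is a separate lookup by _id."""
    first = users[user_id]["followers"]
    second = set()
    for follower_id in first:
        second.update(users[follower_id]["followers"])
    # Exclude the user and their direct followers.
    return second - set(first) - {user_id}


print(sorted(followers_of_followers(users, "dave")))
```

Every extra hop means another round of lookups by _id, which is the kind of edge walking a graph database is built to handle for you.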
Perhaps you could output to MongoDB, then use Neo4j to create the graph(s) for specialised work and MongoDB for more general work. MapReduce and the aggregation framework in MongoDB are both pretty good (speaking from experience), although MapReduce is currently much more powerful than the aggregation framework.
Since the schema is likely to morph, and you do not know what the additional data will be, you might prefer a document or graph DB over an RDB. If you later prefer to work in a relational manner, you can generate CSV extracts to upload to your RDBMS of choice once you have defined a schema.
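A CSV extract of that kind could look roughly like the sketch below, which uses only Python's standard csv module. The documents, the column list, and the to_csv helper are all made up for the example; the point is that the column list is chosen after the fact, and documents that lack a column or carry extra fields are still handled.

```python
import csv
import io

# Sketch: flatten schemaless documents into a CSV extract once a column
# list has been settled. The documents and column names are made up.

documents = [
    {"_id": "alice", "company": "abc co", "follower_count": 2},
    {"_id": "bob", "follower_count": 1, "location": "Berlin"},
]


def to_csv(documents, columns):
    """Write the chosen columns of each document to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # fill columns a document lacks
        extrasaction="ignore",  # drop fields outside the chosen schema
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


extract = to_csv(documents, ["_id", "company", "follower_count"])
print(extract)
```

Most relational databases provide a bulk CSV import that can take a file like this once the matching table exists.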
Best Answer
A quick Google search turned up this result