Mysql – Graph Database with “non-graph” data

graphMySQL

— Update —
Thanks for the comments and the help so far. And I apologize for not specifying the question further. I have updated the question below.
— Update —

Currently I am asked to develop a database structure for a pretty big amount of data. I am debating between implementing a graph database instead of a "normal" relation database, and was wondering what the disadvantages are if the data does not necessarily contain any relationships? Could separate, unconnected, nodes be utilized the same way as a table / row in a table?

I am asking this because as of right now there are is no need for relationships but I am trying to future proof the database (with the anticipation of relationships) to expand the abilities of the data. If it is of any help, I am debating between OrientDB/Neo4j or mySQL/postgreSQL.

An example:
Let say we have a database full of stocks. Stock can be bought and sold at pretty much any time / day by anyone (as long as the market is open). Now this database could be a normal relation database: Table 1: IDs | Products | Prices | Sizes | Dates. But could potentially also be organized as a relationship database Node 1: Stock A | Node 2: Stock B.
If I just use the database for storing stock information, it seems to me a normal database will be better. But is this true? Will it negatively affect me working with a relationship database / will I be better off working with a normal database? Are there disadvantage to organize my data in nodes, instead of rows?

A picture to illustrate it all:

The database will mainly be used for products bought and sold similar like stocks, however they also have other information such as locations attached to them. I am trying to anticipate the implementation of features such as recommending products or even the prediction of next products to be bought by somebody.

Most of the queries from the database will be per product.
Pulled from the database: 100 to 1000 times a day.
Pushed to the database: 20.000 a day.

An added question which might reveal a disadvantage:
How easy will it be to go from relation-to-graph or from graph-to-relation. Any lock-in dangerous?

Thanks for all the help, comments have been great so far!
R

Best Answer

I think you are making a fairly common mistake here by selecting NoSQL technologies in order to be "future-proof" without understandng the tradeoffs. If in doubt start with PostgreSQL and figure you can always set up a graph db or other NoSQL db on the side as you need it. Also you can do graph traversal in PostgreSQL but keep in mind you are working with sets and so that's very, very different from a graph db with different benefits and drawbacks.

Your basic tradeoff is between flexibility of data entry and an ability to flexibly utilize data on output. NoSQL db's (including graph db's) sacrifice the latter for the former and because very often you need an ability to run ad hoc queries on sets of data you probably want an RDBMS somewhere. This means if you have a good data model, you should be able to add adjunct db servers to your environment for special purposes of your RDBMS won't handle it.

Nonetheless, there are a couple specific reasons why if you are thinking at some point of adding various NoSQL solutions why I would recommend starting with PostgreSQL instead of MySQL and these have to do with performance of NoSQL-friendly database structures and the fact that PostgreSQL supports WITH RECURSIVE common table expressions. This approach (WITH RECURSIVE CTE's) allows things like hierarchies and graphs to be traversed using recursive SQL statements which can repeatedly traverse paths. These are also relatively efficient for a set-based approach, and so if you end up with a lot of part/subpart modelling problems you may be able to do that directly in PostgreSQL without too much work. If this doesn't work however, you can also more easily represent the data in a way which is easier to import into your NoSQL db. So it gets you further and it integrates better. You may even be able to set up foreign "tables" against your NoSQL db and run queries against it from inside Pg!

More on the Tradeoff 1

Graph databases arose from a need to do binary traversal quickly over large sets of information. The obvious example might be a social network site like LinkedIn which might want to tell you quickly how far removed you are from another user (in simple terms this means essentially they are designed to play "six degrees of Kevin Bacon"). Typically on a graph database you are essentially traversing nodes using what are often described as "three word sentences" which represent the graphs. In this regard you are looking at essentially something like "John friended Jane" and traversing this way. Typically the API is relatively navigational.

A relational database is really designed to work with sets of information. Typically when giving up on relational databases, one is giving up on set operations too. This can be a big deal. Often it is faster both operationally and in terms of development time to be able to do set operations rather than navigate, aggregate, and report. This is a huge difference and if your use case fits a relational workflow, I have trouble imagining that a graph db will help you.

More on the tradeoff 2: Consistency Models

A second thing to investigate and keep in mind is the question of what consistency model your database uses. Even with "ACID-Compliant" RDBMS's there is a range of consistency models based on different levels of transaction isolation that can be used to prevent problems. If your data is important, the standard RDBMS consistency models are going to fare much better than most of the NoSQL models because they make more guarantees both to the DBA and to the application. Graph databases are a relatively large field, and each vendor may have some differences in their consistency model. This isn't necessarily something that will rule the use out, but it is something to be cautious about when looking at the solutions as a whole.

Note that schema flexibility is both a blessing and a curse in the NoSQL world and this also affects graph databases. It is a blessing because you can get up and running in the initial stages faster, but it is a curse because the set operation strengths of an RDBMS depend on a fixed schema, and those that relax that to some extent (Informix, for example, supports jagged rows in return sets) require programmer knowledge of the places where those requirements have been relaxed.

Related Solutions

Graph database with Erlang driver

I don't see any better hope of answering this, but you can use Neo4J from Erlang... see https://github.com/nerlo/nerlo/wiki/howto-use-neo4j-from-erlang

MySQL – How to Store Arrays in a Table

You are duplicating the common data on multiple rows which can lead to consistency issues. Look up "normal form" for a discussion about the standard ways to deal with this and help ensure your data is consistent and efficient. Read around database normalisation for more about this.

To break that duplicated data out you could do something like:

                    Message             MessageData
Client              ==============      ================
==============      id_msg    (PK) <--- id_msg (FK) (PK)
id_client (PK) <--- id_client (FK)      key         (PK)
client_name         time_stamp          value
contact_no

You potentially have a second problem though: the message data table presented above is still a property bag or Entity-Attribute-Value (EAV) model which while flexible can be problematical in a number of ways. You'll find many discussions online (including here on DBA.SE) and in print (Bill Karwin's excellent "SQL Anitpatterns" has a relevant chapter) about the pros and cons of EAV structures so I'll not delve into more detail here.

If those array keys in your JSON example are fixed then a better layout might be:

                    Message        
Client              ============== 
==============      id_msg    (PK) 
id_client (PK) <--- id_client (FK) 
client_name         time_stamp     
contact_no          temperature
                    humidity
                    battery
                    cnt1

Or if those keys are common in the data but there are potentially others that and you need to allow any key/value pair, you can use a hybrid of fixed columns and property bag:

                    Message             OtherMessageData
Client              ==============      ================
==============      id_msg    (PK) <--- id_msg (FK) (PK)
id_client (PK) <--- id_client (FK)      key         (PK)
client_name         time_stamp          value
contact_no          temperature
                    humidity
                    battery

7.300.000 rows per year

With appropriate table and index design 7.3 million rows is not a lot at all.

Best Answer

Related Solutions

Graph database with Erlang driver

MySQL – How to Store Arrays in a Table

Related Question