The problem with storing your database in memory is that any memory fault, crash, or server restart will flush everything you have stored.
That is why people generally don't keep their database in memory.
Now, there are in-memory caching tools, such as memcached, that can work as a very simple database, and that may meet your needs. If you look into tmpfs or ramfs, you can also create a folder that exists in memory and move your files into it as you normally would.
So, if you are working with MongoDB, MySQL, or whatever else, you can have the data folder live in that RAM-backed folder. This will give the database super fast reads and writes. Everything will be really fast, but you are limited to how much RAM you have, minus the size of your OS and whatever else is running.
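As a rough sketch, moving a data directory onto tmpfs might look like the following. The mount point, size, and MySQL paths here are all assumptions for illustration, and remember that everything in the mount is lost on reboot:

```shell
# Create a RAM-backed filesystem (4g is an assumed example size; requires root)
mkdir -p /mnt/ramdata
mount -t tmpfs -o size=4g tmpfs /mnt/ramdata

# Copy the existing data directory into RAM (paths are assumptions)
systemctl stop mysql
cp -a /var/lib/mysql/. /mnt/ramdata/

# Point the database at the RAM copy, e.g. datadir=/mnt/ramdata in my.cnf, then:
systemctl start mysql
```

You would also need a strategy for periodically copying the data back to disk, since the RAM copy alone has no durability at all.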
Also, just be careful: MongoDB likes to keep writes in memory until the disk has a chance to catch up, so you may want to turn that feature off, because with the data already in RAM it won't make anything faster.
My recommendation is to work with memcached and mix it with a normal database that lives on disk. This is the same concept some systems use for PHP sessions.
http://mickeyben.com/2009/12/30/using-nginx-as-a-load-balancer.html
The basic way it works is: if your record is found in memcached, the database is never checked. If it is not found, then do three (3) things:
- check the db
- send the data to memcached
- return the data to the calling function
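Put together, the three steps above are the classic cache-aside pattern. A minimal sketch in Python, using plain dicts as stand-ins for memcached and the on-disk database (both stand-ins are assumptions for illustration):

```python
# Cache-aside sketch: `cache` stands in for memcached and `database`
# stands in for the on-disk database (both are illustrative stand-ins).
cache = {}
database = {"user:1": "alice", "user:2": "bob"}

def get_record(key):
    # If the record is found in the cache, the database is never checked.
    if key in cache:
        return cache[key]
    # Otherwise: check the db...
    value = database.get(key)
    if value is not None:
        # ...send the data to memcached...
        cache[key] = value
    # ...and return the data to the calling function.
    return value

print(get_record("user:1"))  # first call reads the database and fills the cache
print(get_record("user:1"))  # second call is served entirely from the cache
```

In a real system you would also invalidate or update the cache entry on writes, otherwise readers can see stale data.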
:)
Read/write splitting doesn't move all reads to a replica server; it only moves the reads away from the master that are sensible to move. So any reads done, for example, by your processes that are integrating third-party data would remain on the master, along with any reads related to transactions, but many other reads, such as report generation or compiling aggregate/summary views, could probably be offloaded to another machine.
How much this would help vs. how complicated it is to implement is very situation-specific. The 85%/15% split you're observing seems promising, but it doesn't tell us how many of the reads are directly related to the writes and thus need to stick with the master server.
Another concern with read/write splitting is the read-after-write consistency seen by your users. If I'm browsing my bank account transactions, that information could easily be read from a replica; on the other hand, if I just did a transfer between accounts, then go back to look at my transactions, the safest thing for the system to do is to read my transactions from the master to ensure that what I just did is immediately visible to me. Again, the nature of your application is the driving factor in determining the implications of this.
Neither of these points should be taken as advice "against" read/write splitting. You almost certainly should do some of it... the questions are "how much can you safely do?" followed by "how much will that help?"
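One common way to handle the read-after-write concern above is to pin a session to the master for a short window after it writes. A minimal sketch, where the pool names, session IDs, and window length are all assumptions for illustration:

```python
import time

# Seconds a session stays pinned to the master after a write (assumed value).
READ_AFTER_WRITE_WINDOW = 5.0

class ReadWriteRouter:
    """Route writes to the master and reads to a replica, except for
    sessions that wrote recently, which keep reading from the master."""

    def __init__(self):
        self._last_write = {}  # session_id -> timestamp of last write

    def choose(self, session_id, is_write):
        if is_write:
            self._last_write[session_id] = time.monotonic()
            return "master"
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and time.monotonic() - wrote_at < READ_AFTER_WRITE_WINDOW:
            # Recent writer: read from the master so the user sees their own write.
            return "master"
        return "replica"

router = ReadWriteRouter()
print(router.choose("s1", is_write=False))  # no prior write -> replica
print(router.choose("s1", is_write=True))   # write -> master
print(router.choose("s1", is_write=False))  # recent write -> master
```

The window length is a trade-off: long enough to cover realistic replication lag, short enough that the master still sheds most of the read load.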
When you are integrating the third-party data, something to keep in mind is that database transactions should generally be as quick and as small as possible. From what you've described about using transactions to ensure consistency in the event of a lost connection to a third-party data source, it sounds like you might have room for improvement here. The best bet -- probably -- is to retrieve the data from the external source, save it locally, verify its integrity, and then proceed with the integration.
If a large portion of your load is machine-generated, you probably also have options of making those processes less aggressive in their actions, more deterministic in their scheduling, and more cooperative with each other. Some of those processes, possibly, could be staged on another server, the calculations done, and the final result sent to your master server... but like everything else, it depends on the nature of the processes.
This question is really far too vague to answer effectively. There are dozens of "NoSQL" data stores out there which have various use cases. Here is a 10,000 foot view of what's out there.
In my mind, there are basically three main categories of NoSQL data stores in common use: key/value stores, document databases, and big data (Hadoop). This is a somewhat artificial designation, and many of these products can arguably cross into multiple areas. There are some other categories, such as graph databases, which are more specialized towards a specific problem, and I am not going to discuss them here as I have no expertise with them.
Most NoSQL databases are simple key/value stores that are very fast at retrieving named keys but particularly inefficient if you need to scan or aggregate over sets of data. Examples of k/v stores are memcached, Riak, Redis, Couchbase, Voldemort, and Amazon DynamoDB. With the HandlerSocket plugin (built into Percona Server), even MySQL can be used as a very fast k/v store. Each of these k/v stores has a different feature set designed to solve different problems. Very few of them are suitable as the authoritative/primary data store for an application because of how difficult and inefficient it is to perform set operations. They are mostly used as caching layers or for storing very specialized data that does not require relational operations.
Another general class of NoSQL databases are document stores. Examples include MongoDB and Cassandra. These types of data stores store more structured data than k/v stores and often have a more capable query language. They have flexible "schemas" that make it possible to keep completely different sets of data from one row to the next.
Finally, you get to the true "Big Data" stores, where Hadoop and its related tools, Pig and Hive (a SQL interface to Hadoop), along with HBase (a real-time data store on top of Hadoop/HDFS), are king. With the exception of HBase, Hadoop-based data stores tend to be built for offline processing of truly enormous data sets across hundreds of machines.
As a side note, what drives me absolutely batty about "NoSQL" is that it has literally nothing to do with the SQL language. NoSQL is about reinventing the data storage layer and making it more "scalable" (another vague, misunderstood term) and highly available. The query language is irrelevant in most cases, and some of these data stores have produced horribly ugly ways of performing even the simplest operations. SQL could have been used as the access language for most of these data stores had the developers made that choice; take a look at VoltDB, MySQL Cluster, or Hive for examples of distributed SQL databases that have "NoSQL" features. When treated like a key/value store, MySQL with InnoDB is actually incredibly fast at primary key lookups (SELECT value FROM table WHERE key = ?), and it would be relatively easy to create a client library that uses a consistent hashing scheme to build a distributed MySQL cluster the way one would use Riak, Redis, or memcached.
Bottom line is that you'll have to be more explicit about your needs if you want any more detail than that. Here is just a small subset of the questions you will need to answer in order to even limit the field:
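For illustration, the consistent hashing scheme mentioned above could be sketched as follows; the node names and virtual-node count are assumptions:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that adding or removing a node only remaps
    a small fraction of the keys (the idea behind Riak/memcached sharding)."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to smooth the distribution.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

# Hypothetical shard names for illustration.
ring = ConsistentHashRing(["mysql-a", "mysql-b", "mysql-c"])
print(ring.node_for("user:42"))  # the same key always maps to the same node
```

A client library built this way would open a connection to whichever node `node_for` returns and run the primary-key lookup there.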
I hope that this helps you a bit in your research.