How does one scale out with master-slave replication

database-designmaster-slave-replicationreplication

To my understanding, a master-slave replication might be set up where clients can read from multiple databases. However, they can only write to the master. I believe this helps guarantee ACIDity.

I also understand one-way replication seems generally preferred to two-way for the same reasons.

I don't quite understand how this would work in terms of application development though.

  1. How do you tell an application to write to one database and read from another? Does this mean this actually has to be programmed into the application layer?

    This isn't too big a deal. However, let's say it takes 60 seconds for that write to come back down to the local database. How does the client compensate for latency when he reloads the page and it re-queries the local db (missing that write)? This seems so error-prone!

  2. How would this then scale? Understood that reads dominate a majority of a database's work. But what happens when you scale enough that one server can't handle all the writes? What options do you have?

I'm trying to wrap my head around the subject but I'm missing answers to these basic questions.

Best Answer

I have to keep my answer very generic as you have not mentioned any particular database system and what type of master slave implementation you want to know about.

However, they can only write to the master.

True in most cases but there are solutions where you can write to more than one target at the same time. Your data will be replicated to both nodes, synchronized and conflicts will be resolved.

Oracle has multi master implementation. Generic idea explained here.

SQL Server has merge and peer to peer replication. In merge replication where you can update on the subscriber side and data will be merged between two. Details here. In peer to peer replication there is no conflict resolution compare to merge replication. Therefore it is recommended only in configurations where only one copy of a given data element is updated. For example, if Peer-to-Peer replication is used to maintain the inventory of a chain of food stores, only the store that owns an inventory entry would be allowed to update that entry. Details here.

I also understand one-way replication seems generally preferred to two-way for the same reasons.

I would not say it is preferred but definitely used less compare to one way replication. It will depend on your need. Main challenge with two way replication is conflict between 2 or more masters. Conflicts might occur and, when they do, you need the ability to detect and resolve them.

How do you tell an application to write to one database and read from another? Does this mean this actually has to be programmed into the application layer?

Depends. As an example SQL Server Always On Availability Group you do it by configuring your secondaries available to read by clients. Then you set up your connection string in such a way that your reads will be automatically routed to read_only nodes. Details here.

Whereas in MongoDB config server keeps track of primary and secondary. Each Mongos has this information. Mongos will redirect the traffic to the correct node based on your Read Preference setting. Details here.

How does the client compensate for latency when he reloads the page and it re-queries the local db (missing that write)? This seems so error-prone!

In SQL Server this is dictated by the setting of isolation level and implemented by lock/latch on different objects (database, table, page, row etc). This is a good reading about this topic. I do not know the details for other DBMS, but I am sure it will be something similar.

How would this then scale? Understood that reads dominate a majority of a database's work. But what happens when you scale enough that one server can't handle all the writes? What options do you have?

In general by Sharding your objects horizontally. Sharding can be implemented in may different ways. I worked with MongoDB sharding for many years and it works perfectly. Details here. Some scaling out idea using SQL server explained here.