SQL Server stores all the columns for one row together in a single disk page. (It's more complex than this, but for int, char and similar fixed-width types it's more or less true.) To retrieve any column's value, the whole page is read into RAM. So once any one column of a given row is available, all of that row's columns are available.
There are several buffers / caches involved in responding to a query. One is the great big page bufferpool where pages read from disk are held in memory so execution plans can act on them. And yes, if one query causes a page to be loaded into the bufferpool all subsequent queries will use the same page from the bufferpool and not have to suffer the IO overhead. This is an incidental performance gain, however, and shouldn't be relied upon because SQL Server may choose to evict any page from the bufferpool at any time it needs the space.
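As an illustration of the page bufferpool (assuming a reasonably recent SQL Server with the standard DMVs available), you can see which databases' pages currently occupy it:

```sql
-- Diagnostic sketch: pages currently held in the buffer pool,
-- grouped by database (SQL Server pages are 8 KB each).
SELECT DB_NAME(database_id)  AS database_name,
       COUNT(*)              AS cached_pages,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM   sys.dm_os_buffer_descriptors
GROUP  BY database_id
ORDER  BY cached_pages DESC;
```

Running this before and after a large query makes the "incidental" caching visible: the second execution of the same query finds its pages already resident.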
Another set of buffering occurs when the output from your query gets sent over your connection to your application. This is (mostly) on a per-connection basis, so duplicating the work duplicates the resources required (more or less).
Another cost is the optimisations you miss out on when you return all columns. Covering indexes and INCLUDE columns become meaningless: every read has to retrieve the underlying clustered index (or heap) page. It's a fair bet that SQL Server will not use the narrower indexes it might otherwise have chosen, because a scan-the-cluster plan is cheaper than an index-plus-cluster-lookup plan.
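For contrast, here is a covering-index sketch (table, column and index names are hypothetical): the index can satisfy a query on its own, but only when the query doesn't ask for every column.

```sql
-- Hypothetical covering index: the INCLUDE columns ride along in the
-- index leaf pages, so the queries they cover never touch the base table.
CREATE INDEX ix_orders_customer
    ON dbo.Orders (customer_id)
    INCLUDE (order_date, total);

-- Covered: answerable from ix_orders_customer alone.
SELECT order_date, total
FROM   dbo.Orders
WHERE  customer_id = 42;

-- Not covered: SELECT * forces a lookup into the clustered index (or heap)
-- for every matching row, so the covering index buys you nothing.
SELECT *
FROM   dbo.Orders
WHERE  customer_id = 42;
```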
Enable the slow query log and log_queries_not_using_indexes on the slave server, and you should be able to confirm what your gut is telling you. You can take the captured queries, run EXPLAIN on them, and have better documentation for your suspicions.
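A sketch of enabling this at runtime (these can equally go in my.cnf; the threshold value here is just an example, and the EXPLAIN target is a hypothetical captured query against the t1 table used later):

```sql
-- On the slave: log slow queries and queries doing full scans.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;              -- seconds; tune to taste
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Then, for each query captured in the log:
EXPLAIN SELECT last_name, first_name
FROM    t1
WHERE   last_name = 'Smith';
-- type: ALL with a large "rows" estimate suggests a missing index.
```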
To more directly answer the question, it's absolutely valid to declare indexes on a slave that are not declared on the master. I would suggest that it's a common practice, and one of many reasons why replicas are useful for so many things. Connect to the slave with an account with sufficient privilege, and alter the table directly on the slave.
The only exceptions to this -- which should be intuitively obvious, since the row data needs to be consistent on master and replica -- are that you cannot safely declare a UNIQUE constraint or a foreign key constraint on a slave when it doesn't exist on the master. This won't work, which is fine, since it doesn't make sense at any rate.
Remember also that indexes in MySQL have names, which are automatically generated if they are not supplied when the index is added. A good practice might be to explicitly name these indexes on the replica so that future indexes added to the master cannot possibly cause an index name collision, which would break replication. My local convention is to prefix the index name with "ix_repl_" which would of course never be used on the master.
ALTER TABLE t1 ADD KEY ix_repl_last_first (last_name,first_name);
Note also that true duplicate indexes (where exactly the same columns, and no others, are included in more than one index) are deprecated in MySQL 5.6 and disallowed by default in MySQL 5.7. In later versions it is therefore apparently possible to stop replication by subsequently declaring an identical index (even with a different name) on the master. This wouldn't be a critical issue (as long as you're monitoring replication -- you are, right?), since replication can safely be restarted by simply removing the now-redundant index on the replica before restarting the slave SQL thread. The failed event would be retried and would now succeed, since there's no longer a conflicting index, and the replacement index would be built on the replica.
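If replication does stop on such a duplicate-index event, the recovery might look like this (index and table names follow the earlier ix_repl_ example):

```sql
-- On the replica: drop the now-redundant replica-only index...
ALTER TABLE t1 DROP KEY ix_repl_last_first;

-- ...then resume replication; the failed event is retried and now succeeds,
-- building the master's version of the index on the replica.
START SLAVE SQL_THREAD;
```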
Side note: remember that MySQL replication requires the version of MySQL on the replica servers to be the same as, or newer than, the version on the master... so when upgrading (or migrating to MariaDB) you will almost certainly want to upgrade the replicas first, and then the master. The reason for this is that a newer replica will understand the capabilities and quirks of an older master, but a newer master may introduce behaviors in the replication stream that an older replica server won't be able to interpret. There are limited exceptions to this rule, but it's most definitely the rule.
Option 1: Change engines
First thing to do, if at all possible, is probably to change MyISAM to InnoDB, which handles concurrency much better. See "MyISAM vs InnoDB", especially the point about "Table-locking vs Row-locking".
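A minimal sketch of the engine change (the table name is hypothetical; test on a copy first, since the ALTER rewrites the whole table and blocks writes for the duration):

```sql
-- See which tables in the current schema still use MyISAM:
SELECT table_name, engine
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
  AND  engine = 'MyISAM';

-- Convert one table (full table rewrite; plan for downtime on large tables):
ALTER TABLE t1 ENGINE = InnoDB;
```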
Option 2: Master/Slave replication
Another possible (although probably not cheap) solution is to replicate your database using an asynchronous Master/Slave setup. [I guess that if you cannot change from MyISAM to InnoDB, it won't be easy to set up a replicated DB... but I don't know either your possibilities or your limitations.]
Your users (i.e.: your application) should always work against your Master node, and all INSERTs, UPDATEs and DELETEs should always and only be executed there. SELECTs can be executed on either node, taking into account that the Slave might be slightly out of date. SELECTs that take a long time should be executed on the Slave, so that users of the Master node are not blocked at all. The Slave node may not be completely up-to-date: as the replication is asynchronous, it can lag behind the master by anything from a few milliseconds to a long time, if the Slave cannot keep up with the updates from the master and they "accumulate".
This kind of scenario is not uncommon in applications where you have the Master node working with a setup optimized for OLTP (OnLine Transaction Processing); and the Slave node with a setup optimized for OLAP (OnLine Analytical Processing). OLTP means (mostly) entering and manipulating data, and making relatively simple queries, with small amounts of data every time, whereas OLAP means querying data, normally in large amounts and with aggregations and computations (i.e.: analysis and reporting).