From the MySQL Documentation
The type of read varies for selects in clauses like INSERT INTO ... SELECT, UPDATE ... (SELECT), and CREATE TABLE ... SELECT that do not specify FOR UPDATE or LOCK IN SHARE MODE:
By default, InnoDB uses stronger locks and the SELECT part acts like READ COMMITTED, where each consistent read, even within the same transaction, sets and reads its own fresh snapshot.
To use a consistent read in such cases, enable the innodb_locks_unsafe_for_binlog option and set the isolation level of the transaction to READ UNCOMMITTED, READ COMMITTED, or REPEATABLE READ (that is, anything other than SERIALIZABLE). In this case, no locks are set on rows read from the selected table.
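As a sketch of what the documentation describes, switching the session to a weaker isolation level before the statement might look like this (the table names t_source and t_target are hypothetical; note that innodb_locks_unsafe_for_binlog was later deprecated and removed in MySQL 8.0):

```sql
-- Assumes innodb_locks_unsafe_for_binlog is enabled in my.cnf
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
-- With the settings above, the SELECT part sets no locks
-- on the rows it reads from t_source
INSERT INTO t_target SELECT * FROM t_source;
COMMIT;
```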
Evidently, you do not need stronger locks when just doing reads. Picture copying data into a table using INSERT ... SELECT or CREATE TABLE ... SELECT. The SELECT needs to be a frozen snapshot for loading the table. If it were a moving target, that would require a transaction within a transaction. If you wanted transaction-within-a-transaction behavior, you would need to script that using SAVEPOINT. Otherwise, stronger locks allow for consistent SELECTs for an atomic INSERT.
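A minimal sketch of scripting that with SAVEPOINT (table names are hypothetical):

```sql
START TRANSACTION;
SAVEPOINT before_copy;

-- The copy step you may want to undo independently
INSERT INTO t_target SELECT * FROM t_source WHERE id < 1000;

-- Undo just the copy without rolling back the whole transaction
ROLLBACK TO SAVEPOINT before_copy;

COMMIT;
```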
Doing the simple SELECT allows your DB session to trust other transactions not to play smoke-and-mirrors with your view of the data. If it has to violate that trust, it will politely lock every row it can to do its work, rather than deceive DB sessions that are only doing SELECTs into thinking the data is stable when it is, in fact, changing.
No matter which isolation level you pick, none of them is so granular that a simple SELECT gets held up. That's why SELECT ... FOR UPDATE and SELECT ... FROM ... LOCK IN SHARE MODE were invented, so you can be that granular when politeness is not an option.
I wrote about this before: How can I make a select statement get blocked?
You should use SELECT ... LOCK IN SHARE MODE. Why?
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on any rows that are read. Other sessions can read the rows, but cannot modify them until your transaction commits. If any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
In your case, you could attempt this
START TRANSACTION;
SELECT ... LOCK IN SHARE MODE;
SELECT ... INTO OUTFILE;
ROLLBACK;
This would do two SELECT queries:
- the first SELECT to lock the rows in the table you wish
- the second SELECT to perform SELECT ... INTO OUTFILE
Personally, I do not think you have to be this heavy-handed. Transaction isolation should be smart enough to pull off this atomic SELECT and use the same rows for the INSERT. I know I said should be, which is why you are asking your question in the first place.
Whether you do SELECT ... INTO OUTFILE as one command or in the heavy-handed manner I am proposing, the row data of the source table will be fully readable.
GIVE IT A TRY !!!
UPDATE 2014-12-10 15:12 EST
Your comment
Thx for the answer, and it does help, but the main point of the OP was to determine if there is a benefit for using SELECT INTO OUTFILE over INSERT INTO ... SELECT?
They are operationally different:
- SELECT ... INTO OUTFILE creates a text file
- INSERT INTO ... SELECT loads one table from the results of the SELECT
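To make the operational difference concrete, here is a hedged side-by-side sketch (table names, file path, and delimiters are hypothetical):

```sql
-- Writes a delimited text file on the server host;
-- the target file must not already exist
SELECT * FROM t_source
INTO OUTFILE '/tmp/t_source.txt'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Loads rows directly into another table, no intermediate file
INSERT INTO t_target SELECT * FROM t_source;
```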
UPDATE 2014-12-11 12:21 EST
The only thing I can think of in this context is the point-in-time of the data and when you are using it. With both types of operations, there will be some implicit shared locking.
With SELECT ... INTO OUTFILE, you are preparing a result and saving it externally. Loading that data into a table using LOAD DATA INFILE will not involve any shared locking during the load process. Keep in mind that SELECT ... INTO OUTFILE will incur disk I/O and still impose some caching along the way.
With INSERT INTO ... SELECT, the shared locks would probably have to live longer in InnoDB because you are locking rows and using those same rows to INSERT into another table.
Therefore, if I were looking for a performance bonus, I would give the edge to INSERT INTO ... SELECT: you are doing the same amount of shared row locking, and the disk I/O for the single operation would have to be less than that of the separate SELECT ... INTO OUTFILE and subsequent LOAD DATA INFILE. Of course, you would have to compare the two methods against your dataset. What could be a performance bonus for one dataset might be a performance cost for another.
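For reference, the two-step alternative being compared would look something like this (file path and table names are hypothetical):

```sql
-- Step 1: export; shared read locks are held only
-- for the duration of the SELECT
SELECT * FROM t_source INTO OUTFILE '/tmp/t_source.txt';

-- Step 2: import; no shared locks are taken on t_source
-- during the load itself
LOAD DATA INFILE '/tmp/t_source.txt' INTO TABLE t_target;
```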
UPDATE 2014-12-17 00:00 EST
Your Comment
I wasn't notified that you updated your answer, so I did bounty assuming it wasn't. Your explanation does make sense, in theory, however I am looking for a more authoritative response, in hopes that the overhead of the separate file (as you correctly mentioned) may be worth the trade-off in complexity for improved performance.
The only authoritative response would have to come from the MySQL Documentation.
First, what does the MySQL Documentation on LOAD DATA INFILE say?
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See Section 13.2.9.1, “SELECT ... INTO Syntax”.) To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE.
Two paragraphs later, it says
For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, see Section 8.2.2.1, “Speed of INSERT Statements”.
When you look at Speed of INSERT Statements, it says this:
To optimize insert speed, combine many small operations into a single
large operation. Ideally, you make a single connection, send the data
for many new rows at once, and delay all index updates and consistency
checking until the very end.
The time required for inserting a row is determined by the following
factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
This does not take into consideration the initial overhead to open
tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N,
assuming B-tree indexes.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements. If you are
adding data to a nonempty table, you can tune the
bulk_insert_buffer_size variable to make data insertion even faster.
See Section 5.1.4, “Server System Variables”.
When loading a table from a text file, use LOAD DATA INFILE. This is
usually 20 times faster than using INSERT statements. See Section
13.2.6, “LOAD DATA INFILE Syntax”.
Take advantage of the fact that columns have default values. Insert
values explicitly only when the value to be inserted differs from the
default. This reduces the parsing that MySQL must do and improves the
insert speed.
See Section 8.5.4, “Bulk Data Loading for InnoDB Tables” for tips
specific to InnoDB tables.
See Section 8.6.2, “Bulk Data Loading for MyISAM Tables” for tips
specific to MyISAM tables.
Things start to look a little nebulous at this point because you must tune the load process in terms of the storage engine. MyISAM is rather straightforward in this statement because the bulk insert buffer is for MyISAM only, and LOAD DATA INFILE will harness the bulk insert buffer. InnoDB will not.
Take a look at this Pictorial Representation of InnoDB (by Percona CTO Vadim Tkachenko)
There are other considerations for tweaking options, but LOAD DATA INFILE will literally slam everything into the InnoDB Buffer Pool, funneling changes through the Log Buffer, Double Write Buffer, Insert Buffer (if the target table has nonunique indexes), Redo Logs (ib_logfile0, ib_logfile1), and the Physical File of the Table. This is where LOAD DATA INFILE's benefits get nullified.
I wrote about this
EPILOGUE
As I already said in my previous update to this answer
Therefore, if I were looking for a performance bonus, I would give the edge to INSERT INTO ... SELECT: you are doing the same amount of shared row locking, and the disk I/O for the single operation would have to be less than that of the separate SELECT ... INTO OUTFILE and subsequent LOAD DATA INFILE. Of course, you would have to compare the two methods against your dataset. What could be a performance bonus for one dataset might be a performance cost for another.
Basically, you will have to test SELECT ... INTO OUTFILE/LOAD DATA INFILE against INSERT INTO ... SELECT. It may be six of one, half a dozen of the other for one dataset, and a landslide victory for another dataset.
All being said from the MySQL Docs and my past posts, I still give the edge to INSERT INTO ... SELECT. You will just have to test the two methods.
Best Answer
No, MySQL builds a temporary result set from your query before performing the insert.
Source: MySQL Documentation
Edit based on Comments
The insert is an atomic operation and so the data used to build the temporary result set will match the committed data in the source table at the precise moment that the query was executed (i.e. it is a snapshot of that point in time).
After the temporary result set has been built any modifications to the source table will not be reflected in the results being inserted into the target table - unless you run the query again.
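The snapshot behavior can be sketched with two concurrent sessions (table names are hypothetical):

```sql
-- Session 1: the set of source rows is fixed at the moment
-- this statement begins executing
INSERT INTO t_archive
SELECT * FROM t_live WHERE created < '2014-12-01';

-- Session 2: a row committed here while the statement above
-- is running will NOT appear in t_archive
INSERT INTO t_live (created) VALUES (NOW());
```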