I haven't tested this (as my tables usually have at least a primary key) but I expect the difference to depend on the choice of the primary key.
Based on the MySQL documentation for the InnoDB engine, every InnoDB table has a clustered index. This is the PRIMARY KEY of the table or, in its absence, the first UNIQUE index whose key columns are all NOT NULL. If there is no suitable unique index either, InnoDB creates a hidden column (with auto-generated values) and uses it internally for unique identification and for clustering.
The key point here is that this hidden column is a 6-byte integer. So you pay 6 bytes of overhead per row when you define neither a primary key nor any unique index on an InnoDB table.
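As a sketch (the table and column names here are made up for illustration), the three cases look like this:

```sql
-- Case 1: explicit primary key -- used as the clustered index
CREATE TABLE t1 (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val VARCHAR(50)
) ENGINE=InnoDB;

-- Case 2: no primary key, but a UNIQUE index on a NOT NULL column --
-- InnoDB clusters on that index instead
CREATE TABLE t2 (
  code CHAR(8) NOT NULL,
  val  VARCHAR(50),
  UNIQUE KEY (code)
) ENGINE=InnoDB;

-- Case 3: neither -- InnoDB adds a hidden 6-byte row ID and clusters on it
CREATE TABLE t3 (
  val VARCHAR(50)
) ENGINE=InnoDB;
```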
In contrast, if you define a narrower column (a 1-, 2-, 3- or 4-byte auto-incrementing integer) as the primary key or a unique index, you should expect an improvement in INSERT efficiency.
If you define an 8-byte integer, I'd expect a decrease in efficiency. If the key is a VARCHAR(20) or a wider combination of columns, the decrease would probably be worse, depending on the order in which you provide the PK values. Auto-incrementing values are by definition increasing, so new rows are simply appended at the end of the clustered index (good for INSERTs). If you provide non-increasing values, rows land at arbitrary positions in the clustered index, causing page splits and affecting INSERT efficiency negatively.
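To make the contrast concrete (hypothetical table names), compare an auto-incrementing key with a randomly ordered one such as a UUID:

```sql
-- Increasing PK: rows are appended at the end of the clustered index
CREATE TABLE seq_pk (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  val VARCHAR(50)
) ENGINE=InnoDB;
INSERT INTO seq_pk (val) VALUES ('a'), ('b'), ('c');

-- Non-increasing PK (UUIDs are effectively random): each row lands at an
-- arbitrary position in the clustered index, causing page splits
CREATE TABLE uuid_pk (
  id  CHAR(36) NOT NULL PRIMARY KEY,
  val VARCHAR(50)
) ENGINE=InnoDB;
INSERT INTO uuid_pk (id, val) VALUES (UUID(), 'a');
```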
This sounds very unusual for a table using the ARCHIVE Storage Engine. Why? A duplicate key error is not characteristic of the ARCHIVE Storage Engine, since
- the engine does not support the creation of indexes
- the engine supports only INSERTs and SELECTs
Surprisingly, there can be a key internally present. How?
According to the MySQL Documentation:
The ARCHIVE engine supports the AUTO_INCREMENT column attribute. The AUTO_INCREMENT column can have either a unique or nonunique index. Attempting to create an index on any other column results in an error. The ARCHIVE engine also supports the AUTO_INCREMENT table option in CREATE TABLE and ALTER TABLE statements to specify the initial sequence value for a new table or reset the sequence value for an existing table, respectively.
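For illustration (the table name here is hypothetical), the only index ARCHIVE will accept is one on the AUTO_INCREMENT column:

```sql
CREATE TABLE arch_demo (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  arch BLOB,
  KEY (id)                  -- allowed: index on the AUTO_INCREMENT column
) ENGINE=ARCHIVE;

-- This would fail: ARCHIVE refuses an index on any other column
-- ALTER TABLE arch_demo ADD KEY (arch(10));
```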
Given this information, look back at the table and the query:
insert into test_table (id,arch) values (123,'FILE_CONTENT')
If the id column has the AUTO_INCREMENT attribute, you should not specify a value for id. Supplying a value that already exists is what would produce the normal error 1062 (Duplicate Key) on other Storage Engines.
SUGGESTIONS
Change the INSERT to a format that lets MySQL handle the AUTO_INCREMENT attribute of id:
insert into test_table (id,arch) values (0,'FILE_CONTENT')
or
insert into test_table (arch) values ('FILE_CONTENT')
Give it a Try !!!
UPDATE 2013-08-06 16:57 EST
If you are planning to do queries against the archive table, you need to get away from the ARCHIVE Storage Engine. Why? Again, according to the MySQL Documentation:
Retrieval: On retrieval, rows are uncompressed on demand; there is no row cache. A SELECT operation performs a complete table scan: When a SELECT occurs, it finds out how many rows are currently available and reads that number of rows. SELECT is performed as a consistent read. Note that lots of SELECT statements during insertion can deteriorate the compression, unless only bulk or delayed inserts are used
Note that every SELECT against an ARCHIVE table is a full table scan. If you look up id 123 in a table of 1,000,000 rows, you have to read all 1,000,000 rows every time.
SUGGESTION
Convert the table to MyISAM. Then you can have a proper index on id, plus the ability to create other indexes on other columns as needed.
CREATE TABLE test_table_myisam ENGINE=MyISAM as SELECT * FROM test_table WHERE 1=2;
ALTER TABLE test_table_myisam ADD PRIMARY KEY (id);
INSERT IGNORE INTO test_table_myisam SELECT * FROM test_table;
DROP TABLE test_table;
ALTER TABLE test_table_myisam RENAME test_table;
Best Answer
Using REPEATABLE-READ means that your SELECT transaction gets to view a "snapshot" of the state of data as of the moment the transaction started. Even if other sessions subsequently insert more data, the SELECT transaction will not see them. That's the way it's supposed to work.
So your query is returning 100% accurate results, with respect to the data that existed at a specific point in the sequence of INSERTs. But you are rapidly changing the data, so your SELECT is bound to return an outdated report.
Using another transaction isolation level probably won't help either. Even under READ COMMITTED, InnoDB refreshes a transaction's snapshot only at the start of each statement. In other words, if the SELECT takes 0.2 seconds and another 100 INSERTs commit during that time, those 100 rows won't be included in the result of the SELECT.
See http://dev.mysql.com/doc/refman/5.6/en/innodb-consistent-read.html for details.
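A minimal two-session sketch of this snapshot behavior (the table name is hypothetical):

```sql
-- Session A (REPEATABLE READ)
START TRANSACTION;
SELECT COUNT(*) FROM events;   -- snapshot taken here; suppose it returns 100

-- Session B (meanwhile)
INSERT INTO events (msg) VALUES ('new row');
COMMIT;

-- Session A again
SELECT COUNT(*) FROM events;   -- still 100: the snapshot has not moved
COMMIT;
SELECT COUNT(*) FROM events;   -- a fresh transaction now sees 101
```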
The only way to ensure your SELECT is totally in sync with the latest committed data is to prevent any new data from committing while the SELECT is executing. You can lock the rows temporarily with a locking read:
SELECT ... LOCK IN SHARE MODE
See http://dev.mysql.com/doc/refman/5.6/en/innodb-locking-reads.html. But even if you do this, as soon as that transaction finishes, the lock is released and new INSERTs start committing, making the result of your SELECT out of date almost instantly.
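Sketched concretely (again with a hypothetical table name), such a locking read looks like:

```sql
START TRANSACTION;
-- Takes shared (S) locks on the scanned rows; writers that would
-- modify them block until this transaction commits
SELECT COUNT(*) FROM events LOCK IN SHARE MODE;
-- ... use the result while it is still guaranteed current ...
COMMIT;   -- locks released; blocked INSERTs now proceed
```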