MySQL indexing on 11 million rows

indexMySQLmysql-5.5temporary-tables

I have a development and production server, both with different amounts of RAM. Development machine has 8G, while production has 64G.

I'm importing a large table from development to production, but before I started the process I dropped all the indexes to speed the import process, because I'm using stored procedure and with indexes it was very very slow.

On the development machine, while I was playing with creating/dropping indexes, I noticed that mysql used temporary space on the disk (default /tmp) for creating the indexes and it got to almost 8G.

The question is, since the production server has 64G of ram, will the mysql use the RAM for storing the temporary table while creating the indexes or will it use /tmp for the same thing?

Best Answer

What "import" mechanism was used? Some will automatically postpone the secondary-index creation for you.

I don't understand "I'm using stored procedure and with indexes it was very very slow". SPs being used for the import? Or indexes interfering with SPs? What?

Is /tmp on the same disk and in the same disk 'partition'? This is critical when it needs to "move" the result to the mysql directory.

You really need to upgrade -- 5.6 has some indexing improvements; 5.7 and 8.0 have more.

MySQL uses a combination of RAM and disk -- trying to minimize I/O. So, the question of "will it use RAM" is moot. It will "do what is optimal".

To create a secondary, non-UNIQUE, BTree index, MySQL probably

  1. reads the entire data, writing the secondary keys and primary keys to a file
  2. sorts that file (OS file sort is quite efficient)
  3. reads the file and builds the BTree for the INDEX; this will involve writing to disk.

However. "read" and "write" and "file" are cached in based on InnoDB's whim and the OS's whim. Furthermore, step 3 may delay the actual write to disk of the BTree.

UNIQUE requires a check; FULLTEXT and SPATIAL were not available in InnoDB in 5.5. If you are using Engine=MyISAM, some of what I have said is incorrect.

A related question -- Is that OS sort faster than inserting the rows one at a time and building the index on the fly? The answer is "probably". On-the-fly requires keeping index entries in the in-RAM key_buffer or buffer_pool, plus flushing such as it goes along. That gets into another long discussion.

More

The main part of ALTER TABLE (at least on 5.5 and before) involves reading the entire table and (depending on the specific Alter) rewriting the data BTree and/or writing index BTree(s).

If tmpdir refers to a directory that is on the same disk partition as the main mysql/ directory then various ALTER TABLEs can "move" a temp file (which such is required) from tmpdir to mysql/databasename/ at the end of the Alter. The "move" is accomplished (at least on Linux) by a couple of quick directory tasks. If different disk partitions are needed, a physical "copy" is needed; this requires a big read and write.

A file sort is likely to be used as one step in building an index. How this is done is up to the OS. But it is equivalent to reading and writing the data being sorted (the index's BTree) one or more times (depending on number of rows in table). For a tiny table, the OS will effectively cache the effort in RAM. For a huge table multiple "sort-merge" passes will be required.

Sorting is "O(N*logN)". That is, if you double the number of items, it takes more than twice as long to complete the sort. (However, I/O dominates the time consumed, so one can estimate time very precisely.)

If the server is configured with multiple not-big-enough physical disks, the above should help with understanding that part of the Alter.

It is optimal (for MySQL) to give most of RAM to MySQL. But, when sorting is necessary, the OS needs some room, too. The tradeoff here is beyond my understanding of the details.