For starters, I would not touch the buffer sizes just yet. The sizes you have in the question are monstrously oversized.
Here is another observation: You have BLOB data. Ouch, your temp table is going to eat space rather quickly. You could do something like this:
Create a 32GB RAM disk by adding this line to /etc/fstab (the dump/fsck fields are 0 0, since neither applies to tmpfs):
none /mysqltmp tmpfs defaults,size=32g 0 0
Next, create the /mysqltmp folder, mount the RAM disk on it, and hand the mounted filesystem to mysql:
mkdir /mysqltmp
mount /mysqltmp
chown mysql:mysql /mysqltmp
Add this to my.cnf and restart mysql
[mysqld]
tmpdir=/mysqltmp
Now, any temp table or work file mysqld writes to tmpdir (implicit temp tables, filesorts, ALTER TABLE work files) lands in the RAM disk.
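After the restart, it is worth a quick sanity check from any client session that mysqld actually picked up the new location:
-- should report /mysqltmp
SHOW GLOBAL VARIABLES LIKE 'tmpdir';
-- watch this counter while the big queries run; every increment
-- is now a write into RAM instead of onto disk
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';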
Here is yet another observation: Why not create a separate table that keeps the BLOB data away from the unique names?
CREATE TABLE `data_store_name` SELECT id,uniqname FROM `data_store` WHERE 1=2;
ALTER TABLE `data_store_name` ADD PRIMARY KEY (id);
ALTER TABLE `data_store_name` ADD UNIQUE KEY (uniqname);
ALTER TABLE `data_store_name` ADD INDEX name_id_ndx (uniqname,id);
INSERT INTO `data_store_name` SELECT id,uniqname FROM `data_store`;
This will prevent any moving around of BLOB data when indexing.
From here, you would always join to data_store by name through the new table, like this:
SELECT
A.uniqname,B.data
FROM
(SELECT * FROM data_store_name WHERE uniqname = 'mydataname') A
LEFT JOIN
data_store B USING (id)
;
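One cost of the split worth noting: every new row must now land in both tables. A minimal sketch of the paired insert, assuming data_store.id is AUTO_INCREMENT (adjust if your ids come from somewhere else):
INSERT INTO data_store (uniqname, data)
VALUES ('mydataname', 0xCAFEBABE);   -- placeholder BLOB value
INSERT INTO data_store_name (id, uniqname)
VALUES (LAST_INSERT_ID(), 'mydataname');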
Making these changes will sidestep the whole mess of dealing with the key cache, RAM disks, and temp tables.
Give it a Try !!!
The first thing that comes to mind is the Server Model: db.m1.large
What limiting factors are placed on a MySQL RDS?
If you spin up an Amazon RDS instance of MySQL, you subject yourself to whatever constraints come with it. All models of MySQL Amazon RDS have the same major options and differ in only two aspects:
- max_connections
- innodb_buffer_pool_size
Here is a chart I posted last month:
MODEL       max_connections innodb_buffer_pool_size
----------  --------------- -----------------------
t1.micro          34            326107136 (  311M)
m1.small         125           1179648000 ( 1125M,  1.097G)
m1.large         623           5882511360 ( 5610M,  5.479G)
m1.xlarge       1263          11922309120 (11370M, 11.103G)
m2.xlarge       1441          13605273600 (12975M, 12.671G)
m2.2xlarge      2900          27367833600 (26100M, 25.488G)
m2.4xlarge      5816          54892953600 (52350M, 51.123G)
I have posted this chart in other answers of mine on the DBA StackExchange.
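You can verify which bracket your instance actually landed in from any client session:
SELECT @@max_connections AS max_connections,
       @@innodb_buffer_pool_size AS innodb_buffer_pool_size;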
You are using m1.large. Since the InnoDB Buffer Pool is 3/4 of the instance's RAM, the buffer pool claims about 5.5G of that model's 7.5G, leaving less than 2G for everything else. That model is capable of up to 623 connections, and each connection can consume memory because of
- join buffers
- sort buffers
- read buffers
- thread info
Amazon RDS is simply micromanaging resources. Since DB connections can consume RAM, new connections are probably being refused for lack of the RAM they would need.
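For a back-of-envelope view of the worst case, add up the per-connection buffers straight from the server (a rough upper bound per connection; real usage depends on the query):
SELECT ( @@join_buffer_size
       + @@sort_buffer_size
       + @@read_buffer_size
       + @@read_rnd_buffer_size
       + @@thread_stack ) / 1024 / 1024 AS per_connection_MB;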
SUGGESTIONS
- Try reducing InnoDB log I/O by setting innodb_flush_log_at_trx_commit=0 for the duration of the mass creation (see the sketch after this list)
- Make sure you are not doing large transactions during any automatic backups or snapshots
- Try a bigger server model, such as m1.xlarge or m2.xlarge
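On a self-managed server, the log-flush setting from the first suggestion can be toggled at runtime; on RDS you would make the equivalent change through the DB parameter group. A sketch of the self-managed version:
SET GLOBAL innodb_flush_log_at_trx_commit = 0;  -- flush the redo log about once per second
-- ... run the mass INSERTs / index builds ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- restore flush-on-every-commit durability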
Best Answer
What "import" mechanism was used? Some will automatically postpone the secondary-index creation for you.
I don't understand "I'm using stored procedure and with indexes it was very very slow". SPs being used for the import? Or indexes interfering with SPs? What?
Is /tmp on the same disk and in the same disk 'partition'? This is critical when it needs to "move" the result to the mysql directory.
You really need to upgrade -- 5.6 has some indexing improvements; 5.7 and 8.0 have more.
MySQL uses a combination of RAM and disk -- trying to minimize I/O. So, the question of "will it use RAM" is moot. It will "do what is optimal".
To create a secondary, non-UNIQUE, BTree index, MySQL probably
1. reads the entire table, writing a file of the index column(s) plus the PRIMARY KEY column(s);
2. sorts that file;
3. reads the sorted file and builds the index BTree from it.
However. "read" and "write" and "file" are cached in based on InnoDB's whim and the OS's whim. Furthermore, step 3 may delay the actual write to disk of the BTree.
UNIQUE requires a check; FULLTEXT and SPATIAL were not available in InnoDB in 5.5. If you are using ENGINE=MyISAM, some of what I have said is incorrect.
A related question -- Is that OS sort faster than inserting the rows one at a time and building the index on the fly? The answer is "probably". On-the-fly requires keeping index entries in the in-RAM key_buffer or buffer_pool, plus flushing them as it goes along. That gets into another long discussion.
More
The main part of ALTER TABLE (at least on 5.5 and before) involves reading the entire table and (depending on the specific Alter) rewriting the data BTree and/or writing index BTree(s).
If tmpdir refers to a directory on the same disk partition as the main mysql/ directory, then various ALTER TABLEs can "move" a temp file (when one is required) from tmpdir to mysql/databasename/ at the end of the Alter. The "move" is accomplished (at least on Linux) by a couple of quick directory operations. If different disk partitions are involved, a physical "copy" is needed; this requires a big read and write.
A file sort is likely to be used as one step in building an index. How this is done is up to the OS. But it is equivalent to reading and writing the data being sorted (the index's BTree) one or more times (depending on the number of rows in the table). For a tiny table, the OS will effectively cache the effort in RAM. For a huge table, multiple "sort-merge" passes will be required.
Sorting is "O(N*logN)". That is, if you double the number of items, it takes more than twice as long to complete the sort. (However, I/O dominates the time consumed, so one can estimate time very precisely.)
If the server is configured with multiple, not-big-enough, physical disks, the above should help in understanding that part of the Alter.
It is optimal (for MySQL) to give most of RAM to MySQL. But, when sorting is necessary, the OS needs some room, too. The tradeoff here is beyond my understanding of the details.