We have a table with 600M+ records and using AWS Aurora with master-slave replication.
Requirement:
We need to add two columns with index in customer
table.
Aurora Instance Specs:
Ram: 30.5 GB
CPU: 4 vCPU
I replicated the cluster and created instance with same configuration and ran the update process to newly created master instance to see how it works. But it's taking a lot time; ( already been 4 days ) and still running. Is there anything I can do to speedup the alter process?
Command:
pt-online-schema-change --execute --ask-pass --user=admin --statistics --chunk-time=1 --alter 'ADD COLUMN email_sha1 CHAR(40) DEFAULT null, ADD COLUMN email_sha256 CHAR(64) DEFAULT null, ADD INDEX email_sha1(email_sha1) USING BTREE, ADD INDEX email_sha256(email_sha256) USING BTREE' D=example,t=customer,h=example.public.rds.amazonaws.com
Best Answer
You are taking a hash of a phone number? Why? Indexing the phone number is arguably better than building a hash and indexing that. For one thing, the phone number is always(?) shorter than the hash.
Kill the
ALTER
and rethink the task.The likely reason for the problem is the randomness of the column being indexed -- Toward the end of the process, it will be slowing down to about 1 disk hit per row. If you estimate 1ms per disk hit, I would estimate that it might take a week to finish.
What version are you running? Can it add an index with
ALTER=INPLACE
? That might let you (1) add the columns, then (2) in a separateALTER
do theADD INDEX
withALTER=INPLACE
, and not have to worry about it taking a week.Still I question the need for the columns or the index.
If you need to discuss further, please provide
SHOW CREATE TABLE
.