Here is one thing that caught my eye when you replied to my comment: the target table is InnoDB and you are using LOAD DATA INFILE. I see two issues.

ISSUE #1 : LOAD DATA INFILE

While LOAD DATA INFILE can load InnoDB tables, its tuning options are geared toward MyISAM tables. There is only one such option: bulk_insert_buffer_size, which you can either set very large or set to zero to disable. There is no equivalent provision for InnoDB.
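To illustrate the MyISAM-only tuning (the value here is just an example):

```sql
-- Enlarge the MyISAM bulk-insert cache for a big load...
SET SESSION bulk_insert_buffer_size = 256 * 1024 * 1024;  -- 256M
-- ...or disable it entirely:
SET SESSION bulk_insert_buffer_size = 0;
```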
ISSUE #2 : InnoDB Storage Engine
Let's take a look at the InnoDB Architecture
Now, picture yourself pushing 50 million rows into one InnoDB table as a single transaction, given all the plumbing depicted in this elaborate illustration.
To ensure data consistency in the event of a crash, your data has to be written in three places:
- There are 128 rollback segments in the System Tablespace (Physical File ibdata1). Your incoming table data must pile up on one Rollback Segment like defensive tackles on a quarterback.
- You have an active Double Write Buffer in the System Tablespace. As the name implies, the data is written twice: InnoDB writes to the Double Write Buffer first, then to the .ibd files. That data is used as the source for crash recovery.
- The data is also written to the Transaction Logs (the Redo Logs in the bottom right corner) via the Log Buffer. The physical files are ib_logfile0 and ib_logfile1.
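You can watch all three write paths from the client with standard commands:

```sql
-- Is the Double Write Buffer active, and how are the redo logs sized?
SHOW VARIABLES LIKE 'innodb_doublewrite';
SHOW VARIABLES LIKE 'innodb_log_file%';

-- Live counters for the log buffer, transactions, and pending I/O:
SHOW ENGINE INNODB STATUS\G
```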
My Perspective
InnoDB can handle 1024 concurrent transactions, but there are only 128 rollback segments. If there are other transactions going on, you have a New York City traffic jam on your hands. With all the InnoDB internals to manage during your bulk insert, seeing NULL in the processlist should not be a surprise. You should look at four (4) things to make sure they are up-to-date (see the quick check after this list):
- filesize of ibdata1
- timestamp on ibdata1
- timestamp on ib_logfile0
- timestamp on ib_logfile1
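Those four files live in the datadir; you can locate them from the client and then check sizes and timestamps at the OS shell with ls -l:

```sql
-- Where ibdata1 and ib_logfile0/ib_logfile1 live:
SHOW VARIABLES LIKE 'datadir';
-- How ibdata1 is defined (name, initial size, autoextend):
SHOW VARIABLES LIKE 'innodb_data_file_path';
```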
SUGGESTIONS
You could set one or more of the following settings:
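As a sketch, assuming you can restart mysqld and accept reduced crash safety for the duration of the load (revert these afterwards):

```sql
-- Write redo at commit but fsync only ~once per second:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- In my.cnf (restart required):
--   innodb_doublewrite = 0          # skip the Double Write Buffer
--   innodb_log_buffer_size = 256M   # buffer more redo before flushing
```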
GIVE IT A TRY !!!
Face it, wherever you have the UUID as a KEY, it will be in a fragmented BTree. But, the BTree is kept reasonably clean. That is, when a BTree block is too full to accept another row, it splits into two blocks, each about half full. As time goes on, any new inserts into either of those blocks will simply add to the blocks without immediately splitting. The end result of 'random' inserts is blocks that average about 69% full. This is only slightly worse than 100% full.
InnoDB uses only BTrees. The data is 'clustered' with the PRIMARY KEY. There is always a PRIMARY KEY:
- an explicit PK, or
- the first UNIQUE key with non-null column(s), or
- a fabricated, hidden, 6-byte PK.
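To make those three cases concrete, a minimal sketch (table names are mine):

```sql
-- Case 1: explicit PK; rows are clustered on id.
CREATE TABLE t1 (id INT NOT NULL, PRIMARY KEY (id));

-- Case 2: no PK, but a UNIQUE key over a NOT NULL column;
-- InnoDB promotes it to the clustered key.
CREATE TABLE t2 (id INT NOT NULL, UNIQUE KEY (id));

-- Case 3: neither; InnoDB fabricates a hidden 6-byte row id.
CREATE TABLE t3 (id INT);
```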
If your UUID is the PK, then you are doing splits/fragmentation on the data. If your UUID is a secondary key, then that BTree suffers from splits/fragmentation. There is (almost) no escaping it.
I say 'almost' because if you are using Type-1 UUIDs, you can shuffle the bits to make them approximately time-ordered. This makes them much like AUTO_INCREMENT ids. I discuss that in my blog.
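On MySQL 8.0 you do not even need custom functions for the shuffle: the built-in UUID_TO_BIN() takes a swap flag that moves the time-high field to the front, making Type-1 UUIDs roughly insert-ordered and 16 bytes wide. A minimal sketch (table name is mine):

```sql
CREATE TABLE t (
  id BINARY(16) NOT NULL PRIMARY KEY
);

-- Swap time-low/time-high so values arrive roughly in order:
INSERT INTO t (id) VALUES (UUID_TO_BIN(UUID(), 1));

-- Read back in the familiar 36-character form:
SELECT BIN_TO_UUID(id, 1) AS uuid FROM t;
```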
Yes, you could rebuild whatever index contains the UUID. That is done via OPTIMIZE TABLE, which rebuilds the table, blocking access during the process. And, as I say, you won't gain much from it. Ordinary block splits are not costly; OPTIMIZE is. I often tell people to 'never' use OPTIMIZE.
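For reference, this is the operation being discouraged (table name is a placeholder):

```sql
-- Rebuilds the table and all of its indexes, blocking access while it
-- runs. For InnoDB, MySQL reports "Table does not support optimize,
-- doing recreate + analyze instead" and performs the equivalent of
-- ALTER TABLE ... FORCE.
OPTIMIZE TABLE mytable;
```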
Is your entire table small enough to be cached in the buffer_pool? If so, UUIDs are not too much of a performance problem. On the other hand, once the table (or the UUID index) gets much bigger than the buffer pool, processing becomes I/O-bound. This is because of the random nature of UUIDs. As the table grows, each INSERT and each SELECT (using UUID) becomes more and more likely to require a disk hit.
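A quick way to check (schema and table names are placeholders):

```sql
-- Total size of the table plus all of its indexes, in MB:
SELECT (data_length + index_length) / 1024 / 1024 AS total_mb
  FROM information_schema.TABLES
 WHERE table_schema = 'mydb' AND table_name = 'mytable';

-- Compare against the buffer pool:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```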
If you cannot avoid UUIDs (avoiding them would be my first recommendation), you can at least shrink them: from the 110 bytes of a casual implementation (VARCHAR(36) w/ utf8), or 36 bytes (CHAR(36) w/ ascii), down to 16 bytes (BINARY(16)). See my blog for the pair of Stored Functions. Smaller --> More cacheable --> Less I/O --> Faster.
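For illustration, a minimal pack/unpack pair (function names are mine; unlike the blog's versions, these do not reorder the Type-1 time bits, they only do the 36-to-16-byte shrink):

```sql
DELIMITER //
CREATE FUNCTION uuid_to_bin16(u CHAR(36)) RETURNS BINARY(16)
DETERMINISTIC
RETURN UNHEX(REPLACE(u, '-', ''));
//
CREATE FUNCTION bin16_to_uuid(b BINARY(16)) RETURNS CHAR(36)
DETERMINISTIC
RETURN LOWER(CONCAT_WS('-',
    SUBSTR(HEX(b),  1, 8),
    SUBSTR(HEX(b),  9, 4),
    SUBSTR(HEX(b), 13, 4),
    SUBSTR(HEX(b), 17, 4),
    SUBSTR(HEX(b), 21)));
//
DELIMITER ;

-- Usage: INSERT INTO t (id) VALUES (uuid_to_bin16(UUID()));
```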
If you would like to discuss what you are doing in more detail, I would be happy to elaborate.
Best Answer
You can create a pair of triggers, ON INSERT and ON UPDATE, that perform any desired transformations 'on the fly'. When you run an INSERT or UPDATE query on the table, the trigger will replace the submitted value of uuid with the string stored in the loooonguuid variable. Sure, you can make the transformation as complex and randomized as you want.
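A minimal sketch of that pair, assuming a table named mytable with a uuid column and the replacement value held in a session variable @loooonguuid (names taken from the discussion; adjust to your schema):

```sql
DELIMITER //
-- Overwrite whatever value the INSERT supplied:
CREATE TRIGGER mytable_bi BEFORE INSERT ON mytable
FOR EACH ROW
  SET NEW.uuid = @loooonguuid;
//
-- Same substitution on UPDATE:
CREATE TRIGGER mytable_bu BEFORE UPDATE ON mytable
FOR EACH ROW
  SET NEW.uuid = @loooonguuid;
//
DELIMITER ;
```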