MySQL performance issues when saving Bitcoin data

MySQL

I have set a cron job to a php script which periodically checks Bitcoin for new blocks, and if there are any, add the blocks and transactions to a database. After a few million records have been added, I have noticed that the table I am saving transactions to is suffering an incredible slowdown. This issue had to do with not having set up my search term (in this case address affected and/or txid) as an index, after adding the index it has become considerably faster.

This leads me to believe that there may be some best practices I am unaware of when saving Bitcoin data to a database, possibly breaking tables down to a set size, so that I can search my database later at optimal speeds.

My question is this: When saving a large data set to MySQL that constantly gets updated, such as the Bitcoin data, what do I need to do to improve the performance for database lookups?

My primary goal is to be able to retrieve data as quickly as possible. Thank you! 🙂

Edit: Here is the table structure before adding indices.

vin (Hex TEXT, Txid VARCHAR(64), Version VARCHAR(1), Blockhash VARCHAR(64), Locktime VARCHAR(1), Time VARCHAR(10), Vincoinbase BLOB, Vintxid VARCHAR(64), Vinvout VARCHAR(4), Vinscriptsigasm TEXT, Vinscriptsighex TEXT, Vinsequence VARCHAR(12))

vout (Hex TEXT, Txid VARCHAR(64), Version VARCHAR(1), Blockhash VARCHAR(64), Locktime VARCHAR(1), Time VARCHAR(10), Voutvalue DECIMAL(20,8), Voutn VARCHAR(4), Voutscriptpubkeyasm TEXT, Voutscriptpubkeyhex TEXT, Voutscriptpubkeyreqsigs VARCHAR(4), Voutscriptpubkeytype VARCHAR(10), Voutscriptpubkeyaddresses TEXT)

Best Answer

I am not aware of any Bitcoin-specific best database practices. I also see no need for them, as just general good database design will help you.

The key to achieving decent database performance is to define indices in a way that every access you every need can take advantage of them. Where you do not succeed in this, just about any query will revert to iterating over every record in the table, which obviously becomes slow when you reach gigabytes of stored data.

A good place to start will be to define a (ideally integer, i.e. BIGINT) unique index, also known as primary key. If you do not do that yourself, mysql will just invent one anyways---and if you do, you automatically have a fast way to refer to transactions (in database parlance: records in the vin and vout tables). If you invent a new field for it---as I would suggest---you may want to call it id and use AUTO_INCREMENT to have mysql take care of inventing new unique values for new records.

You may also want to have a close look at your VARCHAR entries. These are essentially just (very big) integers, and if you define them as such (check out the DECIMAL and NUMERIC types), handling indices on them becomes computationally much cheaper because complications in string sorting such as collation do not arise. If you prefer to work with the hexadecimal representations, then that can be achieved with suitable VIEWs into your tables, but be warned that using these is a frequent source of new performance problems since optimization to index-usage after querying a view is a complicated subject. To get this right, it may be safer to do all conversions between hex and large-integer representation outside the database.

To increase your chances at getting better and more detailled answers than I have just given, I very strongly suggest you

  1. Look to a developer or database community, because none of the relevant issues are very Bitcoin-specific, at best they are specific to database applications involving very large integers.

  2. Provide relevant information such as what you've already done---or do you expect someone to completely re-invent an indexing scheme neither knowing what queries you like to run, nor what you have already managed to get right youself?

Good luck!