MySQL Database Design – Stock Order/Trade System

Tags: mariadb, mysql

I've spent a lot of time trying to make a trading system that performs well with MariaDB, but I can't figure it out.

What I'm trying to do is replicate an order book system like the one used for stock trading. You can add a buy or sell order at a given price, and if there is a matching order at that price or better, a trade is executed.

Example:

  1. User 1 publishes a sell order for 10 apples at $10/ea.
  2. User 2 publishes a sell order for 5 apples at $8/ea.
  3. User 3 publishes a sell order for 5 apples at $10/ea.
  4. User 4 publishes a buy order for 5 apples at $8/ea. A trade is completed.
  5. User 5 publishes a buy order for 13 apples at $10/ea. Trades are completed against users 1 and 3.
  6. User 3 now has the only published order left: 2 apples for sale at $10/ea.

What I've tried so far is this table:

+------------+------------------------+------+-----+---------+----------------+
| Field      | Type                   | Null | Key | Default | Extra          |
+------------+------------------------+------+-----+---------+----------------+
| id         | int(11) unsigned       | NO   | PRI | NULL    | auto_increment |
| user_id    | int(11) unsigned       | NO   | MUL | NULL    |                |
| ticker_id  | int(11) unsigned       | NO   | MUL | NULL    |                |
| count      | int(11)                | NO   | MUL | NULL    |                |
| price      | decimal(19,4) unsigned | NO   | MUL | NULL    |                |
| created_at | datetime               | NO   |     | NULL    |                |
| updated_at | datetime               | NO   | MUL | NULL    |                |
+------------+------------------------+------+-----+---------+----------------+

The count column is a negative number when it's a buy order, and a positive one when it's a sell order.
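For reference, a CREATE TABLE roughly matching that description (index names and the storage engine are assumptions; DESCRIBE only shows which columns are indexed):

CREATE TABLE trade_orders (
    id         INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT(11) UNSIGNED NOT NULL,
    ticker_id  INT(11) UNSIGNED NOT NULL,
    `count`    INT(11) NOT NULL,                 -- negative = buy, positive = sell
    price      DECIMAL(19,4) UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY user_id (user_id),
    KEY ticker_id (ticker_id),                   -- name matches the key shown in the ANALYZE output below
    KEY `count` (`count`),
    KEY price (price),
    KEY updated_at (updated_at)
) ENGINE=InnoDB;                                 -- InnoDB assumed, since FOR UPDATE row locking is used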

This is the query I'm using when selling, to match with buyers (the greater/less comparisons and the price ordering are flipped when matching with sellers):

SELECT `id`, ABS(`count`), `price` FROM trade_orders 
WHERE `ticker_id` = ? AND `count` < 0 AND `price` >= ? 
ORDER BY `price` DESC, `updated_at` ASC FOR UPDATE
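For the buying case, the flipped query looks roughly like this:

SELECT `id`, `count`, `price` FROM trade_orders
WHERE `ticker_id` = ? AND `count` > 0 AND `price` <= ?
ORDER BY `price` ASC, `updated_at` ASC FOR UPDATE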

The results are looped over. In the selling case, if there are buy orders that pay enough, one of these queries is executed depending on the order sizes:

DELETE FROM trade_orders WHERE `id` = ? AND `count` = ? LIMIT 1

UPDATE trade_orders SET `count` = `count` + ?
WHERE `id` = ? AND `count` < ? LIMIT 1

Then, the remainder of the order is published:

INSERT INTO trade_orders (user_id, ticker_id, count, price, created_at, updated_at) 
VALUE (?, ?, ?, ?, NOW(), NOW())

This is wrapped up in a transaction that is rolled back in case of a deadlock.
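Put together, one matching pass looks roughly like this (the literal values are only an example; in reality placeholders are bound and the SELECT result is looped over):

START TRANSACTION;

-- Sell 8 apples at $9 on ticker 1: lock every buy order paying at least $9
SELECT `id`, ABS(`count`), `price` FROM trade_orders
WHERE `ticker_id` = 1 AND `count` < 0 AND `price` >= 9.0000
ORDER BY `price` DESC, `updated_at` ASC FOR UPDATE;

-- Say it returns id 42 (buy 5 @ $10) and id 43 (buy 5 @ $9):
-- id 42 is consumed completely, so it is deleted
DELETE FROM trade_orders WHERE `id` = 42 AND `count` = -5 LIMIT 1;

-- id 43 is only partially filled (3 of 5), so its count moves toward zero
UPDATE trade_orders SET `count` = `count` + 3
WHERE `id` = 43 AND `count` < -3 LIMIT 1;

-- Nothing of the sell order is left over here; otherwise the remainder
-- would be INSERTed as a new sell order at this point

COMMIT;   -- or ROLLBACK and retry when a deadlock is reported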

The performance drops off really fast: it goes from ~600 trades/sec using 2 threads to ~150 trades/sec once the number of rows for a given ticker_id reaches ~15K. By 100K rows the performance is well below 10 trades/sec. It's the SELECT query that is slow.

With 22K rows for a ticker_id, this is what ANALYZE shows:

mysql> ANALYZE SELECT `id`, ABS(`count`), `price` FROM trade_orders WHERE `ticker_id` = 1 AND `count` < 0 AND `price` >= 500 ORDER BY `price` DESC, `updated_at` ASC FOR UPDATE;
+------+-------------+--------------+------+-----------------------+-----------+---------+-------+-------+--------+----------+------------+-----------------------------+
| id   | select_type | table        | type | possible_keys         | key       | key_len | ref   | rows  | r_rows | filtered | r_filtered | Extra                       |
+------+-------------+--------------+------+-----------------------+-----------+---------+-------+-------+--------+----------+------------+-----------------------------+
|    1 | SIMPLE      | trade_orders | ref  | ticker_id,count,price | ticker_id | 4       | const | 10918 |   0.00 |   100.00 |     100.00 | Using where; Using filesort |
+------+-------------+--------------+------+-----------------------+-----------+---------+-------+-------+--------+----------+------------+-----------------------------+
1 row in set (0.03 sec)

Profiling shows that creating the sort index is the slow part:

SHOW PROFILE FOR QUERY 1;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| Starting                       | 0.000085 |
| Waiting for query cache lock   | 0.000004 |
| Init                           | 0.000004 |
| Checking query cache for query | 0.000067 |
| Checking permissions           | 0.000008 |
| Opening tables                 | 0.000015 |
| After opening tables           | 0.000026 |
| System lock                    | 0.000008 |
| Table lock                     | 0.000006 |
| Init                           | 0.000022 |
| Optimizing                     | 0.000013 |
| Statistics                     | 0.000058 |
| Preparing                      | 0.000016 |
| Sorting result                 | 0.000007 |
| Executing                      | 0.000004 |
| Sending data                   | 0.000006 |
| Creating sort index            | 0.023916 |
| End of update loop             | 0.000024 |
| Query end                      | 0.000005 |
| Commit                         | 0.000032 |
| Closing tables                 | 0.000005 |
| Unlocking tables               | 0.000002 |
| Closing tables                 | 0.000005 |
| Starting cleanup               | 0.000002 |
| Freeing items                  | 0.000005 |
| Updating status                | 0.000043 |
| Reset for next command         | 0.000002 |
+--------------------------------+----------+
27 rows in set (0.00 sec)

Does anyone know how this should be done/designed for good performance even when there are 100K rows for a given ticker_id? I've been searching, but there is little to find on something like this. I've never been in a situation where I need to ensure that there is never a buy row with a higher price than a sell row, or vice versa.

I would really appreciate if someone could help me out. If not a solution, then at least some pointers.

Best Answer

(I think you will find that Profiling is useless.)

Replace your index on ticker_id with these two composite indexes:

 INDEX(ticker_id, `count`)
 INDEX(ticker_id, price)

The Optimizer will pick one of those (based on statistics); either will be more selective than what you have now.
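In DDL terms, something like this (your existing single-column index appears to be named ticker_id, judging by the key column of the ANALYZE output; verify with SHOW INDEX first):

ALTER TABLE trade_orders
    DROP INDEX ticker_id,
    ADD INDEX ticker_count (ticker_id, `count`),
    ADD INDEX ticker_price (ticker_id, price);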

WHERE `ticker_id` = ? AND `count` < 0 AND `price` >= ?

has two ranges. This makes optimization difficult. I see two workarounds.

Plan A: Move the buy/sell flag to price:

WHERE `ticker_id` = ? AND `price` <= ?

INDEX(ticker_id, price)
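One way to implement Plan A (the encoding here is an assumption on my part, and price would have to lose its UNSIGNED attribute): store buy orders with a negated price, so buys and sells sit in disjoint ranges and a single inequality replaces the count < 0 test. The sell-matching query would then be roughly:

SELECT `id`, ABS(`count`), ABS(`price`) AS price FROM trade_orders
WHERE `ticker_id` = ? AND `price` <= ?       -- bind the negated sell price
ORDER BY `price` ASC, `updated_at` ASC       -- most negative stored price = best bid first
FOR UPDATE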

Plan B: Move the flag into a separate column:

buy_sell ENUM('buy','sell') NOT NULL

WHERE `ticker_id` = ? AND `buy_sell` = ? AND `price` >= ?

INDEX(ticker_id, buy_sell, price)
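As a sketch (the index name is arbitrary):

ALTER TABLE trade_orders
    ADD COLUMN buy_sell ENUM('buy','sell') NOT NULL,
    ADD INDEX ticker_side_price (ticker_id, buy_sell, price);

-- sell-matching query under Plan B
SELECT `id`, ABS(`count`), `price` FROM trade_orders
WHERE `ticker_id` = ? AND `buy_sell` = 'buy' AND `price` >= ?
ORDER BY `price` DESC, `updated_at` ASC FOR UPDATE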

(I prefer Plan A, but can't decide whether it performs better.)

You have two queries, correct? With the inequalities and the ordering flipped? If it is possible for each query to order both columns in the same direction, there may be an even better approach. That is:

ORDER BY price DESC, updated_at DESC  -- for one case
ORDER BY price ASC , updated_at ASC   -- for the other

Then, one of these may better optimize both cases:

INDEX(ticker_id, price, updated_at)
INDEX(ticker_id, buy_sell, price, updated_at)

(depending on whether buy_sell is still needed.)
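For example, with the Plan B column kept, INDEX(ticker_id, buy_sell, price, updated_at) can serve both directions: the optimizer scans it backward for one query and forward for the other, and the filesort disappears.

-- selling: best (highest) bids first, index scanned backward
SELECT `id`, ABS(`count`), `price` FROM trade_orders
WHERE `ticker_id` = ? AND `buy_sell` = 'buy' AND `price` >= ?
ORDER BY `price` DESC, `updated_at` DESC FOR UPDATE;

-- buying: best (lowest) asks first, index scanned forward
SELECT `id`, ABS(`count`), `price` FROM trade_orders
WHERE `ticker_id` = ? AND `buy_sell` = 'sell' AND `price` <= ?
ORDER BY `price` ASC, `updated_at` ASC FOR UPDATE;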