MySQL – Performance considerations for using MySQL table as a queue

MySQL

I'm working on a system which reports certain events to an external service.
Whenever a relevant change is made in the database, we want to guarantee that at least one message is sent to the external service.

I'm thinking this could be implemented with a message_queue table as follows:

  1. BEGIN a transaction
  2. Make the relevant database change
  3. Insert a message about the change into the message_queue table, including a UUID
  4. COMMIT the transaction
  5. Later (e.g. on a cron job), iterate through the message_queue table, attempting to send each message to the external service. Whenever we get a successful response, delete that row; when we don't, try again later. If the external service actually received a message the first time but the reply was lost, it can discard the retry based on the duplicate UUID.
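Concretely, the schema and transaction for steps 1–4 might look something like this (the table, the accounts UPDATE, and the JSON payload are all illustrative, not a prescription):

```sql
-- One possible shape for the outbox table
CREATE TABLE message_queue (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  uuid        CHAR(36)        NOT NULL,
  payload     JSON            NOT NULL,
  created_at  TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Steps 1-4: the business change and the message commit (or roll back) together
START TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 42;  -- the relevant change (hypothetical)
  INSERT INTO message_queue (uuid, payload)
    VALUES (UUID(), JSON_OBJECT('account_id', 42, 'delta', -100));
COMMIT;
```

Because both statements sit in one transaction, a message row exists if and only if the change it describes was committed.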

Assuming we can send and delete message_queue rows faster than they are created, this table would tend to have very few rows at any moment.

Given this usage pattern of frequent inserts and deletes, I think we would not want to index the table.
Is that correct? What else could we do to minimize the impact on overall database performance?

Best Answer

My first encounter with "using MySQL as a queue" ended in disaster. The team was very wedded to tossing things into a table, then pulling them out to work on. They were limited in how many things they could achieve per hour.

After studying not only the queuing mechanism but the enqueue and dequeue API and the worker threads, I decided on:

"Don't queue it, just do it".

I estimated (for their case) that they could increase the throughput 10-fold by removing the queue.

Here are some random lessons:

  • If the task to perform is fast enough, you may be spending more time on enqueuing/dequeuing than the task.
  • If the tasks arrive in a steady stream, the "buffering" that queuing gives you is unnecessary. On the other hand, if the tasks are "bursty", queuing may be beneficial.
  • Replication (HA was a requirement in the case) significantly complicates the queuing code.
  • Using AUTO_INCREMENT for queuing with replication adds a complication because the ids don't always arrive in the Slave in order! This happens randomly (but not very frequently) with InnoDB.
  • Deletion of queue items leads to fragmentation. This was especially bad with MyISAM.
  • Note how switching Engines solves one problem but creates another?
  • UUIDs are terrible for performance if the table gets bigger than can be cached.
  • Transactional integrity gets messy -- the item in the queue is mostly independent of the task it represents. Issues to resolve: requeuing a task that fails; crashing in the middle of a task; etc.
  • Sure, the queue will have very few rows at any normal moment. But there will come a time when something hiccups and there are a million tasks queued up. The system will croak for any of several reasons that you failed to plan for. You have a crisis on your hands and the looming job of analyzing and planning for such eventualities.
  • Based on my previous comment, do you really want to leave off any index? No. The dequeuing mechanism will get slower and slower, further exacerbating the crisis.
  • If a queued task is assigned to a worker, but that worker crashes, then the item needs to stick around somewhere (in the queue or some other place) and some separate task needs to eventually discover an unfinished task after some timeout. More messy code to write.
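One way to sketch that claim-with-timeout logic in SQL -- MySQL 8.0's SKIP LOCKED lets a worker claim rows without blocking its peers; the claimed_at column, the batch size, and the 5-minute timeout are all illustrative choices:

```sql
-- Worker claims a batch: rows nobody holds, plus rows whose
-- worker apparently crashed (claim is older than the timeout).
START TRANSACTION;

SELECT id, uuid, payload
FROM message_queue
WHERE claimed_at IS NULL
   OR claimed_at < NOW() - INTERVAL 5 MINUTE   -- presumed-dead worker
ORDER BY id
LIMIT 10
FOR UPDATE SKIP LOCKED;

UPDATE message_queue
SET claimed_at = NOW()
WHERE id IN (/* the ids returned by the SELECT above */);

COMMIT;

-- Only after the external service acknowledges a message:
DELETE FROM message_queue WHERE id = ?;
```

Note this is exactly the "more messy code" the answer warns about: the timeout must be longer than any legitimate task, and a task that merely ran slow may get executed twice -- which is why the duplicate-UUID dedupe on the receiving side matters.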

An alternative (that may or may not apply to your situation): Leave information around and have a continuously running job (not a cron) that looks for the info and acts on it. This can be handy if the items arrive rapidly and they can be processed in batches.
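A minimal sketch of such a continuously running batch worker, in Python. Everything here is a stand-in: the in-memory list plays the role of the message_queue table, fetch_batch() stands in for a SELECT, send() for the call to the external service, and the batch size and sleep interval are arbitrary:

```python
import time
import uuid

# Hypothetical in-memory stand-in for the message_queue table.
queue = [{"uuid": str(uuid.uuid4()), "payload": f"event-{i}"} for i in range(250)]

BATCH_SIZE = 100

def fetch_batch(limit=BATCH_SIZE):
    """Grab up to `limit` pending messages (stands in for a SELECT ... LIMIT)."""
    return queue[:limit]

def send(message):
    """Stand-in for the call to the external service; True means acknowledged."""
    return True

def process_batch():
    """Send one batch and delete the rows that were acknowledged.

    Returns the number of messages fetched so the caller can decide
    whether to sleep (empty queue) or loop again immediately (backlog).
    """
    batch = fetch_batch()
    for message in batch:
        if send(message):            # duplicate UUIDs let the service dedupe retries
            queue.remove(message)    # stands in for DELETE ... WHERE uuid = ?
        # on failure: leave the row in place; it is retried on the next pass
    return len(batch)

def run_forever():
    """Continuously running worker (not a cron): loop hard while there is a
    backlog, sleep briefly only when the queue is empty."""
    while True:
        if process_batch() == 0:
            time.sleep(1)
```

The key difference from a cron job is the inner decision: while a backlog exists the worker processes batch after batch with no delay, so a burst drains quickly instead of at one batch per cron tick.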