NVMe+bcache as huge storage cache for Vertica database

Tags: ssd, storage, vertica

We are in the process of selecting hardware for a 3-machine Vertica cluster capable of serving 20-30 TB of data (15 TB compressed). This translates to 8 TB of post-RAID storage. With 16 CPU cores, the storage should yield 800 MB/s of "large block random I/O" throughput.
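For context, the 800 MB/s target follows directly from a per-core scan-throughput figure; this quick check backs out the per-core number implied by the values above (it is derived from this post, not an official sizing rule):

```python
# Back out the per-core I/O throughput implied by the sizing above.
cores = 16
total_throughput_mb_s = 800  # target "large block random I/O" throughput
per_core_mb_s = total_throughput_mb_s / cores
print(per_core_mb_s)  # 50.0 MB/s per core
```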

The cost of the storage gets quite high once we count in drives, RAID overhead, and the server chassis needed to hold all the drives. 10k RPM drives are relatively small (SAS 10k RPM tops out at 1.2 TB), and we need many of them to provide the required total capacity. SSDs are four times more expensive, with SATA/SAS becoming the bottleneck.

I have been thinking about a "hybrid" solution: super-fast NVMe storage (2 GB/s throughput) acting as a cache for a much slower HDD backend (a couple of 10 TB SATA drives in RAID1). This can be accomplished with the bcache facility of the Linux kernel.

The assumptions are:

  • most reporting queries to Vertica touch only a "hot" data area of an hour's/day's/week's worth of data; bcache would cache this frequently used data on the NVMe.
  • as time goes on, the "hot" area shifts across the backing disks, and bcache automatically adjusts the cached blocks to track it.
  • write queries (data imports) to Vertica will be much slower than reads, but will not block any read operations (write-through mode). This is OK, as we import data in bulk and don't need it to be immediate.
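A minimal sketch of such a bcache setup, assuming the NVMe device is /dev/nvme0n1 and the HDD RAID1 array is /dev/md0 (both device names are placeholders for whatever the hardware actually presents):

```shell
# Register the backing device (slow HDD RAID1) and the cache device (NVMe)
# in one step; this creates /dev/bcache0 on top of /dev/md0.
make-bcache -B /dev/md0 -C /dev/nvme0n1

# Write-through mode: writes go to HDD at backend speed, but a cache
# failure cannot lose acknowledged writes; reads hit NVMe once cached.
echo writethrough > /sys/block/bcache0/bcache/cache_mode

# bcache bypasses the cache for sequential I/O by default (4 MB cutoff);
# Vertica scans are large-block, so disable the bypass to cache them too.
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

# Put the Vertica data directory on the cached device, e.g.:
mkfs.xfs /dev/bcache0
mount /dev/bcache0 /vertica/data
```

Whether disabling sequential_cutoff helps depends on the workload; with the default cutoff, large sequential scans would mostly bypass the NVMe cache entirely.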

Back-of-envelope calculations put the cost of such a system at around $2,000 per server, as opposed to $8,000 for a fully SSD-based solution.
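To make the comparison explicit, here is one way the totals could break down; the unit prices are illustrative assumptions chosen only to match the rough $2,000 vs. $8,000 figures above, not quotes:

```python
# Hybrid: NVMe cache in front of HDD RAID1 (all prices are assumptions).
hdd_price = 400    # assumed price of one 10 TB SATA drive
nvme_price = 1200  # assumed price of one ~1.6 TB NVMe card
hybrid_cost = 2 * hdd_price + nvme_price  # 2x10 TB RAID1 + cache

# All-SSD: 8 TB usable after RAID10 -> 16 TB raw -> ten 1.6 TB SSDs.
ssd_price = 800    # assumed price of one 1.6 TB SATA SSD
ssd_cost = 10 * ssd_price

print(hybrid_cost, ssd_cost)  # 2000 8000 (per server)
```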

Can this approach work for Vertica (or any other analytics database)? Has anyone used the "fast cache in front of a slow backend" approach for databases?

Best Answer

Very interesting!

I am not sure how this would work around the Tuple Mover's internal crunching; blocks change all the time.

My take: a UDT to extract data from Vertica and flush it into memcached or something similar could also be a solution.