Postgresql – Memory Optimized Azure Postgres

azurepostgresql

I am trying to make sense of the "memory optimized" setting for Azure Database for PostgreSQL. Main memory data management can be viewed as quite different from secondary storage data management (like column stores, and other main memory optimized approaches to store data), hence my confusion. Would Microsoft simply create a ram-drive or simple allocate more RAM for data? We still need to flush all the data from RAM to disk in case of a possible loss of electric power or hardware failure every now and then, how all this is accomplished on such instances?

Best Answer

I don't know much about this particular product - you probably should also ask this question directly to the vendor.

It looks like they'll simply give you a VM with more main memory and configure the database accordingly to use more memory for buffering of data file pages (shared_buffers). They might also configure more memory that can be used by each individual connection (work_mem), for example to sort tuples in memory instead of on disk, or for joins.

When asking the database to return data to you, having the pages already in the database's buffers hugely increase response times. Having more memory available for each query to do sorting and joins is advantageous with "analytical" queries. Nobody in their right mind would think of selling PostgreSQL instances that use RAMdisks. To achieve durability of changes to the database, PostgreSQL does not only apply the changes to the buffers (which will only eventually be written out to the disk), but it also records all changes into a Write-Ahead-Log (WAL), sometimes also called Transaction Log (XLOG). Using the default configuration, a Writing Query is only confirmed to the client as having been written when this WAL has been flushed to the disk. I strongly assume that Azure follows this best practice.

In the end, aside from the amount of memory, and memory related settings, I doubt that there is much difference in the configuration of these machines. The database itself still works in much the same way, there is no general option to turn PostgreSQL into an in-memory database, it is always a database running on secondary storage, using primary storage to speed up most of the operations that one would notice judging by throughput or latency.