PostgreSQL – Many queries are slower on new hardware and newest version of PostgreSQL

optimization, performance, postgresql, postgresql-9.6, query-performance

Note: I have significantly re-worked this question based on the direction the commentators have taken me and what I have learned on the way.

I am setting up a new development database server. The new one is a 6th-gen i7-6700 with 16GB DDR4 memory and an Intel 750 SSD, running PostgreSQL 9.6. The old one is a 4th-gen i5-4570 with 16GB DDR3 memory and a Crucial SSD RAID 0 array, running PostgreSQL 9.2. I benchmarked the 750 and it is much faster than the RAID overall; however, its read performance on transfers below 256KB is slower.

I used pg_dump and pg_restore to move the data and I ran vacuum analyze on all databases.

The new computer should be quite a bit faster than the old one, and for some types of queries it is, but for others it is several times slower. I also installed 9.2 on the new computer to try to isolate the database version from the hardware. I kept postgresql.conf identical where possible (9.6 changes some WAL settings).

I set up pg_stat_statements to get some statistics to narrow down the problem. I have 500+ schemas and 3000+ tables. The transformations I am running are creating new tables from some of these tables and updating those new tables from these 3000 other tables. I am bouncing all over this 50GB of data.
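Roughly, the kind of query I am running against pg_stat_statements looks like this (a sketch only; it assumes the extension is already listed in shared_preload_libraries and created in the database):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Slowest statements by average time; total_time is reported in milliseconds.
SELECT query,
       calls,
       total_time,
       total_time / calls AS avg_ms,
       shared_blks_hit,
       shared_blks_read,
       shared_blks_dirtied
FROM pg_stat_statements
ORDER BY total_time / calls DESC
LIMIT 20;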

On the new server with 9.6, create table, copy, vacuum, analyze, drop table, updates that affect many rows, create index, and create table as are usually faster, up to twice as fast. Occasionally, though, they are several times slower. The new server running 9.2 shows a similar pattern, but the performance gains are not nearly as pronounced. Interestingly, when 9.6 is slow on some particular copy or vacuum, 9.2 is usually similarly slow.

Drop index, drop table, select queries that return few rows, select count(*), and update queries that affect a single row are all usually 50%-300% slower on the new server with 9.6. 9.2 on the new server is also slower, but usually only by 20%-30%. I do not know exactly how PostgreSQL accesses the disk, but I suspect the Intel 750's slower performance on small reads is having an effect, as suggested by the slower 9.2 numbers. That does not, however, explain the 5x-10x performance degradation between 9.2 and 9.6 on the same server.

So my conclusion is 9.6 is significantly slower for certain types of queries.

I have run explain on many of the queries in question and the plans are identical.
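For a closer comparison than plan shape and costs alone, I can also run the queries with EXPLAIN (ANALYZE, BUFFERS) on both servers and compare the actual times and buffer counts, for example (a sketch using one of the queries listed below; the siteid value is just a placeholder):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM work_allustis_2016_q3.allustis_details
WHERE siteid = 12345;  -- placeholder siteid

The "actual time" and "Buffers: shared hit/read" lines are what I compare between the two servers, rather than the estimated costs.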

Here are some representative examples (from pg_stat_statements) of the queries that are slow. Note that the tables involved all have primary keys and indexes, have been vacuumed and analyzed, and are otherwise identical between the two databases. Also, I only selected queries that have at least 50 calls.

Update finalal Set jsondone=?, json=? Where id=? // 20% slower
SELECT * FROM archive_natrilis_2015_q3.natrilis_releases WHERE siteid = ? ORDER BY reporting_year ASC LIMIT ? // 404% slower
SELECT * FROM work_narcra00_2016_q3.narcra_handler WHERE siteid = ? ORDER BY handler_sequence_number ASC LIMIT ? // 182% slower
select * from listtypedesc // small 20 row table, 173% slower.
select * from INFORMATION_SCHEMA.schemata where schema_name not like ? and schema_name not like ? AND schema_name like ? Order by schema_name Desc // 52% slower
Select count(*) From work_allustis_2016_q3.allustis_details Where siteid = ? // 234% slower
SELECT * FROM archive_naerns14_2015_q2.naerns14_material_involved WHERE siteid = ?  LIMIT ? // 287% slower

Is my conclusion correct? And if so is there a 9.6 specific way to optimize these slow queries? If not, are there other debug actions to recommend?

Best Answer

You have changed too many things at once. First run 9.2 on your new hardware and check the performance. Then upgrade to 9.6 once you get that sorted out.

How did you move the data? Logically, by using pg_dump and then a restore, or physically by using pg_basebackup (or a cold copy) and then pg_upgrade?

Update finalal Set jsondone=1 Where id=1

The old server runs this at an average of 0.2 ms over 72 iterations. The new server averages 1.43 ms.

72 iterations of the same id=? value, or consecutive ones, or random ones? And all in one transaction, or each a separate one?

The new server is showing half the cost (but more startup costs for some reason) so I assume it would be faster, especially on faster hardware.

Those costs are not general time estimates; they are internal accounting used to make reasonable planner choices. Things for which there are no choices (there is only one way to update a row) are not costed at all, but they obviously take actual time to do. And even with that in mind, you have to use great caution comparing them across versions. They just aren't meant for that purpose.
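To illustrate: the cost numbers are expressed in multiples of abstract planner parameters such as the following, not in milliseconds, so a lower cost on one server is not a prediction of a shorter runtime there.

SHOW seq_page_cost;     -- 1.0 by default; the unit the other costs are scaled to
SHOW random_page_cost;  -- 4.0 by default
SHOW cpu_tuple_cost;    -- 0.01 by default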

I did notice that shared_blks_read and shared_blks_dirtied were about twice as high on the new server. shared_blks_hit was only slightly higher. Maybe there is a clue in that, but I do not really know what this means.

I think what this means is that the data on your new server is packed more tightly than it was on the old server, probably because you used pg_dump to move the data over rather than a physical copy. The default fillfactor for tables is 100%, so as many rows as will fit are packed into each page.

So on the old server, when you do an update there is room to put the new version of the updated row in the same page as the old version. On the new server there is no room on the page, so it has to seek out a different page to put the new version of the row on (and then both pages get dirtied), which also means it has to update the indexes so they know where to find the new version.
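If that extra write and index maintenance turns out to be the problem, one thing you could try (a sketch only, using the finalal table from your example) is lowering the fillfactor so updated row versions can stay on the same page and qualify as HOT updates, which avoid touching the indexes:

-- Leave ~10% of each heap page free so new row versions can stay on the same page.
ALTER TABLE finalal SET (fillfactor = 90);

-- Rewrite the table so existing pages honor the new fillfactor.
VACUUM FULL finalal;

-- Later, check how many updates were HOT (same-page, no index maintenance).
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'finalal';

Whether that helps depends on whether the updated columns are indexed; HOT updates only apply when no indexed column changes.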