PostgreSQL: Save roundtrips / Network performance

Tags: network, postgresql

For my work I often have to run large SELECTs that load a lot of data. The connection is made with the JDBC driver and a high fetch size. This is not a problem when working directly on the database server. However, when connecting from a different machine, the load times get terrible: it takes >10x longer to load the data.

Note that the amount of data is actually small (hundreds of MB, ~200k rows), so network bandwidth can't be the limiting factor. Therefore it's almost certainly latency, and hence roundtrips, that cause the bad performance.

The latency issue seems confirmed by checking with Wireshark: I see a huge back-and-forth between the client and server (note: both machines run Windows). Similar to what was observed in this old topic here.

Are there any server-side or client-side (JDBC) settings that would reduce roundtrips and lower the latency cost?

EDIT:

Timings for the same query and data:
Remote: 20 min to load the data (ping ~120 ms)
Local: 50 seconds (ping < 1 ms)

where "local" means intranet: the DB server is in a building about 300 feet away. Run directly on the server itself, it is also 50 seconds. EXPLAIN ANALYZE reports 105 ms.
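
A rough sanity check on those numbers: if the remote 20 minutes (~1200 s) were dominated by the ~120 ms roundtrip time, that would be about 1200 / 0.12 ≈ 10,000 roundtrips, i.e. only about 200,000 / 10,000 = 20 rows per roundtrip, far fewer than a high fetch size should allow.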

I do understand the issue; I simply wasn't aware the impact was that big. I also wonder if it's a Windows issue, as there seem to be far too many ACKs after very little data when connecting remotely. I should probably ask about that in a networking forum.

Best Answer

There is nothing on the PostgreSQL side to deal with this.

On the JDBC side I know nothing, but a quick Google search also points to no.

A solution I use when working with large data sets is to fetch only a few records at a time, 1,000 to 10,000.

Select * from tablename where tablename.id > 0 order by id limit 1000

The next query then continues after the last id returned by the previous batch (here 1000, which assumes gap-free ids; in general, use the highest id actually seen):

Select * from tablename where tablename.id > 1000 order by id limit 1000

Some people call this asynchronous fetching/paging.
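
For illustration, a minimal JDBC sketch of this paging loop (connection details, table, and column names are placeholders; it assumes a unique, indexed integer id column):

    import java.sql.*;

    public class KeysetPager {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://dbhost:5432/mydb"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "secret")) {
                String sql = "SELECT * FROM tablename WHERE id > ? ORDER BY id LIMIT ?";
                int batchSize = 1000; // 1,000-10,000 as suggested above
                long lastId = 0;      // start below the smallest id
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    while (true) {
                        ps.setLong(1, lastId);
                        ps.setInt(2, batchSize);
                        int rows = 0;
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) {
                                lastId = rs.getLong("id"); // highest id seen so far
                                rows++;
                                // ... process the row here ...
                            }
                        }
                        if (rows < batchSize) break; // short batch = no more rows
                    }
                }
            }
        }
    }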

Some background: most DB connections run in synchronous mode by default. The command is sent, then the client waits until all data is delivered. This makes the app appear to hang for long periods, which makes users think the app has crashed or something else has gone wrong.

Using an asynchronous technique, the data is continuously fetched and delivered by a background process/thread while the rest of the code is free to continue working, giving the user the impression that the application is very snappy.
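
As a sketch of that shape (the batch contents here are stand-ins, not real driver output): one background thread fills a bounded queue with batches while the main thread drains it, so the application never blocks on the network for the whole result:

    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncFetchSketch {
        // sentinel batch marking the end of the stream
        private static final List<String> DONE = List.of();

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);
            ExecutorService pool = Executors.newSingleThreadExecutor();

            // Producer: in real code this would run the keyset-paging loop
            // above and enqueue each batch as it arrives from the server.
            pool.submit(() -> {
                try {
                    for (int batch = 0; batch < 3; batch++) {
                        queue.put(List.of("row-" + batch + "-a", "row-" + batch + "-b"));
                    }
                    queue.put(DONE); // signal completion
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Consumer: the application keeps working between takes instead
            // of hanging until the entire result set has crossed the network.
            for (List<String> batch = queue.take(); batch != DONE; batch = queue.take()) {
                batch.forEach(System.out::println); // process the batch
            }
            pool.shutdown();
        }
    }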

It's a lot harder to write code around fetching only a few records at a time, but it can be worth it on big data sets, as the app does not have to wait for all the data.

There are several tools to make this easier, such as creating a cursor on the server and fetching only X records at a time. But there are drawbacks to cursors: they eat server resources and require clean-up code.

Some client libraries make this easy by directly supporting cursors; others do not, in which case the batching has to be written by hand. (The PostgreSQL JDBC driver actually can do cursor-based fetching; see the sketch below.)
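
A minimal sketch of cursor-based fetching with the PostgreSQL JDBC driver (connection details are placeholders): with autocommit off and a fetch size set, the driver pulls rows from a server-side cursor in fetch-size chunks, one roundtrip per chunk, instead of buffering the whole result:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CursorFetch {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://dbhost:5432/mydb"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "secret")) {
                conn.setAutoCommit(false); // required: the cursor lives inside a transaction
                try (Statement stmt = conn.createStatement()) {
                    stmt.setFetchSize(10000); // rows pulled per roundtrip
                    try (ResultSet rs = stmt.executeQuery("select * from tablename")) {
                        while (rs.next()) {
                            // process the row; the driver fetches the next
                            // 10,000 rows on demand behind the scenes
                        }
                    }
                }
                conn.commit();
            }
        }
    }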