Since I like replacing aggregate functions with old-fashioned self-joins and NOT EXISTS clauses, here is my attempt:
SET search_path='tmp';
DROP TABLE IF EXISTS tmp.changes CASCADE;
CREATE TABLE tmp.changes
( id integer NOT NULL PRIMARY KEY
, fullname varchar
, issuer varchar
, rsymbol varchar
, industry varchar
, activity INTEGER NOT NULL
, shareschange FLOAT
, sharespchange FLOAT
, mfiled FLOAT
);
-- Lacking information from the OP,
-- I can only assume a flat distribution.
INSERT INTO tmp.changes(id, activity, shareschange,sharespchange,mfiled )
SELECT nm.*              -- id
, (random() *20)::integer -- activity
, random() *10000         -- shareschange
, random() *100           -- sharespchange
, random() *100000        -- mfiled
FROM generate_series(1,1000000) nm
;
ALTER TABLE tmp.changes
ALTER shareschange
SET STATISTICS 1000
;
ALTER TABLE tmp.changes
ALTER mfiled
SET STATISTICS 1000
;
VACUUM ANALYZE tmp.changes
;
CREATE INDEX changes_mfiled_shareschange
ON tmp.changes(mfiled,shareschange)
;
EXPLAIN ANALYZE
SELECT initcap(ch.fullname) AS some_name1
, initcap(ch.issuer) AS some_name2
, upper(ch.rsymbol) AS some_name3
, initcap(ch.industry) AS some_name4
, ch.activity
, to_char(ch.shareschange,'FM9,999,999,999,999,999') AS some_name5
, ch.sharespchange || '%' AS some_name6
FROM changes ch
WHERE ch.activity IN (4,5)
-- NOTE: the subquery is *not* correlated on activity.
-- [I had expected an additional condition of nx.activity IN (4,5)
--  like in the main query. ]
AND NOT EXISTS (SELECT * FROM changes nx
WHERE nx.mfiled > ch.mfiled
)
ORDER BY ch.shareschange ASC
LIMIT 15
;
Keeping things cached with PostgreSQL is a balance between what you can hold in shared_buffers and what sits in the filesystem cache, while avoiding both of them being pushed out by other processes.
One of the first things you can do on PostgreSQL 9.4 is set huge_pages=on in your postgresql.conf. This also involves doing some OS changes to allow PostgreSQL to start with this enabled. The appropriate sysctl variable is vm.nr_hugepages.
You can do a quick and dirty calculation of what you should set vm.nr_hugepages to with the following bit of shell scripting.
echo $(grep ^VmPeak /proc/$(cat /var/run/postgresql/9.4-main.pid)/status | sed -e 's/VmPeak://' | sed -e 's/kB//' | sed -e 's/[ \t]//g')/$(grep Hugepagesize /proc/meminfo | sed -e 's/Hugepagesize://' | sed -e 's/kB//' | sed -e 's/[ \t]//g') | bc -l | awk '{print int($1+1)}'
This line grabs the VmPeak from the running PostgreSQL process via /proc/$pid/status, which reports the peak virtual memory size of the process in kB. It strips the spacing and extra output, reads Hugepagesize from /proc/meminfo (which in my case was 2048 kB), does the appropriate division, truncates the result to an integer, then adds one for a safety margin.
In my case, it was 2147.
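As a worked example of the same arithmetic (the VmPeak value here is hypothetical; the 2048 kB page size matches my machine), plain shell integer math gives the same answer:

```shell
# Hypothetical numbers: VmPeak = 4395008 kB, Hugepagesize = 2048 kB.
vmpeak_kb=4395008
hugepage_kb=2048
# Integer division truncates; add one page as the safety margin.
echo $(( vmpeak_kb / hugepage_kb + 1 ))
# prints 2147
```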
Make it permanent by doing the following:
echo "vm.nr_hugepages=2147" >>/etc/sysctl.d/postgresql.conf
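To pick the setting up without a full reboot, you can (as root) load the file with sysctl -p and then check what the kernel actually managed to reserve; on a fragmented system, HugePages_Total may come up short of what you asked for:

```shell
# Load the file immediately (as root) instead of waiting for a reboot:
#   sysctl -p /etc/sysctl.d/postgresql.conf
# Then verify the reservation the kernel actually made:
grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo
```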
This is important for the following reasons, excerpted from the documentation:
/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge
pages in the kernel's huge page pool. "Persistent" huge pages will be
returned to the huge page pool when freed by a task. A user with root
privileges can dynamically allocate more or free some persistent huge pages
by increasing or decreasing the value of 'nr_hugepages'.
Pages that are used as huge pages are reserved inside the kernel and cannot
be used for other purposes. Huge pages cannot be swapped out under
memory pressure.
So, pages grabbed by the postgres process for shared_buffers won't be swapped out under memory pressure. And you'll have less stress on the TLB for your processor too.
Following that, we adjust the OOM killer, because something randomly murdering postgres processes is not good citizenship. These settings strongly discourage it from firing.
echo "vm.overcommit_memory=2" >>/etc/sysctl.d/postgresql.conf
echo "vm.overcommit_ratio=100" >>/etc/sysctl.d/postgresql.conf
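With overcommit_memory=2 the kernel enforces a hard cap on allocations: CommitLimit = swap + overcommit_ratio percent of physical RAM. You can watch how close the system is running to that cap:

```shell
# CommitLimit is the hard cap under overcommit_memory=2;
# Committed_AS is the address space currently promised to processes.
grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
```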
Next we set a couple of kernel scheduler sysctls:
echo "kernel.sched_autogroup_enabled=0" >>/etc/sysctl.d/postgresql.conf
echo "kernel.sched_migration_cost=5000000" >>/etc/sysctl.d/postgresql.conf
The migration cost is the length of time the scheduler considers a migrated process "cache hot" and thus less likely to be migrated again.
Autogroup groups tasks by setsid() so that perceived responsiveness improves on desktops. On server systems, however, large daemons like PostgreSQL are launched from the same session as everything else, and can be effectively choked out of CPU cycles in favor of less important tasks.
After that, we can adjust swappiness, the knob that controls how eagerly the kernel swaps application memory out of physical RAM. A higher number swaps more aggressively (preserving the filesystem cache); a lower number keeps application memory resident at the cache's expense. Setting it all the way to zero will increase the chance of OOM kills under memory and I/O pressure.
echo "vm.swappiness=10" >>/etc/sysctl.d/postgresql.conf
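You can confirm the value took effect once the sysctl file is loaded; it is readable by any user (writable only by root):

```shell
# Current swappiness setting (10 once the line above is loaded).
cat /proc/sys/vm/swappiness
```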
The next two settings control how often dirty data is synced back to disk from the filesystem cache. Since dirty pages in the pagecache can't be freed until written out, more aggressive writeback keeps more pages available for use and helps avoid large bursts of disk I/O.
echo "vm.dirty_ratio=2" >>/etc/sysctl.d/postgresql.conf
echo "vm.dirty_background_ratio=1" >>/etc/sysctl.d/postgresql.conf
Since your PostgreSQL version output shows GCC 4.9, I'm assuming you're running Debian Jessie or a derivative, whose 3.16 kernel avoids some truly dreadful I/O issues that the 3.2 kernel had. That's most certainly good news.
Once you've adjusted all these settings, reboot the system. This helps make sure the huge pages aren't fragmented when allocated. The best and most reliable way, though, is to put hugepages=2147 (the number I needed on my server) in /etc/default/grub under GRUB_CMDLINE_LINUX, run update-grub, and reboot. But I would test the necessity of this, as it is much harder to change than the sysctl.
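For reference, the relevant grub line would look something like this (2147 is the count from my server; substitute the number you computed):

```shell
# /etc/default/grub -- reserve huge pages at boot so they cannot fragment.
GRUB_CMDLINE_LINUX="hugepages=2147"
```

Remember to run update-grub after editing the file.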
These settings should give you a reasonable number of pages in the filesystem cache and in shared_buffers. After that, keeping the number of other processes on the server to a minimum (they all compete for resources) should give you much of the page cache miss minimization you're looking for.
Further reading on huge pages and their effects is in the kernel's hugetlbpage documentation.
The vm settings and their effects are in the kernel's vm documentation.
A very approachable treatment of the Linux memory system is Understanding and Optimizing Memory Utilization.
The PostgreSQL documentation goes into detail on the effects of huge pages at the bottom of the kernel resources page.
Hope that helps. =)
Best Answer
The difference is shared hits.
A shared hit means the block was read from RAM (PostgreSQL's shared buffers) rather than from disk.
Check the shared_buffers parameter on both servers.