Solaris 10: Virtual Memory Exhausted

Tags: out-of-memory, solaris, virtual-memory

Our group is all programmers who work exclusively on Linux or macOS, but a customer uses Solaris 10, and we need our code to work there. So we scrounged up an old SunFire V240 and rented a Solaris 10 VM to test on.

The code compiles just fine on the VM, but on the SunFire it fails. Our code has a giant autogenerated C++ file as part of the build. It's this huge file that fails to compile. It fails with the message: virtual memory exhausted: Not enough space

I can't figure it out. The SunFire has 8 GB of RAM, and the virtual memory exhaustion happens when the compile reaches just over 1.2 GB. Nothing else significant is running. Here are some memory stats near failure:

Using prstat -s size:

SIZE (virtual memory): 1245 MB
RSS  (real memory):    1200 MB

According to echo "::memstat" | mdb -k, lots of memory is still free:

Free (cachelist) is 46%
Free (freelist)  is 26% of total.

All user processes together are using about 17% of RAM just before the compile fails. (After the failure, user RAM usage drops to 2%.) That agrees with the other RAM usage numbers (1.2 GB / 8.0 GB ~= 15%).

swap -l reports that the swap is completely unused.

Some other details:

We're building with g++ 6.1.0, compiled for 64 bit. It fails whether or not we pass the -m64 flag to the compiler.

# uname -a 
SunOS servername 5.10 Generic_147440-27 sun4u sparc SUNW,Sun-Fire-V240

Both the VM and the SunFire have system limits set like this:

>ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
open files                      (-n) 256
pipe size            (512 bytes, -p) 10
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 29995
virtual memory          (kbytes, -v) unlimited

(using su)>rctladm -l
process.max-address-space   syslog=off     [ lowerable deny no-signal bytes ]
process.max-file-descriptor syslog=off     [ lowerable deny count ]
process.max-core-size       syslog=off     [ lowerable deny no-signal bytes ]
process.max-stack-size      syslog=off     [ lowerable deny no-signal bytes ]
process.max-data-size       syslog=off     [ lowerable deny no-signal bytes ]
process.max-file-size       syslog=off     [ lowerable deny file-size bytes ]
process.max-cpu-time        syslog=off     [ lowerable no-deny cpu-time inf seconds ]

We've tried setting the stack size to "unlimited" but that doesn't make any identifiable difference.

# df
/                  (/dev/dsk/c1t0d0s0 ):86262876 blocks  7819495 files
/devices           (/devices          ):       0 blocks        0 files
/system/contract   (ctfs              ):       0 blocks 2147483608 files
/proc              (proc              ):       0 blocks    29937 files
/etc/mnttab        (mnttab            ):       0 blocks        0 files
/etc/svc/volatile  (swap              ):14661104 blocks  1180179 files
/system/object     (objfs             ):       0 blocks 2147483465 files
/etc/dfs/sharetab  (sharefs           ):       0 blocks 2147483646 files
/platform/sun4u-us3/lib/ blocks  7819495 files
/platform/sun4u-us3/lib/sparcv9/ blocks  7819495 files
/dev/fd            (fd                ):       0 blocks        0 files
/tmp               (swap              ):14661104 blocks  1180179 files
/var/run           (swap              ):14661104 blocks  1180179 files
/home              (/dev/dsk/c1t1d0s0 ):110125666 blocks  8388083 files

Edit 1: swap output after setting up 16GB swap file:

Note: block size is 512

# swap -l
swapfile            dev    swaplo   blocks    free
/dev/dsk/c1t0d0s1   32,25  16       2106416   2106416  
/home/me/tmp/swapfile -    16       32964592  32964592

# swap -s
total: 172096k bytes allocated + 52576k reserved = 224672k used, 23875344k available 
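
For reference, the block counts above can be converted to MiB with a little arithmetic. This is only a sketch: the function name swap_free_mib is made up for illustration, it assumes the 512-byte block size noted above, and the sample input is the swap -l output from this edit.

```shell
#!/bin/sh
# Sum the "free" column of `swap -l` output and print it in MiB (rounded down).
# Assumption: blocks are 512 bytes, so 2048 blocks = 1 MiB.
swap_free_mib() {
    # On Solaris 10, nawk or /usr/xpg4/bin/awk may be a safer choice than /usr/bin/awk.
    awk '$5 ~ /^[0-9]+$/ { blocks += $5 } END { printf "%d\n", int(blocks / 2048) }'
}

# Sample input copied from the swap -l output above.
# On a live system, pipe the command in instead:  swap -l | swap_free_mib
swap_free_mib <<'EOF'
swapfile            dev    swaplo   blocks    free
/dev/dsk/c1t0d0s1   32,25  16       2106416   2106416
/home/me/tmp/swapfile -    16       32964592  32964592
EOF
```

With the sample input this prints 17124, i.e. roughly 16.7 GiB of free swap across the two devices.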

Best Answer

I have a few things for you to try:

  1. I think @AndrewHenle is correct that you need more swap space.
  2. You could try fiddling with GCC's template & constexpr recursion depth limits (-ftemplate-depth and -fconstexpr-depth). At best, though, I expect it will only help you see what expression(s) are causing it to run out of memory.
  3. Some debugging tricks (more on that below)

This article details how to increase swap space in Solaris. Since the link may eventually break, here's a synopsis taken from that article:

# Identify the current swap volume.
$ swap -l
swapfile                 dev  swaplo   blocks   free
/dev/zvol/dsk/rpool/swap 256,1      16 1058800 1058800
# Do one of the following:
# a) Modify the existing swap volume (REQUIRES REBOOT)
    $ zfs get volsize rpool/swap
    rpool/swap  volsize   517M     -

    $ zfs set volsize=2g rpool/swap

    $ zfs get volsize rpool/swap
    rpool/swap  volsize   2G       -

    $ init 6
# b) Add an additional swap volume
    # Create it
    $ zfs create -V 2G rpool/swap2

    # Activate it
    $ swap -a /dev/zvol/dsk/rpool/swap2

    $ swap -l
    swapfile                  dev  swaplo   blocks   free
    /dev/zvol/dsk/rpool/swap  256,1      16 1058800 1058800
    /dev/zvol/dsk/rpool/swap2 256,3      16 4194288 4194288

    # Add an entry for the new volume in /etc/vfstab
    $ /opt/csw/gnu/grep -P '\sswap' /etc/vfstab
    /dev/zvol/dsk/rpool/swap  - - swap - no -
    /dev/zvol/dsk/rpool/swap2 - - swap - no -
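
The synopsis above assumes a ZFS root pool. The swap -l output in the question shows a plain disk slice and a swap file, so a file-backed swap area may be the closer fit there. The following is a hedged sketch that only prints the commands for review instead of running them; the function name swapfile_plan, the path /export/swapfile, and the 16g size are all placeholders.

```shell
#!/bin/sh
# Print (do not execute) the steps for adding a file-backed swap area.
# Assumptions: the path and size are hypothetical; the printed commands
# would need to be reviewed and then run as root on the Solaris host.
swapfile_plan() {
    swap_path=$1
    swap_size=$2
    echo "mkfile $swap_size $swap_path"   # create the backing file
    echo "swap -a $swap_path"             # activate it as swap
    echo "# line to add to /etc/vfstab so it persists across reboots:"
    echo "$swap_path - - swap - no -"
}

swapfile_plan /export/swapfile 16g
```

Printing the plan first makes it easy to check the path and size before touching a machine's swap configuration.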

If you want to try diagnosing the problem, here are some things to try:

# This tells Solaris to add all available sections into coredumps &
# place coredumps in your home directory with the given pattern
$ coreadm -p ~/%t.%n.%u.%f.%p.core -P all

# source.cpp is a placeholder for the generated file
$ gcc ${flags} -fsyntax-only source.cpp
$ gcc ${flags} -c -o source.o source.cpp

Things to look for / try:

  • Does it crash with -fsyntax-only?
  • If both crash, are the core dumps it generates roughly the same size?
  • The coredump size should be indicative of how much memory the process acquired before crashing. Compare that with system limits:
    • For example, top on the Solaris machine I'm looking at shows 511G physical memory, 153G free, 20G total swap, 20G free swap.
    • Run swap -l & compare
  • After altering swap size does behavior change in any noticeable ways?
    • Crashes sooner / takes longer to crash
    • Core size is different
  • Try intentionally using a small swap size (or get rid of the swap partition altogether) to see whether it crashes sooner, produces a smaller core dump or in any other way alters behavior.
  • Put in a big hard drive and use the entire drive for swap.
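
To compare core dump sizes between runs, something along these lines may help. It is a sketch only: the function name core_sizes_mib is made up, the glob assumes the ".core" suffix from the coreadm pattern above, and the arithmetic syntax needs a POSIX shell (bash, ksh93, or /usr/xpg4/bin/sh rather than the legacy Solaris /bin/sh).

```shell
#!/bin/sh
# Print the size of each given core file in MiB (rounded down).
# Assumption: a POSIX shell is used; the legacy Solaris 10 /bin/sh
# does not support $(...) or $((...)).
core_sizes_mib() {
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        bytes=$(wc -c < "$f")
        echo "$((bytes / 1048576)) MiB  $f"
    done
}

# Hypothetical usage: list the cores written to the home directory.
core_sizes_mib "$HOME"/*.core
```

A core of roughly 1.2 GB in both the -fsyntax-only and full compile runs would line up with the SIZE figure prstat reported in the question.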

Additionally, look at some of GCC's flags, e.g.:

  • -Q: print function names as they're compiled, plus stats about each pass
  • -ftime-report: timing information for each pass
  • Various -fdump-rtl* flags
  • Try getting output at different stages (preprocessor, assembly, etc) to see if you get different behavior
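
One way to try the "different stages" idea is a small wrapper that runs the same compile while stopping after preprocessing (-E), syntax checking (-fsyntax-only), assembly generation (-S), and object generation (-c), then reports each exit status. This is a rough sketch: try_stages is a made-up name, and the compiler, source file, and extra flags in the usage comment are placeholders.

```shell
#!/bin/sh
# Run one compile four times, stopping after successive stages,
# and print the exit status of each. Output goes to a scratch file
# rather than /dev/null so nothing unexpected happens when run as root.
try_stages() {
    cxx=$1
    src=$2
    shift 2                      # remaining arguments are extra compiler flags
    out="${TMPDIR:-/tmp}/try_stages.$$"
    for stage in -E -fsyntax-only -S -c; do
        status=0
        "$cxx" "$@" "$stage" -o "$out" "$src" >/dev/null 2>&1 || status=$?
        printf '%s: exit %s\n' "$stage" "$status"
    done
    rm -f "$out"
}

# Hypothetical usage (names are placeholders):
#   try_stages g++ generated.cpp -m64
```

If -E and -fsyntax-only succeed but -S fails, the memory is likely being consumed in the optimization or code generation passes rather than in parsing.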