Without swap, applications will be killed (rather than swapped out) if you run out of memory; and since the kernel can then only reclaim memory by shrinking the filesystem cache, you will also see more slowdown from the extra disk reads that follow.
As for the swap requirement, you might be able to avoid it (or use a small swap) if you max out the RAM on your machine.
I was pondering a similar question -- you saw my thread about kswapd and zone watermarks -- and the answer in my case (and probably in yours as well) is memory fragmentation.
When memory is fragmented enough, higher-order allocations will fail, and this (depending on a number of additional factors) will either lead to direct reclaim, or will wake kswapd, which will attempt zone reclaim/compaction. You can find some additional details in my thread.
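A quick way to see this fragmentation is /proc/buddyinfo: each column counts free blocks of order 0, 1, 2, ... per zone, so zeros in the high-order columns mean large allocations will fail. Here is a minimal sketch that reports the highest order still available per zone; the sample data is inlined (and invented for illustration) so it runs anywhere -- on a real host, pipe in `cat /proc/buddyinfo` instead:

```shell
# Report, per zone, the highest order that still has free blocks.
# Column layout assumed: "Node N, zone NAME count0 count1 ... count10".
parse_buddyinfo() {
  awk '{
    zone = $4; best = -1
    for (i = 5; i <= NF; i++)        # columns 5.. are per-order counts
      if ($i > 0) best = i - 5       # order = column offset
    printf "%s highest free order: %d\n", zone, best
  }'
}

parse_buddyinfo <<'EOF'
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32    673    214     98     40     12      3      0      0      0      0      0
Node 0, zone   Normal   4381   1093    217     33      0      0      0      0      0      0      0
EOF
```

If the high orders are empty, `echo 1 > /proc/sys/vm/compact_memory` (as root, on kernels with compaction) asks the kernel to defragment on demand, which can help confirm the diagnosis.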
Another thing that may escape attention when dealing with such problems is memory zoning. That is, you may have enough memory overall (and it might even contain enough contiguous chunks), but it may be restricted to the DMA32 zone (if you're on a 64-bit architecture). Some people tend to ignore DMA32 as "small" (probably because they are used to 32-bit thinking), but 4GB is not really "small".
You have two ways of finding out for sure what's going on in your case. One is analyzing stats -- you can set up jobs to take periodic snapshots of /proc/buddyinfo, /proc/zoneinfo, /proc/vmstat etc., and try to make sense out of what you're seeing.
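Such a snapshot job can be as simple as the sketch below. The PROC and OUT variables are my own convention (so the function can be pointed at test data); in production, leave PROC at /proc and call `snapshot` every minute from cron or a systemd timer:

```shell
# Take one timestamped copy of the memory stat files per invocation.
snapshot() {
  PROC="${PROC:-/proc}"             # source directory (override for testing)
  OUT="${OUT:-/var/tmp/memstats}"   # assumed destination; pick your own
  mkdir -p "$OUT"
  ts=$(date +%s)
  for f in buddyinfo zoneinfo vmstat; do
    cat "$PROC/$f" > "$OUT/$f.$ts"  # e.g. /var/tmp/memstats/vmstat.1700000000
  done
}
```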
The other way is more direct and reliable if you get it to work: you need to capture the codepaths that lead to swapout events, and you can do it using tracepoints the kernel is instrumented with (in particular, there are numerous vmscan events).
But getting it to work may be challenging, as low-level instrumentation doesn't always work the way it's supposed to out of the box. In my case, we had to spend some time setting up the ftrace infrastructure, only to find out in the end that the function_graph probe we needed wasn't working for some reason. The next tool we tried was perf, and it too wasn't successful on the first attempt. But once you eventually manage to capture the events of interest, they are likely to lead you to the answer much faster than any global counters.
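For reference, a sketch of the perf approach, assuming perf is installed and you can run it as root. It records the vmscan tracepoints system-wide for a while, then shows who woke kswapd or entered direct reclaim; it's wrapped in a function so the file can be sourced safely, and the event names are the ones shipped in mainline kernels (run `perf list 'vmscan:*'` to see what your kernel actually has):

```shell
trace_vmscan() {
  perf record -a \
    -e 'vmscan:mm_vmscan_wakeup_kswapd' \
    -e 'vmscan:mm_vmscan_direct_reclaim_begin' \
    -e 'vmscan:mm_vmscan_direct_reclaim_end' \
    -- sleep "${1:-60}"          # capture window in seconds (default 60)
  perf script | head -n 20       # print the first captured events
}
```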
Best regards,
Nikolai
Best Answer
No, it’s a bad idea.
You shouldn’t think of swap as a mechanism by which you can expand memory; it’s a storage area for parts of memory which don’t have to remain in physical memory, and whose contents don’t exist anywhere else. See Why does Linux need swap space in a VM? for details.
If the processes running inside your VMs are running out of memory, you need to determine what their real working set is, both in nominal operation and in the worst case. Then, assuming you can’t reduce their memory usage, you need to configure their memory setups to suit: RAM allocation, swap, and kernel configuration (swappiness etc.). The RAM allocation will have a direct impact on the number of VMs you can run per host, and that should really be your main adjustment variable if you can’t add more memory to your hosts. (That doesn’t help with the cost aspect of course...) Depending on what you need VMs for, another strategy could be to use containers instead since that will allow you to reduce the overhead.
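As a sketch of the swappiness part, assuming a sysctl-style distro (the 99-swappiness.conf filename is my choice, and 10 is illustrative rather than a universal recommendation): lowering vm.swappiness makes the kernel prefer dropping page cache over swapping out anonymous memory. Run as root:

```shell
show_swappiness()  { cat /proc/sys/vm/swappiness; }   # default is 60 on most distros

lower_swappiness() {
  sysctl vm.swappiness="${1:-10}"                     # takes effect immediately
  printf 'vm.swappiness = %s\n' "${1:-10}" \
    > /etc/sysctl.d/99-swappiness.conf                # persist across reboots
}
```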
Operating systems typically start using swap when they need to allocate memory and they’ve run out of available physical memory, and the least used memory pages currently in physical memory don’t have what’s called a backing store (or rather, their backing store is swap). When a program needs more memory, the kernel will first look for some free memory; then it will look through a hierarchical list of things it can get rid of — cache, buffers, mapped executables, etc. Note that swap can be used even in the absence of “visible” memory pressure: there are always pieces of data stored in memory which aren’t actually used, and are better stored in swap.
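To see which processes the kernel has actually pushed to swap, you can read the VmSwap field of /proc/&lt;pid&gt;/status (a standard Linux field, reported in kB; processes without a VmSwap line, such as kernel threads, are simply skipped). A minimal sketch:

```shell
# List the top swap users: "swapped-kB process-name", largest first.
for s in /proc/[0-9]*/status; do
  awk '$1 == "Name:"   { name = $2 }
       $1 == "VmSwap:" { print $2 " kB", name }' "$s" 2>/dev/null
done | sort -rn | head
```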