The C library function free() can, but does not have to, return memory to the kernel.
Some implementations of malloc() move the boundary between the "heap" and otherwise unused address space (the "system break") via the sbrk() system call, then dole out smaller pieces of those large allocations. Until every smaller piece has been deallocated, free() can't really return the memory to the OS.
The same reasoning applies to malloc() implementations that don't use sbrk(2) but instead use something like mmap() on /dev/zero. I can't find a reference, but I seem to remember that one or another of the BSDs used mmap() that way to get pages of memory. Either way, free() can't return a page to the operating system unless the program has deallocated every sub-allocation on that page.
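To make the constraint concrete, here is a minimal sketch of a toy allocator that carves one mmap()ed page into fixed-size chunks and can only munmap() the page once the last live chunk is freed. The names toy_page, toy_alloc and toy_free are hypothetical and invented for illustration; real malloc() implementations are far more elaborate.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical toy allocator: one mmap()ed page, carved into fixed chunks. */
enum { TOY_CHUNK = 64 };

struct toy_page {
    unsigned char *base;   /* start of the mapped page, NULL while unmapped */
    size_t size;           /* page size in bytes */
    size_t next;           /* bump offset of the next unused chunk */
    size_t live;           /* chunks handed out and not yet freed */
};

/* Hand out one chunk, mapping the page on first use. Returns NULL on failure. */
static void *toy_alloc(struct toy_page *p)
{
    if (p->base == NULL) {
        long ps = sysconf(_SC_PAGESIZE);
        assert(ps > 0);
        void *m = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED)
            return NULL;
        p->base = m;
        p->size = (size_t)ps;
        p->next = 0;
        p->live = 0;
    }
    if (p->next + TOY_CHUNK > p->size)
        return NULL;       /* page exhausted; a real allocator would map another */
    void *chunk = p->base + p->next;
    p->next += TOY_CHUNK;
    p->live++;
    return chunk;
}

/* Free one chunk. Returns 1 if the page went back to the kernel, 0 otherwise. */
static int toy_free(struct toy_page *p, void *chunk)
{
    (void)chunk;           /* the toy only counts; it never reuses chunks */
    assert(p->base != NULL && p->live > 0);
    if (--p->live > 0)
        return 0;          /* other chunks are still live: the page must stay mapped */
    munmap(p->base, p->size);
    p->base = NULL;
    return 1;
}
```

With two chunks allocated from the same page, the first toy_free() returns 0 and the page stays mapped; only the second call returns 1 and hands the page back.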
Some malloc() implementations do return memory to the system: ChorusOS(?) apparently did. It's not clear whether it moved the system break or munmap()ed pages.
Here's a paper about a memory allocator that improves performance by "aggressively giving up free pages to the virtual memory manager". Slide show for a talk about the allocator.
Since the debug output from the ld dynamic linker/loader confirms that both the victim and spy programs load the correct input file, the next step is to verify whether the kernel has actually set up the physical pages where libmyl.so is loaded to be shared between the victim and spy.
On Linux this has been possible since kernel version 2.6.25 via the pagemap interface, which allows userspace programs to examine the page tables and related information by reading files in /proc.
The general procedure for using pagemap to find out if two processes share memory goes like this:
- Read /proc/<pid>/maps for both processes to determine which parts of the memory space are mapped to which objects.
- Select the maps you are interested in, in this case the pages to which libmyl.so is mapped.
- Open /proc/<pid>/pagemap. The pagemap consists of 64-bit pagemap descriptors, one per page. The offset of a page's descriptor within the pagemap is (page address / page size) * descriptor size. Seek to the descriptors of the pages you would like to examine.
- Read a 64-bit descriptor as an unsigned integer for each page from the pagemap.
- Compare the page frame number (PFN) in bits 0-54 of the page descriptor between the libmyl.so pages for victim and spy. If the PFNs match, the two processes are sharing the same physical pages.
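The arithmetic in the steps above can be sketched as a few small helpers. The function and macro names here are hypothetical, chosen only for illustration; the bit positions (PFN in bits 0-54, present in bit 63) follow the pagemap layout described in this answer.

```c
#include <assert.h>
#include <stdint.h>

#define PAGEMAP_ENTRY_SIZE 8u                            /* one 64-bit descriptor per page */
#define PAGEMAP_PFN_MASK   ((UINT64_C(1) << 55) - 1)     /* bits 0-54 */
#define PAGEMAP_PRESENT    (UINT64_C(1) << 63)           /* bit 63 */

/* Byte offset of the descriptor for the page containing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    assert(page_size > 0);
    return (vaddr / page_size) * PAGEMAP_ENTRY_SIZE;
}

/* PFN of a descriptor, or 0 if the page is not present in RAM. */
static uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PAGEMAP_PRESENT) ? (entry & PAGEMAP_PFN_MASK) : 0;
}

/* Nonzero if both descriptors refer to the same physical page frame. */
static int pages_shared(uint64_t entry_a, uint64_t entry_b)
{
    uint64_t a = pagemap_pfn(entry_a);
    uint64_t b = pagemap_pfn(entry_b);
    return a != 0 && a == b;
}
```

A PFN of 0 is treated as "unknown" rather than as a match, since that is also what an unprivileged reader gets back on recent kernels.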
The following sample code illustrates how the pagemap can be accessed and printed from within the process. It uses dl_iterate_phdr() to determine the virtual address of each shared library loaded into the process's memory space, then looks up and prints the corresponding pagemap entries from /proc/<pid>/pagemap.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <inttypes.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <link.h>
#include <errno.h>
#include <error.h>

#define E_CANNOT_OPEN_PAGEMAP 1
#define E_CANNOT_READ_PAGEMAP 2

/* One 64-bit pagemap entry. The bit-field layout assumes GCC on a
 * little-endian target, where fields are allocated from bit 0 upward. */
typedef struct __attribute__ ((__packed__)) {
    union {
        uint64_t pmd;                        /* raw entry */
        uint64_t page_frame_number : 55;     /* bits 0-54, valid if present */
        struct {
            uint64_t swap_type: 5;           /* bits 0-4, valid if swapped */
            uint64_t swap_offset: 50;        /* bits 5-54, valid if swapped */
            uint64_t soft_dirty: 1;          /* bit 55 */
            uint64_t exclusive: 1;           /* bit 56 */
            uint64_t zero: 4;                /* bits 57-60 */
            uint64_t file_page: 1;           /* bit 61 */
            uint64_t swapped: 1;             /* bit 62 */
            uint64_t present: 1;             /* bit 63 */
        };
    };
} pmd_t;

static int print_pagemap_for_phdr(struct dl_phdr_info *info,
                                  size_t size, void *data)
{
    struct stat statbuf;
    size_t pagesize = sysconf(_SC_PAGESIZE);
    char pagemap_path[BUFSIZ];
    int pagemap;
    uint64_t start_addr, end_addr;

    (void)size;
    (void)data;

    /* Skip the main program (empty name) and objects that have no
     * backing file on disk, such as the vDSO. */
    if (!strcmp(info->dlpi_name, "") || stat(info->dlpi_name, &statbuf) != 0) {
        return 0;
    }

    /* Approximate the mapped range by the file size, rounded up to a page. */
    start_addr = info->dlpi_addr;
    end_addr = (info->dlpi_addr + statbuf.st_size + pagesize - 1)
               & ~(uint64_t)(pagesize - 1);

    printf("\n%10p-%10p %s\n\n",
           (void *)(uintptr_t)start_addr,
           (void *)(uintptr_t)end_addr,
           info->dlpi_name);

    snprintf(pagemap_path, sizeof pagemap_path, "/proc/%d/pagemap", getpid());
    if ((pagemap = open(pagemap_path, O_RDONLY)) < 0) {
        error(E_CANNOT_OPEN_PAGEMAP, errno,
              "cannot open pagemap: %s", pagemap_path);
    }

    printf("%10s %8s %7s %5s %8s %7s %7s\n",
           "", "", "soft-", "", "file /", "", "");
    printf("%10s %8s %7s %5s %11s %7s %7s\n",
           "address", "pfn", "dirty", "excl.",
           "shared anon", "swapped", "present");

    for (uint64_t i = start_addr; i < end_addr; i += pagesize) {
        pmd_t pmd;
        /* One 8-byte entry per virtual page, indexed by page number. */
        off_t offset = (off_t)(i / pagesize * sizeof pmd);

        if (pread(pagemap, &pmd.pmd, sizeof pmd.pmd, offset)
                != (ssize_t)sizeof pmd.pmd) {
            error(E_CANNOT_READ_PAGEMAP, errno,
                  "cannot read pagemap: %s", pagemap_path);
        }
        if (pmd.pmd != 0) {
            printf("0x%10" PRIx64 " %06" PRIx64 " %3d %5d %8d %9d %7d\n", i,
                   (uint64_t)pmd.page_frame_number,
                   (int)pmd.soft_dirty,
                   (int)pmd.exclusive,
                   (int)pmd.file_page,
                   (int)pmd.swapped,
                   (int)pmd.present);
        }
    }
    close(pagemap);
    return 0;
}

int main(void)
{
    dl_iterate_phdr(print_pagemap_for_phdr, NULL);
    exit(EXIT_SUCCESS);
}
The output of the program should look similar to the following:
$ sudo ./a.out
0x7f935408d000-0x7f9354256000 /lib/x86_64-linux-gnu/libc.so.6
soft- file /
address pfn dirty excl. shared anon swapped present
0x7f935408d000 424416 1 0 1 0 1
0x7f935408e000 424417 1 0 1 0 1
0x7f935408f000 422878 1 0 1 0 1
0x7f9354090000 422879 1 0 1 0 1
0x7f9354091000 43e879 1 0 1 0 1
0x7f9354092000 43e87a 1 0 1 0 1
0x7f9354093000 424790 1 0 1 0 1
...
where:
- address is the virtual address of the page.
- pfn is the page's page frame number.
- soft-dirty indicates whether the soft-dirty bit is set in the page's Page Table Entry (PTE).
- excl. indicates whether the page is exclusively mapped (i.e. the page is mapped only by this process).
- file / shared anon indicates whether the page is a file page or a shared anonymous page.
- swapped indicates whether the page is currently swapped out (implies present is zero).
- present indicates whether the page is currently present in the process's resident set (implies swapped is zero).
(Note: I run the example program with sudo because, since Linux 4.0, only users with the CAP_SYS_ADMIN capability can get PFNs from /proc/<pid>/pagemap. Starting with Linux 4.2, the PFN field is zeroed if the user does not have CAP_SYS_ADMIN. The reason for this change is to make it more difficult to exploit another memory-related vulnerability, the Rowhammer attack, using the virtual-to-physical mapping information exposed by the PFNs.)
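One way to tell whether PFNs are being hidden from the current user is to look up a page that is known to be resident and see whether its PFN field comes back as zero. The helper below is a rough sketch under that assumption; the name pfn_visible is hypothetical, and it reads /proc/self/pagemap rather than the pagemap of another process.

```c
#define _XOPEN_SOURCE 700        /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Returns 1 if the kernel exposes PFNs to this process, 0 if the PFN of a
 * resident page reads back as zero, and -1 if the check could not be made. */
static int pfn_visible(void)
{
    static volatile char probe = 1;
    char touched = probe;        /* read the byte so that its page is resident */
    (void)touched;

    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;               /* e.g. no /proc, or EPERM on Linux 4.0/4.1 */

    uint64_t entry = 0;
    off_t off = (off_t)((uintptr_t)&probe / (uintptr_t)ps * sizeof entry);
    ssize_t n = pread(fd, &entry, sizeof entry, off);
    close(fd);

    if (n != (ssize_t)sizeof entry || !(entry >> 63))
        return -1;               /* short read, or page unexpectedly not present */

    return (entry & ((UINT64_C(1) << 55) - 1)) != 0;
}
```

If this returns 0, comparing PFNs between victim and spy is meaningless, because both sides will read zero regardless of whether the pages are shared.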
If you run the example program several times, you should notice that the virtual addresses of the pages change (due to ASLR), but the PFNs for shared libraries that are in use by other processes stay the same.
If the PFNs for libmyl.so match between the victim and spy programs, I would start looking in the attack code itself for the reason the attack fails. If the PFNs don't match, the additional bits may give some hint as to why the pages are not set up to be shared. The pagemap bits indicate the following:
present  file  exclusive  state
0        0     0          non-present
1        1     0          file page mapped somewhere else
1        1     1          file page mapped only here
1        0     0          anonymous non-copy-on-write page (shared with parent/child)
1        0     1          anonymous copy-on-write page (or never forked)
Copy-on-write pages in (MAP_FILE | MAP_PRIVATE) areas are anonymous in this context.
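The state table can be turned into a small decoder for a raw 64-bit descriptor. This is a sketch only: the enum and function names are hypothetical, and the bit positions assumed are the ones used by the sample program above (present in bit 63, file / shared anon in bit 61, exclusive in bit 56). Any non-present entry is reported as non-present, whatever its other bits say.

```c
#include <assert.h>
#include <stdint.h>

enum page_state {
    PAGE_NON_PRESENT,      /* present = 0 */
    PAGE_FILE_SHARED,      /* file page mapped somewhere else */
    PAGE_FILE_EXCLUSIVE,   /* file page mapped only here */
    PAGE_ANON_SHARED,      /* anonymous page shared with parent/child */
    PAGE_ANON_EXCLUSIVE    /* anonymous copy-on-write page (or never forked) */
};

/* Classify a raw pagemap descriptor according to the table above. */
static enum page_state classify_entry(uint64_t entry)
{
    int present   = (int)((entry >> 63) & 1);
    int file      = (int)((entry >> 61) & 1);
    int exclusive = (int)((entry >> 56) & 1);

    if (!present)
        return PAGE_NON_PRESENT;
    if (file)
        return exclusive ? PAGE_FILE_EXCLUSIVE : PAGE_FILE_SHARED;
    return exclusive ? PAGE_ANON_EXCLUSIVE : PAGE_ANON_SHARED;
}

/* Two rows of the table, written out as checks. */
static void classify_examples(void)
{
    const uint64_t present = UINT64_C(1) << 63;
    const uint64_t file    = UINT64_C(1) << 61;

    assert(classify_entry(0) == PAGE_NON_PRESENT);
    assert(classify_entry(present | file) == PAGE_FILE_SHARED);
}
```

For the libmyl.so pages, PAGE_FILE_SHARED in both victim and spy is the result you are hoping for; PAGE_ANON_EXCLUSIVE would suggest the page was copied on write and is private to the process.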
Bonus: To obtain the number of times a page has been mapped, the PFN can be used to look up the page in /proc/kpagecount. This file contains a 64-bit count of the number of times each page is mapped, indexed by PFN.
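A lookup in /proc/kpagecount could be sketched as follows. The helper names are hypothetical; the only facts relied on are the ones stated above, namely that the file holds one 64-bit count per page, indexed by PFN. Reading the file normally requires root, so the function reports failure rather than aborting.

```c
#define _XOPEN_SOURCE 700        /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Byte offset of the map count for a given PFN. */
static uint64_t kpagecount_offset(uint64_t pfn)
{
    return pfn * sizeof(uint64_t);
}

/* Stores the map count for pfn in *count and returns 0 on success;
 * returns -1 if the file cannot be opened or read. */
static int read_kpagecount(uint64_t pfn, uint64_t *count)
{
    assert(count != NULL);

    int fd = open("/proc/kpagecount", O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, count, sizeof *count, (off_t)kpagecount_offset(pfn));
    close(fd);

    return n == (ssize_t)sizeof *count ? 0 : -1;
}
```

A count of 2 or more for a libmyl.so page would be consistent with it being mapped by both the victim and the spy, though other processes mapping the same library also contribute to the count.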
Best Answer
You want the migratepages binary in the numactl package.
Usage & Example
Limitations
VM hardware
Pages may be locked to a node, e.g. if they relate to hardware pass-through and represent hardware located on a specific node.
Free Memory & Page size
You obviously need enough free memory on the destination node, but that memory also must not be too fragmented to hold large pages. If one of the pages is a large-order contiguous allocation and the destination node has no free region large enough, moving the large page might fail (depending on whether compaction is triggered and succeeds).