The following is a really good reference: http://www.ibm.com/developerworks/linux/library/l-dynamic-libraries/. It contains a bibliography at the end of a variety of different references at different levels. If you want to know every gory detail you can go straight to the source: http://www.akkadia.org/drepper/dsohowto.pdf. (Ulrich Drepper wrote the Linux dynamic linker.)
You can get a really good overview of all the sections in your executable by running a command like "objdump -h myexe" or "readelf -S myexe".
The .interp section contains the name of the dynamic loader that will be used to dynamically link the symbols in this object.
The .dynamic section is a table of tagged entries that is formatted to be easy for the dynamic loader to read. (It has pointers to all the other sections the loader needs, such as the symbol table, string table, hash table, and relocations.)
The .got (Global Offset Table) and .plt (Procedure Linkage Table) are the two main structures that are manipulated by the dynamic linker. The .got is an indirection table for variables and the .plt is an indirection table for functions. Each executable or library (which are called "shared objects") has its own .got and .plt and these are tables of the symbols referenced by that shared object that are actually contained in some other shared object.
The .dynsym section contains all the information about the symbols in your shared object (both the ones you define and the external ones you need to reference). The .dynsym doesn't contain the actual symbol names. Those are contained in .dynstr, and .dynsym has pointers into .dynstr. .gnu.hash is a hash table used for quick lookup of symbols by name. It also contains only pointers (pointers into .dynsym, and pointers used for making bucket chains).
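To make the "pointers into .dynstr" idea concrete, here is a minimal sketch. The string table contents and the list of name offsets are made up for illustration; a real symbol entry carries more fields (value, size, type, and so on) that are not modeled here.

```python
def name_at(dynstr: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in a string table."""
    end = dynstr.index(b"\x00", offset)
    return dynstr[offset:end].decode("ascii")


# Hypothetical .dynstr contents: names stored back to back, each NUL-terminated.
# The table starts with a NUL byte, so offset 0 is the empty name.
EXAMPLE_DYNSTR = b"\x00printf\x00exit\x00"

# Hypothetical symbol entries: only the name-offset field is modeled.
EXAMPLE_SYMBOL_NAME_OFFSETS = [0, 1, 8]

if __name__ == "__main__":
    for offset in EXAMPLE_SYMBOL_NAME_OFFSETS:
        print(offset, repr(name_at(EXAMPLE_DYNSTR, offset)))
```

Storing offsets rather than the strings themselves keeps every symbol entry the same size, which is what lets the loader index the symbol table like an array.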
When your shared object dereferences some symbol "foo" the dynamic linker has to go look up "foo" in all the dynamic objects you are linked against to figure out which one contains the "foo" you are looking for (and then what the relative address of "foo" is inside that shared object.) The dynamic linker does this by searching the .gnu.hash section of all the linked shared objects (or the .hash section for old shared objects that don't have a .gnu.hash section.) Once it finds the correct address in the linked shared object it puts it in the .got or .plt of your shared object.
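The lookup described above can be sketched roughly as follows. The hash function is the one GNU-style hash sections use (start at 5381, multiply by 33, add each byte, keep 32 bits). Everything else is a toy: the real section has a Bloom filter, buckets, and chains, and the object names, base addresses, and symbol offsets below are hypothetical.

```python
def gnu_hash(name: str) -> int:
    """Hash a symbol name: h = h * 33 + c, truncated to 32 bits, starting at 5381."""
    h = 5381
    for byte in name.encode("ascii"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def build_table(symbols: dict) -> dict:
    """Toy stand-in for one object's hash section: hash -> {name: offset}."""
    table = {}
    for name, offset in symbols.items():
        table.setdefault(gnu_hash(name), {})[name] = offset
    return table


def resolve(name: str, loaded_objects: list):
    """Search objects in order; return (object name, absolute address) or None."""
    h = gnu_hash(name)
    for obj_name, base, table in loaded_objects:
        candidates = table.get(h, {})
        if name in candidates:  # compare names to rule out hash collisions
            return obj_name, base + candidates[name]
    return None


# Hypothetical objects: (name, load base address, toy hash table).
LOADED = [
    ("libfoo.so", 0x7F0000000000, build_table({"foo": 0x1130})),
    ("libbar.so", 0x7F0000200000, build_table({"foo": 0x2000, "bar": 0x1200})),
]

if __name__ == "__main__":
    print(resolve("foo", LOADED))
    print(resolve("bar", LOADED))
    print(resolve("baz", LOADED))
```

The first object in the search order that defines "foo" wins, and the address that would be written into the .got or .plt is that object's load base plus the symbol's relative address.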
I believe that the really short answer is that Linux compilers arrange code into pieces, at least one of which is just pure code, and can therefore be memory mapped into more than one process' address space. Any globals get mapped such that each process gets its own copy.
You can see this using readelf or objdump, but readelf gives a clearer picture, I think.
Here's a piece of output from readelf -e /usr/lib/libc.so.6. That's the C library, probably mapped into almost every process. The relevant part of the readelf output (although all of it is interesting) is the Program Headers:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00000034 0x00000034 0x00140 0x00140 R E 0x4
INTERP 0x164668 0x00164668 0x00164668 0x00017 0x00017 R 0x4
[Requesting program interpreter: /usr/lib/ld-linux.so.2]
LOAD 0x000000 0x00000000 0x00000000 0x1adfc4 0x1adfc4 R E 0x1000
LOAD 0x1ae220 0x001af220 0x001af220 0x02c94 0x057c4 RW 0x1000
DYNAMIC 0x1afd90 0x001b0d90 0x001b0d90 0x000f8 0x000f8 RW 0x4
NOTE 0x000174 0x00000174 0x00000174 0x00044 0x00044 R 0x4
TLS 0x1ae220 0x001af220 0x001af220 0x00008 0x00048 R 0x4
GNU_EH_FRAME 0x164680 0x00164680 0x00164680 0x06124 0x06124 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x1ae220 0x001af220 0x001af220 0x01de0 0x01de0 R 0x1
The two LOAD lines are the only pieces of the file that get mapped directly into memory. The first LOAD header maps a piece of /usr/lib/libc.so.6 into memory with R and E permissions: read and execute. That's the code. Hardware features keep a program from writing to that piece of memory, so all programs can share the same pages of real, physical memory. The kernel can set up the hardware to map the same physical memory into all processes.
The second LOAD header is marked RW - read and write. This is the part with global variables that the C library uses. Each process gets its own copy in physical memory, mapped into that process' address space with the hardware permissions set to allow reading and writing. That section is not shared.
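As a rough illustration of reading those two LOAD lines, the sketch below parses them as text and classifies each by its flags. The parser only handles the column layout shown in the readelf output above, and the function names are mine, not part of any tool.

```python
def parse_program_header(line: str) -> dict:
    """Parse one program header line in the layout shown in the readelf output."""
    tokens = line.split()
    return {
        "type": tokens[0],
        "offset": int(tokens[1], 16),
        "vaddr": int(tokens[2], 16),
        "filesz": int(tokens[4], 16),
        "memsz": int(tokens[5], 16),
        # Flags may print as separate tokens ("R E"), so join what is left.
        "flags": "".join(tokens[6:-1]),
        "align": int(tokens[-1], 16),
    }


def is_shareable(header: dict) -> bool:
    """A LOAD segment without write permission can use the same physical pages everywhere."""
    return header["type"] == "LOAD" and "W" not in header["flags"]


CODE = "LOAD 0x000000 0x00000000 0x00000000 0x1adfc4 0x1adfc4 R E 0x1000"
DATA = "LOAD 0x1ae220 0x001af220 0x001af220 0x02c94 0x057c4 RW 0x1000"

if __name__ == "__main__":
    for line in (CODE, DATA):
        header = parse_program_header(line)
        print(header["flags"], "shareable" if is_shareable(header) else "per-process")
```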
You can see these memory mappings in a running process using the /proc file system. A good command to illustrate: cat /proc/self/maps. That lists all the memory mappings that the cat process has, and from what files the kernel got them.
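If you would rather look at the mappings from inside a program, a minimal sketch along these lines works on Linux. The sample line is invented for illustration (the addresses, device, and inode are not from a real process), and read_own_maps() simply returns an empty list on systems without /proc.

```python
import os


def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line: address range, perms, offset, dev, inode, path."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        # Anonymous mappings have no path field at all.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


# Hypothetical line in the format the kernel prints.
SAMPLE_LINE = "7f0000000000-7f00001ae000 r-xp 00000000 08:01 123456 /usr/lib/libc.so.6"


def read_own_maps() -> list:
    """Parse this process's own mappings, or return [] where /proc is unavailable."""
    if not os.path.exists("/proc/self/maps"):
        return []
    with open("/proc/self/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    for mapping in read_own_maps():
        print(hex(mapping["start"]), mapping["perms"], mapping["path"])
```

The perms column is where the R E versus RW split from the LOAD headers shows up at run time: the code mapping appears as r-xp and the data mapping as rw-p.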
As far as how much you have to do to ensure that your function is allocated to memory that gets mapped into different processes, it's pretty much all down to the flags you give to the compiler. Code intended for ".so" shared libraries is compiled "position independent". Position independent code refers to the memory locations of variables with offsets relative to the current instruction, and jumps or branches to locations relative to the current instruction, rather than loading from or writing to absolute addresses and jumping to absolute addresses. That means the "RE" LOAD piece of /usr/lib/libc.so and the "RW" piece only have to be loaded at addresses that are the same distance apart in each process. In your example code, the static variable will always be at least a multiple of a page size apart from the code that references it, and it will always get loaded that distance apart in a process' address space due to the way the LOAD ELF headers are given.
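The "same distance apart" point is just arithmetic, sketched below with the two VirtAddr values from the LOAD headers in the readelf output. The base addresses are hypothetical; the kernel and loader pick the real ones.

```python
CODE_VADDR = 0x00000000  # VirtAddr of the "R E" LOAD header
DATA_VADDR = 0x001AF220  # VirtAddr of the "RW" LOAD header


def load_addresses(base: int) -> tuple:
    """Where the two LOAD pieces land if the library is loaded at `base`."""
    return base + CODE_VADDR, base + DATA_VADDR


def code_to_data_distance(base: int) -> int:
    """Distance from the start of the code piece to the start of the data piece."""
    code, data = load_addresses(base)
    return data - code


# Hypothetical load bases for the same library in two different processes.
PROCESS_A_BASE = 0x7F1200000000
PROCESS_B_BASE = 0x7F9900400000

if __name__ == "__main__":
    for base in (PROCESS_A_BASE, PROCESS_B_BASE):
        print(hex(base), hex(code_to_data_distance(base)))
```

Because the base cancels out, an instruction that reaches a variable with an offset relative to itself works unchanged in every process, which is why the code pages never need to be patched and can stay shared.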
Just a note about the term "shared memory": There's a user-level shared memory system, associated with the "System V interprocess communications system". That's a way for more than one process to very explicitly share a piece of memory. It's fairly complicated and obscure to set up and get correct. The shared memory that we're talking about here is more-or-less completely invisible to any user process. Your example code won't know the difference if it's running as position independent code shared between multiple processes, or if it's the only copy ever.
Best Answer
The answer is "Other". You can get a glimpse of the memory layout with cat /proc/self/maps. On my 64-bit Arch laptop, you can see that the executable gets loaded in low memory: apparently the .text segment, read-only data, and .bss. Just above that is the "heap". In much higher memory the C library and the "ELF file interpreter", "ld-so", get loaded. Then comes the stack. There's only one stack and one heap for any given address space, no matter how many shared libraries get loaded.
cat only seems to get the C library loaded. Doing cat /proc/$$/maps will get you the memory mappings of the shell from which you invoked cat. Any shell is going to have a number of dynamically loaded libraries, but zsh and bash will load in a large number. You'll see that there's just one "[heap]", and one "[stack]".
If you call dlopen(), the shared object file will get mapped in the address space at a higher address than /usr/lib/libc-2.21.so. There's something of an "implementation dependent" memory mapping segment, where all addresses returned by mmap() show up. See Anatomy of a Program in Memory for a nice graphic.
The source for /usr/lib/ld-2.21.so is a bit tricky, but it shares a good deal of its internals with dlopen(). dlopen() isn't a second class citizen.
"vdso" and "vsyscall" are somewhat mysterious, but this Stackoverflow question has a good explanation, as does Wikipedia.
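One way to watch dlopen() add mappings is sketched below, assuming a Linux system with /proc. Python's ctypes.CDLL calls dlopen() on POSIX systems; the helper names and the sample maps text are hypothetical. If the library is already loaded in the process, no new files will show up.

```python
import ctypes


def mapped_files(maps_text: str) -> set:
    """Return the file paths that appear in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return paths


def count_region(maps_text: str, label: str) -> int:
    """Count mappings with a special label such as "[heap]" or "[stack]"."""
    return sum(1 for line in maps_text.splitlines() if line.rstrip().endswith(label))


def newly_mapped_after_dlopen(library_name: str):
    """Load a library and report which files were newly mapped (Linux only)."""
    with open("/proc/self/maps") as f:
        before = mapped_files(f.read())
    handle = ctypes.CDLL(library_name)  # calls dlopen() under the hood on POSIX
    with open("/proc/self/maps") as f:
        after = mapped_files(f.read())
    return handle, after - before


# Hypothetical maps text, used only to exercise the two parsing helpers.
SAMPLE_MAPS = (
    "55d000000000-55d000021000 rw-p 00000000 00:00 0 [heap]\n"
    "7f0000000000-7f0000001000 r-xp 00000000 08:01 42 /usr/lib/libexample.so\n"
    "7f0000001000-7f0000002000 rw-p 00001000 08:01 42 /usr/lib/libexample.so\n"
    "7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]\n"
)
```

Calling newly_mapped_after_dlopen() with the name of a shared library that is installed but not yet loaded should return a set containing that library's path, plus any dependencies the loader pulled in with it, while count_region() on the process's maps still reports a single "[heap]" and a single "[stack]".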