Linux – Build and link processes in Linux


I can't work out the difference between build and compile. Are they the same? How exactly does linking work? What exactly do .so and .o files contain, and how am I supposed to use them? I see these files every day, but I don't know exactly what they contain. Can anyone suggest a tutorial that explains these processes clearly?

Best Answer

The term "build" is usually used to mean the whole process that starts off with a set of source code files and other resources, and ends up with a set of executables, shared libraries (and possibly other resources).
This can involve quite a lot of steps like special pre-processors (moc for Qt code for example), code generators (flex/yacc or bison for instance), compilation, linking, and possibly post-processing steps (e.g. building tar.gz or rpm files for distribution).

For C and C++ (and related languages), compilation is the step that transforms source files (say .c files for C code) into object files (.o). These object files contain the machine code generated by the compiler for the corresponding source code, but aren't final products - in particular, references to external functions (and data) are not yet resolved. They are "incomplete" in that sense.
Object files are sometimes grouped together into archives (.a files), also called static libraries. This is pretty much just a convenient way of grouping them together.

Linking takes (usually several) object files (.o or .a) and shared libraries, combines the object files, resolves the references (mainly) between the object files themselves and the shared libraries, and produces executables that you can actually use, or shared libraries (.so) that can be used by other programs or shared libraries.

Shared libraries are repositories of code/functions that can be used directly by other executables. The main difference between dynamic linking against a shared library, and (static) linking an object or archive file in directly, is that shared libraries can be updated without rebuilding the executables that use them (there are a lot of restrictions to this though; in particular, the updated library generally has to stay binary-compatible with the old one).
For instance, if at some point a bug is found in an OpenSSL shared library, the fix can be made in that code, and updated shared libraries can be produced and shipped. The programs that linked dynamically to that shared library don't need to re-build to get the bug-fix. Updating the shared library automatically fixes all its users.
Had they linked with an object file instead (or statically in general), they would have had to rebuild (or at least re-link) to get the fix.

A practical example: say you want to write a program - a fancy command line calculator - in C, that has command line history/editing support. You'd write the calculator code, but you'd use the readline library for the input handling.
You could split your code in two parts: the math functions (put those functions in mathfuncs.c), and the "main" calculator code that deals with input/output (say in main.c).

Your build would consist of:

  • Compile mathfuncs.c (gcc -o mathfuncs.o -c mathfuncs.c; -c stands for "compile only", i.e. stop before linking)
    mathfuncs.o now contains your compiled math functions, but isn't "executable" - it's just a repository of function code.

  • Compile your frontend (gcc -o main.o -c main.c)
    main.o is likewise just a bunch of functions, and is not runnable on its own.

  • Link your calculator executable, linking with readline:

    gcc -o supercalc main.o mathfuncs.o -lreadline
    #      ^ executable                    ^ dynamic link with libreadline.so
    #                  ^         ^ two .o files statically linked in
    

    Now you have a real executable that you can run (supercalc), that depends on the readline library.

  • Build an rpm package containing the executable, plus any shared libraries and headers your project produces. (The .o files, being temporary build products and not final products, aren't usually shipped.)

With this, if a bug is found in readline, you won't have to rebuild (and re-ship) your executable to get the fix - updating libreadline.so is all that is required. But if you find a bug in mathfuncs.c, you'll need to re-compile it and re-link supercalc (and ship a new version).
