Man section or other doc repository for data structure definitions


For the millionth time or so I had to look in /usr/include/foo.h to find the members of a struct foo or foo_t or whatever. There's a man section for library and kernel calls, some of which include descriptions of data structures and others not. Is there a single place to look up the definitions of kernel and library data structures?

Best Answer

The GNU C library has a reference manual that includes documentation for all or most of the data structures in the standard library and extensions. This has a type index. Beware there's also a "GNU C Reference Manual", but it and the "GNU C Library Reference Manual" are two different things.

You can also autogenerate documentation sufficient for browsing data structures with doxygen (note that it works much better with stuff that's actually annotated for it, but it can be crudely used this way). I tried this here on /usr/include and it took < 2 minutes (producing, n.b., ~800 MB of html). The steps were:

1. Create a basic config file somewhere (anywhere) with doxygen -g doxygen.conf.

2. Edit the file and change the following settings:

OUTPUT_DIRECTORY       = /home/foo/whatever # documentation goes here
OPTIMIZE_OUTPUT_FOR_C  = YES
EXTRACT_ALL            = YES
INPUT                  = /usr/include
FILE_PATTERNS          = *.h
RECURSIVE              = YES
GENERATE_LATEX         = NO

Note that all of those settings already exist in the config file; you need to search through it and change/set the values as shown.

3. Generate: doxygen doxygen.conf.

Now open /home/foo/whatever/html/files.html. There is an index.html, but it is probably a mess (again, doxygen is primarily intended for stuff that's purposefully annotated for it), so the file list is the most predictable entry point. There's also a copious "Data Structure Index", but for whatever reason, not everything you would expect is indexed in it. E.g., there's a structstat.html you can reach by following the file list, asm-generic -> stat.h, but "struct stat" is not mentioned in the "Data Structure Index". Many standard C library things follow this pattern: there's a macro/define/typedef in the predictable header (sys/stat.h) that pulls in something external that ends up in a platform/system-specific header, e.g. asm-generic/stat.h. I'm sure you've noticed this before. The stat example is not so bad, insofar as at least the final definition is still called struct stat and not struct _fooX_stat.
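For contrast, here is roughly what "purposefully annotated" means: a minimal sketch of a C header whose struct and function carry Doxygen comments (`/** ... */` blocks with `@brief`, `@param`, `@return`, and trailing `/**< ... */` comments on members). The file name point.h, struct point and point_norm2 are hypothetical, made up for illustration; nothing in /usr/include is written this way.

```c
/**
 * @file point.h
 * @brief Hypothetical header showing Doxygen-style annotations.
 */

/**
 * @brief A point on a 2-D integer grid.
 */
struct point {
    int x; /**< Horizontal coordinate. */
    int y; /**< Vertical coordinate. */
};

/**
 * @brief Squared Euclidean distance of a point from the origin.
 * @param p Point to measure (must not be NULL).
 * @return p->x * p->x + p->y * p->y, computed in long.
 */
static inline long point_norm2(const struct point *p)
{
    return (long)p->x * p->x + (long)p->y * p->y;
}
```

The comments are invisible to the compiler; they only give doxygen a description to attach to each struct, member and function.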

So this takes some getting used to and is, in the end, not much better than tooling around with grep. It also has the dis(?)advantage that non-user fields are included (e.g., compare the struct stat as documented above to its description in man 2 stat). For the standard library (and GNU extensions) the reference manual is much better. However, for stuff that's not in that manual, it is slightly better than nothing. If you do want to use it this way, I'd recommend doing individual directories separately rather than the whole shebang (clue: you can set RECURSIVE = NO). 800 MB of HTML is pretty unwieldy.
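For the comparison with man 2 stat, a minimal sketch that calls stat(2) and prints a few of the user-visible struct stat members that the man page documents (st_size, st_nlink, st_mode). The helper name show_stat and the output format are assumptions for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: print a few documented struct stat members for
   path. Returns 0 on success, -1 if stat() fails. */
int show_stat(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1) {
        perror("stat");
        return -1;
    }
    printf("%s: size=%lld bytes, links=%lu, perms=%o, %s\n",
           path,
           (long long)sb.st_size,            /* off_t: widen for printf */
           (unsigned long)sb.st_nlink,       /* nlink_t: widen for printf */
           (unsigned)sb.st_mode & 07777,     /* permission bits only */
           S_ISDIR(sb.st_mode) ? "directory" :
           S_ISREG(sb.st_mode) ? "regular file" : "other");
    return 0;
}
```

Only fields that POSIX and the man page promise are touched here; the padding and reserved members that doxygen shows in the kernel's definition are left alone.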

Interruption of system calls when a signal is caught

The OS implementation view

Consider what happens if a system call is interrupted by a signal. The signal handler will execute user-mode code. But the syscall handler is kernel code and does not trust any user-mode code. So let's explore the choices for the syscall handler:

1. Terminate the system call and report to the user code how much was done. It's up to the application code to restart the system call in some way, if desired. That's how unix works.
2. Save the state of the system call, and allow the user code to resume the call. This is problematic for several reasons:
   - While the user code is running, something could happen to invalidate the saved state. For example, if reading from a file, the file might be truncated. So the kernel code would need a lot of logic to handle these cases.
   - The saved state can't be allowed to keep any lock, because there's no guarantee that the user code will ever resume the syscall, and then the lock would be held forever.
   - The kernel must expose new interfaces to resume or cancel ongoing syscalls, in addition to the normal interface to start a syscall. This is a lot of complication for a rare case.
   - The saved state would need to use resources (memory, at least); those resources would need to be allocated and held by the kernel but be counted against the process's allotment. This isn't insurmountable, but it is a complication.
     - Note that the signal handler might make system calls that themselves get interrupted; so you can't just have a static resource allotment that covers all possible syscalls.
     - And what if the resources cannot be allocated? Then the syscall would have to fail anyway, which means the application would need code to handle this case, so this design would not simplify the application code.
3. Keep the system call in progress (but suspended) and create a new thread for the signal handler. This, again, is problematic:
   - Early unix implementations had a single thread per process.
   - The signal handler would risk stepping on the syscall's toes. This is an issue anyway, but in the current unix design, it's contained.
   - Resources would need to be allocated for the new thread; see above.
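The unix choice is easy to observe. A minimal sketch, assuming a POSIX system: install a SIGALRM handler with sigaction() and without SA_RESTART, then block in read() on an empty pipe. When the alarm fires, the handler runs and the kernel abandons the read, which fails with -1 and errno set to EINTR, since nothing had been read yet. The function names on_alarm and read_interrupted are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; it only makes SIGALRM "caught" rather than
   fatal, which is what makes the blocked read() return. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical demo: block in read() on an empty pipe until SIGALRM
   arrives. Returns the errno read() failed with (EINTR expected),
   0 if read() unexpectedly succeeded, or -1 if setup failed. */
int read_interrupted(void)
{
    int fds[2];
    char c;
    struct sigaction sa;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESTART: let the call fail */
    if (sigaction(SIGALRM, &sa, NULL) == -1 || pipe(fds) == -1)
        return -1;

    alarm(1);                     /* SIGALRM in about one second */
    /* Nobody writes to the pipe, but we keep the write end open, so the
       read can only end because of the signal. */
    if (read(fds[0], &c, 1) == -1)
        result = errno;
    else
        result = 0;

    alarm(0);                     /* cancel any pending alarm */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Setting SA_RESTART in sa_flags instead would ask the kernel to restart this kind of read transparently after the handler returns, so the call would keep blocking.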

The main difference from an interrupt is that the interrupt code is trusted and highly constrained. It's usually not allowed to allocate resources, run forever, take locks without releasing them, or do any other kind of nasty thing; since the interrupt handler is written by the OS implementer, they know it won't do anything bad. Application code, on the other hand, can do anything.

The application design view

When an application is interrupted in the middle of a system call, should the syscall continue to completion? Not always. For example, consider a program like a shell that's reading a line from the terminal when the user presses Ctrl+C, triggering SIGINT. The read must not complete; that's what the signal is all about. Note that this example shows that the read syscall must be interruptible even if no byte has been read yet.

So there must be a way for the application to tell the kernel to cancel the system call. Under the unix design, that happens automatically: the signal makes the syscall return. Other designs would require a way for the application to resume or cancel the syscall at its leisure.
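A sketch of the shell scenario under the unix design, assuming POSIX: the SIGINT handler only sets a flag, and the line reader treats "read() failed with EINTR and the flag is set" as "the user cancelled this line" instead of retrying. The names got_sigint, install_sigint_handler and read_line_or_cancel are hypothetical; a real shell would read from the terminal and do much more.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; volatile sig_atomic_t is safe to write there. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_sigint = 1;   /* nothing else: keep the handler async-signal-safe */
}

/* Install the SIGINT handler WITHOUT SA_RESTART, so a read() blocked on
   the terminal (or a pipe) fails with EINTR when Ctrl+C is pressed. */
int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical helper: read one line (at most cap - 1 bytes, cap >= 1)
   from fd into buf, NUL-terminated. Returns the line length, or -1 with
   errno set if SIGINT cancelled the line (errno == EINTR) or a real
   error occurred. */
ssize_t read_line_or_cancel(int fd, char *buf, size_t cap)
{
    size_t n = 0;

    got_sigint = 0;
    while (n + 1 < cap) {
        ssize_t r;

        if (got_sigint) {        /* Ctrl+C arrived between two reads */
            errno = EINTR;
            return -1;
        }
        r = read(fd, buf + n, 1);
        if (r == -1) {
            if (errno == EINTR && !got_sigint)
                continue;        /* some other signal: keep reading */
            return -1;           /* Ctrl+C or real error: abandon the line */
        }
        if (r == 0)
            break;               /* end of file */
        if (buf[n++] == '\n')
            break;               /* complete line */
    }
    buf[n] = '\0';
    return (ssize_t)n;
}
```

The cancellation decision lives entirely in the application: the kernel just makes the read return, and the code chooses between "retry" and "give up" based on which signal it saw.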

The read system call is the way it is because it's the primitive that makes sense, given the general design of the operating system. What it means is, roughly, “read as much as you can, up to a limit (the buffer size), but stop if something else happens”. Actually reading a full buffer involves running read in a loop until as many bytes as possible have been read; this is a higher-level function, fread(3). Unlike read(2), which is a system call, fread is a library function, implemented in user space on top of read. It's suitable for an application that reads from a file or dies trying; it's not suitable for a command line interpreter or for a networked program that must throttle connections cleanly, nor for a networked program that has concurrent connections and doesn't use threads.
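A minimal sketch of the "reads from a file or dies trying" style with fread(3). Per the C standard, fread returns fewer items than requested only on end of file or a read error, so the caller just has to tell those two apart with ferror. The helper name slurp and the exit-on-error policy are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to cap bytes of path into buf.
   Exits the program on any error ("read or die trying").
   Returns the number of bytes read; fewer than cap only at end of file. */
size_t slurp(const char *path, char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    n = fread(buf, 1, cap, f);   /* stdio does the read(2) calls for us */
    if (n < cap && ferror(f)) {  /* short count caused by an error, not EOF */
        perror(path);
        exit(EXIT_FAILURE);
    }
    fclose(f);
    return n;
}
```

This is convenient precisely because it hides the individual read calls, which is also why it gives a shell or a network server no clean point at which to react to a signal.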

An example of read in a loop is given in Robert Love's Linux System Programming:

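ssize_t ret;
/* fd, buf and len come from the caller; buf must be a char * (or
   unsigned char *), since arithmetic on a void * is not standard C. */
while (len != 0 && (ret = read (fd, buf, len)) != 0) {
  if (ret == -1) {
    if (errno == EINTR)
      continue;       /* interrupted before reading anything: retry */
    perror ("read");
    break;            /* any other error ends the loop */
  }
  len -= ret;         /* short read: count what arrived... */
  buf += ret;         /* ...and continue where it stopped */
}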

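It takes care of case i and case ii and a few more: a read that fails with EINTR is retried, a short read is continued for the remaining bytes, a return of 0 (end of file) ends the loop, and any other error is reported with perror and ends the loop.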

Debian Kernel Module Error – Cannot Create ‘Hello World’ Module with NVIDIA and VirtualBox

SOLVED!

Simple as that: /root/.bashrc had this inside:

 export GREP_OPTIONS='--color=always'

Changed it to:

 export GREP_OPTIONS='--color=never'

...and restarted the root shell (of course; do not omit this step). Everything started working again. Both the NVIDIA and VirtualBox kernel modules built on the first try. I am so happy! :-)

Then again, I am slightly disappointed by the kernel build tools. They should know better and pass --color=never everywhere they use grep; or rather, store the old value of GREP_OPTIONS, override it for the lifetime of the build process, then restore it.
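The save/override/restore idea can be sketched in C with getenv(3), setenv(3) and unsetenv(3), assuming a POSIX system. The wrapper run_with_grep_options is hypothetical and is not how the kernel build tools actually work; a real build system would do the equivalent in its own scripts. Since a child process gets its own copy of the environment, only the calling process needs restoring.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: run cmd via system(3) with GREP_OPTIONS forced
   to "--color=never", then put the caller's GREP_OPTIONS back the way it
   was. Returns system()'s status, or -1 if the environment could not be
   changed. */
int run_with_grep_options(const char *cmd)
{
    const char *old = getenv("GREP_OPTIONS");
    char *saved = NULL;
    int status;

    if (old != NULL) {
        /* setenv may invalidate getenv's pointer, so copy the value first. */
        saved = strdup(old);
        if (saved == NULL)
            return -1;
    }

    if (setenv("GREP_OPTIONS", "--color=never", 1) == -1) {
        free(saved);
        return -1;
    }

    status = system(cmd);        /* the child inherits the override */

    if (saved != NULL)
        setenv("GREP_OPTIONS", saved, 1);   /* restore the old value */
    else
        unsetenv("GREP_OPTIONS");           /* it was not set before */
    free(saved);
    return status;
}
```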

I am hopeful that my epic one-week battle with this problem will prove valuable both to the community and the kernel build tools developers.

A very warm thanks to the people who were with me and tried to help.

(All credits go here: http://forums.gentoo.org/viewtopic-p-4156366.html#4156366)
