Debian – Cannot create “Hello World” module (and NVIDIA, and VirtualBox)

Tags: debian, drivers, kernel, nvidia

First off, the details.

BEFORE: kernel: 3.2.0-2-amd64, nvidia driver: 295.59

AFTER: kernel: 3.2.0-3-amd64, nvidia driver: 302.17-3

My Debian wheezy is kept up to date at all times. In fact, running apt-get upgrade daily is what got me into this trouble in the first place.

Evidently, after an apt-get upgrade, something "broke" on my Debian — something related to the build ecosystem and/or DKMS itself.

The NVIDIA driver cannot be built by ANY method recommended in the official wikis, including NVIDIA's official binary installer (a log snippet from that is in one of the updates).

Here's the output of dpkg-reconfigure nvidia-kernel-dkms:

# dpkg-reconfigure nvidia-kernel-dkms

------------------------------
Deleting module version: 302.17
completely from the DKMS tree.
------------------------------
Done.
Loading new nvidia-302.17 DKMS files...
Building only for 3.2.0-3-amd64
Building initial module for 3.2.0-3-amd64
Error!  Build of nvidia.ko failed for: 3.2.0-3-amd64 (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/nvidia/302.17/build/ for more information.

A relevant snippet from /var/lib/dkms/nvidia/302.17/build/make.log follows. The problem is not in the compilation, I can guarantee that.

  LD [M]  /var/lib/dkms/nvidia/302.17/build/nvidia.o
  Building modules, stage 2.
  MODPOST 0 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'
make: Leaving directory `/var/lib/dkms/nvidia/302.17/build'

And that's it. No explanation of any kind in any other files in the same directory (at least as far as I checked).

Before I ask my questions: I am using the nouveau driver now (it's not like I have any choice anyway), but it doesn't work too well for me. I have 3 desktops, constantly playing movies on 1 of them while being a very busy developer on the other 2. The nouveau driver struggles a bit there (movies on the second screen get horizontal stripes all the time, the XFCE consoles lag a bit when scrolling, etc.)

Questions:

  • Should I change my kernel version? I tried 3.2.0-2-amd64 and 3.2.0-3-amd64, to no avail. Trying 3.2.0-3-rt-amd64 makes my machine freeze after a few minutes of operation, so I don't dare install it again.
  • Should I change the version of something in my build environment? (As pointed out in the updates, it turns out this is not just an NVIDIA problem.)
  • Should I assume that my linker is at fault (I am not using gold, I am using ld from the binutils package), and if so, what could I do to make the DKMS method finally work? The problem does seem to manifest itself in the linking phase (and MODPOST shows 0 modules).

On a personal note, this disturbs me on a much deeper level than I usually care to admit. I had great respect for Debian, which at the moment is shattered. C'mon, a simple apt-get upgrade breaks compilation and linking of every out-of-tree kernel driver?

Extremely disappointing.

UPDATE #1:

I did in fact try to install the official 304.22 NVIDIA drivers; here's the log file. Looks like the linking does indeed fail, doesn't it?

Also, if I additionally enable DKMS integration, I get a message saying that the script cannot determine the target kernel version (the text is in Update #2).

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Sat Jul 21 22:59:30 2012
installer version: 304.22

PATH: /usr/local/rvm/gems/ruby-1.9.3-p194/bin:/usr/local/rvm/gems/ruby-1.9.3-p194@global/bin:/usr/local/rvm/rubies/ruby-1.9.3-p194/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer

Using: nvidia-installer ncurses user interface
-> License accepted.
-> Installing NVIDIA driver version 304.22.
-> There appears to already be a driver installed on your system (version: 304.22).  As part of installing this driver (version: 304.22), the existing driver will be uninstalled.  Are you sure you want to continue? ('no' will abort installation) (Answer: Yes)
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: No)
-> Performing CC sanity check with CC="gcc-4.6".
-> Performing CC version check with CC="gcc-4.6".
-> Kernel source path: '/lib/modules/3.2.0-3-amd64/source'
-> Kernel output path: '/lib/modules/3.2.0-3-amd64/build'
-> Performing rivafb check.
-> Performing nvidiafb check.
-> Performing Xen check.
-> Cleaning kernel module build directory.
   executing: 'cd ./kernel; make clean'...
-> Building kernel module:
   executing: 'cd ./kernel; make module SYSSRC=/lib/modules/3.2.0-3-amd64/source SYSOUT=/lib/modules/3.2.0-3-amd64/build'...
   NVIDIA: calling KBUILD...
   make -C /lib/modules/3.2.0-3-amd64/build \
    KBUILD_SRC=/usr/src/linux-headers-3.2.0-3-common \
    KBUILD_EXTMOD="/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel" -f /usr/src/linux-headers-3.2.0-3-common/Makefile \
    modules
   test -e include/generated/autoconf.h -a -e include/config/auto.conf || (     \
    echo;                               \
    echo "  ERROR: Kernel configuration is invalid.";       \
    echo "         include/generated/autoconf.h or include/config/auto.conf are missing.";\
    echo "         Run 'make oldconfig && make prepare' on kernel src to fix it.";  \
    echo;                               \
    /bin/false)
   mkdir -p /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_versions ; rm -f /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_versions/*
   make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.build obj=/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel
     gcc-4.6 -Wp,-MD,/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.nv.o.d  -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include -I/usr/src/linux-headers-3.2.0-3-common/arch/x86/include -Iarch/x86/include/generated -Iinclude  -I/usr/src/linux-headers-3.2.0-3-common/include -include /usr/src/linux-headers-3.2.0-3-common/include/linux/kconfig.h   -I/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO   -I/tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel -Wall -MD -Wsign-compare -Wno-cast-
   qual -Wno-error -D__KERNEL__ -DMODULE -DNVRM -DNV_VERSION_STRING=\"304.22\" -Wno-unused-function -Wuninitialized -mno-red-zone -mcmodel=kernel -UDEBUG -U_DEBUG -DNDEBUG  -DMODULE  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(nv)"  -D"KBUILD_MODNAME=KBUILD_STR(nvidia)" -c -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/.tmp_nv.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c
   In file included from /usr/src/linux-headers-3.2.0-3-common/include/linux/kernel.h:17:0,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/sched.h:55,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/utsname.h:35,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-linux.h:38,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c:13:
   /usr/src/linux-headers-3.2.0-3-common/include/linux/bitops.h: In function ‘hweight_long’:
   /usr/src/linux-headers-3.2.0-3-common/include/linux/bitops.h:49:41: warning: signed and unsigned type in conditional expression [-Wsign-compare]
   In file included from /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess.h:575:0,
                    from /usr/src/linux-headers-3.2.0-3-common/include/linux/poll.h:14,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-linux.h:97,
                    from /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.c:13:
   /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess_64.h: In function ‘copy_from_user’:
   /usr/src/linux-headers-3.2.0-3-common/arch/x86/include/asm/uaccess_64.h:53:6: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

...snipped lots of compile output with the same warning...

     ld -m elf_x86_64   -r -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nvidia.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-kernel.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-acpi.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-chrdev.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-cray.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-gvi.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-i2c.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-mempool.o /tmp/selfgz10141/NVI
   DIA-Linux-x86_64-304.22/kernel/nv-mlock.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-mmap.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-p2p.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-pat.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-procfs.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-usermap.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-vm.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nv-vtophys.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-agp.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-interface.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-mtrr.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-registry.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-smp.o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/os-usermap.o 
   (cat /dev/null;   echo kernel//tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/nvidia.ko;) > /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/modules.order
   make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.modpost
     scripts/mod/modpost -m  -i /usr/src/linux-headers-3.2.0-3-amd64/Module.symvers -I /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/Module.symvers  -o /tmp/selfgz10141/NVIDIA-Linux-x86_64-304.22/kernel/Module.symvers -S -w  -s
   NVIDIA: left KBUILD.
   nvidia.ko failed to build!
   make[1]: *** [module] Error 1
   make: *** [module] Error 2
-> Error.
ERROR: Unable to build the NVIDIA kernel module.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

UPDATE #2:

As StarNamer suggested, I reinstalled linux-headers-3.2.0-3-amd64. After that was done, DKMS kicked in and tried again to compile the NVIDIA driver. Here are the contents of the file /var/lib/dkms/nvidia/304.22/build/make.log:

DKMS make.log for nvidia-304.22 for kernel 3.2.0-3-amd64 (x86_64)
Sun Jul 22 14:50:58 EEST 2012
If you are using a Linux 2.4 kernel, please make sure
you either have configured kernel sources matching your
kernel or the correct set of kernel headers installed
on your system.

If you are using a Linux 2.6 kernel, please make sure
you have configured kernel sources matching your kernel
installed on your system. If you specified a separate
output directory using either the "KBUILD_OUTPUT" or
the "O" KBUILD parameter, make sure to specify this
directory with the SYSOUT environment variable or with
the equivalent nvidia-installer command line option.

Depending on where and how the kernel sources (or the
kernel headers) were installed, you may need to specify
their location with the SYSSRC environment variable or
the equivalent nvidia-installer command line option.

*** Unable to determine the target kernel version. ***

make: *** [select_makefile] Error 1

UPDATE #3:

After days and days of googling, I started to wonder whether this is NVIDIA's fault at all. Turns out, it's not. I tried to install VirtualBox 4.1 (from the testing repo), and I stumbled upon the same thing again:

# cat /var/lib/dkms/virtualbox/4.1.18/build/make.log 
DKMS make.log for virtualbox-4.1.18 for kernel 3.2.0-3-amd64 (x86_64)
Tue Jul 24 17:58:57 EEST 2012
make: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
  LD      /var/lib/dkms/virtualbox/4.1.18/build/built-in.o
  LD      /var/lib/dkms/virtualbox/4.1.18/build/vboxdrv/built-in.o
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxdrv/linux/SUPDrv-linux.o
... snipped ...
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/SUPR0IdcClientComponent.o
  CC [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/linux/SUPR0IdcClient-linux.o
  LD [M]  /var/lib/dkms/virtualbox/4.1.18/build/vboxpci/vboxpci.o
  Building modules, stage 2.
  MODPOST 0 modules
make: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'

And of course, no more details (as has already been said, it does seem like a linker problem, but I cannot be sure yet). So this must be more of a Debian / DKMS problem or a misconfiguration of some kind. However, I swear I didn't touch anything. I was simply doing daily apt-get upgrades. Then something obviously went wrong.

UPDATE #4:

I did try to create a small module as described here: https://stackoverflow.com/questions/4715259/linux-modpost-does-not-build-anything. Indeed, I am still seeing MODPOST 0 modules. Here's the output when I put V=1 in the Makefile:

# make
make -C /lib/modules/3.2.0-3-amd64/build M=/home/dimi/code/hello V=1 modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
make -C /usr/src/linux-headers-3.2.0-3-amd64 \
    KBUILD_SRC=/usr/src/linux-headers-3.2.0-3-common \
    KBUILD_EXTMOD="/home/dimi/code/hello" -f /usr/src/linux-headers-3.2.0-3-common/Makefile \
    modules
test -e include/generated/autoconf.h -a -e include/config/auto.conf || (        \
    echo;                               \
    echo "  ERROR: Kernel configuration is invalid.";       \
    echo "         include/generated/autoconf.h or include/config/auto.conf are missing.";\
    echo "         Run 'make oldconfig && make prepare' on kernel src to fix it.";  \
    echo;                               \
    /bin/false)
mkdir -p /home/dimi/code/hello/.tmp_versions ; rm -f /home/dimi/code/hello/.tmp_versions/*
make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.build obj=/home/dimi/code/hello
   gcc-4.6 -Wp,-MD,/home/dimi/code/hello/.hello.o.d  -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/4.6/include -I/usr/src/linux-headers-3.2.0-3-common/arch/x86/include -Iarch/x86/include/generated -Iinclude  -I/usr/src/linux-headers-3.2.0-3-common/include -include /usr/src/linux-headers-3.2.0-3-common/include/linux/kconfig.h   -I/home/dimi/code/hello -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -Os -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wframe-larger-than=2048 -Wno-unused-but-set-variable -fomit-frame-pointer -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO  -DMODULE  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hello)"  -D"KBUILD_MODNAME=KBUILD_STR(hello)" -c -o /home/dimi/code/hello/.tmp_hello.o /home/dimi/code/hello/hello.c
(cat /dev/null;   echo kernel//home/dimi/code/hello/hello.ko;) > /home/dimi/code/hello/modules.order
make -f /usr/src/linux-headers-3.2.0-3-common/scripts/Makefile.modpost
  scripts/mod/modpost -m  -i /usr/src/linux-headers-3.2.0-3-amd64/Module.symvers -I /home/dimi/code/hello/Module.symvers  -o /home/dimi/code/hello/Module.symvers -S -w -c -s
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'

And here is what I see when I remove V=1:

# make
make -C /lib/modules/3.2.0-3-amd64/build M=/home/dimi/code/hello modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-3-amd64'
  CC [M]  /home/dimi/code/hello/hello.o
  Building modules, stage 2.
  MODPOST 0 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-3-amd64'
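
For anyone who wants to reproduce the test, a throwaway module of this kind can be set up roughly as follows. This is only a sketch: the helper name make_hello_module and the contents of hello.c and the Makefile are assumptions (a conventional minimal module), not the exact files used above.

```shell
# Create a minimal out-of-tree "hello" module in the given directory.
# The file contents are an assumed, conventional example.
make_hello_module() {
    dir=$1
    mkdir -p "$dir" || return 1

    cat > "$dir/hello.c" <<'EOF'
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: loaded\n");
    return 0;
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
EOF

    # printf writes the literal tab that make requires before a recipe line.
    printf 'obj-m := hello.o\n\nall:\n\t$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(CURDIR) modules\n' \
        > "$dir/Makefile"
}
```

Calling make_hello_module "$HOME/code/hello" and then running make in that directory should leave a hello.ko behind on a healthy system; on the affected system the build stops at MODPOST 0 modules, as shown above.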

Best Answer

SOLVED!

Simple as that: /root/.bashrc had this inside:

 export GREP_OPTIONS='--color=always'

Changed it to:

 export GREP_OPTIONS='--color=never'

...and restarted the root shell (of course; do not omit this step). Everything started working again. Both the NVIDIA and VirtualBox kernel modules built on the first try. I am so happy! :-)
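
Why a color setting can break a build: with --color=always, grep wraps each match in terminal escape sequences even when its output is captured by a script rather than printed to a terminal. The sketch below shows the effect with GNU grep. It passes the flag directly instead of through GREP_OPTIONS, since newer GNU grep releases no longer honor that variable, and the listing file and its contents are made up for the demonstration. That the kernel build collects its module list through grep in a comparable way is an inference from the MODPOST 0 modules symptom, not something confirmed here.

```shell
# A made-up listing standing in for a file that names a built module.
listing=$(mktemp)
printf 'kernel/hello.ko\nREADME\n' > "$listing"

# Capture the same search twice: once plain, once with forced coloring.
plain=$(grep --color=never '\.ko' "$listing")
colored=$(grep --color=always '\.ko' "$listing")

printf 'plain:   %s characters\n' "${#plain}"
printf 'colored: %s characters\n' "${#colored}"

# A script comparing or reusing the captured text sees two different strings.
case $colored in
    "$plain") echo "identical: safe to parse" ;;
    *)        echo "different: escape sequences were added" ;;
esac

rm -f "$listing"
```

The colored result looks the same on a terminal, but it no longer equals the file name it is supposed to represent, so anything that uses it as a path or pattern silently finds nothing.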

Then again, I am slightly disappointed by the kernel build tools. They should know better and pass --color=never everywhere they use grep; or rather, store the old value of GREP_OPTIONS, override it for the lifetime of the build process, then restore it.
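
Until the tools do that themselves, the same "override for the lifetime of the build, then restore" idea can be approximated from the outside. This is a minimal sketch for a POSIX shell; the helper names without_grep_options and grep_options_status are made up for illustration.

```shell
# Run a command with GREP_OPTIONS removed from its environment.
# The subshell confines the change, so the caller's value is left untouched.
without_grep_options() {
    (
        unset GREP_OPTIONS
        "$@"
    )
}

# Report whether GREP_OPTIONS is set in the current shell.
grep_options_status() {
    if [ -n "${GREP_OPTIONS+set}" ]; then
        echo "GREP_OPTIONS is set to: $GREP_OPTIONS"
    else
        echo "GREP_OPTIONS is not set"
    fi
}
```

For example, as root, grep_options_status shows whether the shell is affected, and without_grep_options dpkg-reconfigure nvidia-kernel-dkms retries the build without editing /root/.bashrc first.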

I am hopeful that my epic one-week battle with this problem will prove valuable both to the community and the kernel build tools developers.

A very warm thanks to the people who were with me and tried to help.

(All credits go here: http://forums.gentoo.org/viewtopic-p-4156366.html#4156366)
