Linux – What is a generic socket and how does it relate to a network device?


I'm trying to understand how network drivers work under Linux. This Q&A showed that the network device in Linux isn't represented by a device file. It states that network drivers work with sockets.

For example, this references how to set up the network devices through ioctl calls. ioctl, however, needs a file descriptor. Given that there are no device files for network drivers, the only file descriptor that can be passed is the one from the socket.

This brings me to the point of the question. So far it seems like the network interface, which would be a software representation of a physical network card, is actually an inferior object to a socket.

But what is a socket in this abstract sense? Is it just another name for a device file that supports push notifications? I understand TCP sockets in terms of connection points bound by a userspace app to an address:port pair on a network interface. I don't understand a socket as a prerequisite for setting up a network interface.
Can a network interface on Linux (like eth0 listed by ifconfig) exist without a socket?
Does ifconfig or some network manager daemon keep a socket open to allow us to set the network interface options?

Best Answer

Let's quickly review device files: In Linux, application programs communicate read and write operations to the kernel through file descriptors. That works great for files, and it turned out that the same API could be used for character devices that produce and consume streams of characters, and block devices that read and write blocks of fixed size at a random access address, just by pretending that these are also files.

But a way was needed to configure those devices (set baud rates etc.), and for that, the ioctl call was invented. It just passes a data structure that's specific to the device and the kind of I/O control used to the kernel, and gets back the results in the same data structure, so it's a very generic extensible API and can be used for lots of things.
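As a small illustration of the calling convention (not of device configuration as such), here is a sketch that uses the FIONREAD request to ask the kernel how many bytes are waiting on a descriptor. The function name bytes_pending is made up for this example. The caller passes a pointer, and the kernel writes its answer into it.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Asks the kernel how many bytes are waiting to be read on fd.
 * Returns that count, or -1 if the ioctl fails. */
int bytes_pending(int fd)
{
    int n = 0;

    /* FIONREAD takes a pointer to int; the kernel stores the result there. */
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

On Linux this works on a pipe, for instance: after writing three bytes into the write end, bytes_pending on the read end reports 3. Device-specific requests follow the same pattern, only with a request code and data structure that belong to that device.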

Now, how do network operations fit in? A typical network server application wants to bind to some network address, listen on a certain port (e.g. 80 for HTTP, or 22 for ssh), and if a client connects, it wants to send data to and receive data from this client. And the dual operations for the client.

It's not obvious how to fit this in with file operations (though it can be done, see Plan 9), which is why the UNIX designers invented a new API: sockets. You can find details in the section 2 man pages for socket, bind, listen, connect, send and recv. Note that while it is distinct from the file I/O API, the socket call nevertheless also returns a file descriptor. There are numerous tutorials on how to use sockets on the web; google a bit.

So far this is all pure UNIX, nobody was talking about network interfaces at the time sockets were invented. And because this API is really old, it is defined for a variety of network protocols beyond the Internet protocol (look at the AF_* constants), though only a few of those are supported in Linux.

But as computers started to get multiple network cards, some abstraction for this was needed. In Linux, that is the network interface (NI). It's used not only for a piece of hardware, but also for various tunnels, user application endpoints that serve as tunnels (like OpenVPN), etc. As explained, the socket API isn't based on (special) files and is independent of the filesystem. In the same way, network interfaces don't show up in the filesystem either. However, the NIs are made available in the /proc and /sys filesystems (as are other networking tunables).

An NI is simply a kernel abstraction of an endpoint where network packets enter and leave the kernel. Sockets, on the other hand, are used to communicate packets with applications. No socket needs to be involved with the processing of a packet. For example, when forwarding is enabled, a packet may enter on one NI and leave on another. In that sense, sockets and network interfaces are totally independent.

But there had to be a way to configure NIs, just like you needed a way to configure block and character devices. And since sockets already returned a file descriptor, it was somewhat logical to just allow an ioctl on that file descriptor. That's the netdevice interface you linked.

There are quite a few other abuses of system calls in a similar way, for example for packet filtering, packet capture etc.

All of this has grown piece by piece and is not particularly logical in many places. If it had been designed all at once, one could probably have made a more orthogonal API.

Related Solutions

Does Linux automatically clean up abstract domain sockets

Yes, Linux automatically "cleans up" abstract sockets to the extent that cleaning up even makes sense. Here's a minimal working example with which you can verify this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int
main(int argc, char **argv)
{
  int s;
  struct sockaddr_un sun;

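  /* sun_path[0] is the NUL that marks the name as abstract, and strcpy
     also writes a terminating NUL, so the name needs strlen + 2 bytes. */
  if (argc != 2 || strlen(argv[1]) + 2 > sizeof(sun.sun_path)) {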
    fprintf(stderr, "usage: %s abstract-path\n", argv[0]);
    exit(1);
  }

  s = socket(AF_UNIX, SOCK_STREAM, 0);
  if (s < 0) {
    perror("socket");
    exit(1);
  }
  memset(&sun, 0, sizeof(sun));
  sun.sun_family = AF_UNIX;
  strcpy(sun.sun_path + 1, argv[1]);
  if (bind(s, (struct sockaddr *) &sun, sizeof(sun))) {
    perror("bind");
    exit(1);
  }
  pause();
}

Run this program as ./a.out /test-socket &, then run ss -ax | grep test-socket, and you will see the socket in use. Then kill %./a.out, and ss -ax will show the socket is gone.

However, the reason you can't find this clean-up in any documentation is that it isn't really cleaning up in the same sense that non-abstract unix-domain sockets need cleaning up. A non-abstract socket actually allocates an inode and creates an entry in a directory, which needs to be cleaned up in the underlying file system. By contrast, think of an abstract socket more like a TCP or UDP port number. Sure, if you bind a TCP port and then exit, that TCP port will be free again. But whatever 16-bit number you used still exists abstractly and always did. The namespace of port numbers is 1-65535 and never changes or needs cleaning.

So just think of the abstract socket name like a TCP or UDP port number, just picked from a much larger set of possible port numbers that happen to look like pathnames but are not. You can't bind the same port number twice (barring SO_REUSEADDR or SO_REUSEPORT). But closing the socket (explicitly or implicitly by terminating) frees the port, with nothing left to clean up.
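The port-number analogy can be checked from within a single process. The following is a sketch under a few assumptions: the helper name bind_abstract is made up, and unlike the program above it passes the exact address length to bind, so the abstract name does not include trailing NUL padding.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Creates an AF_UNIX stream socket and binds it to the abstract name
 * given (without the leading NUL byte). Returns the descriptor on
 * success, or -1 with errno set on failure. */
int bind_abstract(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    int fd, saved;

    if (len + 1 > sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which marks the name as abstract. */
    memcpy(addr.sun_path + 1, name, len);

    /* Pass the exact length so no trailing NUL bytes join the name. */
    if (bind(fd, (struct sockaddr *)&addr,
             offsetof(struct sockaddr_un, sun_path) + 1 + len) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Binding the same name a second time while the first descriptor is still open should fail with EADDRINUSE. After close on the first descriptor, binding the name again succeeds, with no unlink or other clean-up step in between.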

Bash – the difference between &6 and /dev/fd/6

This is because reading from /dev/fd/ entries that represent sockets isn't implemented on Linux. You can find quite a good writeup on the reasoning here. So you can call stat on the link, which is why you see it with ls, but access is deliberately disallowed.

Now for the second part: why does bash -c 'ls -l /dev/fd/6; cat <&6' 6</dev/tcp/localhost/12345 work? That's because the socket is read through the socket/file API, not through the /proc filesystem. This is what I've observed happening:

The bash instance running in your terminal creates a socket with fd 6.
The child bash runs and calls dup2(6, 0) in order to attach your socket as cat's stdin.
If the dup2 call didn't fail, cat reads from stdin.

You can reproduce and observe it with:

netcat -lp 12345    # in another terminal session (GNU netcat)
strace -f -e trace=open,read,write,dup2 bash -c 'ls -l /dev/fd/6; cat <&6' \
 6</dev/tcp/localhost/12345

If you're wondering why the bash child process has access to fd 6: file descriptors survive fork, and if they aren't marked close-on-exec, they aren't closed on exec either.
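The same mechanism can be sketched in C without bash. The function names closes_on_exec and echo_via_child_stdin are made up for this example, and a socket pair stands in for the TCP connection. The child inherits its end across fork, attaches it as stdin with dup2, and reads from fd 0 like cat does.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if fd would be closed by a successful exec, 0 if it would
 * stay open, and -1 on error. */
int closes_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Forks a child that attaches its inherited socket end as stdin and
 * echoes one message back. Returns the number of bytes the parent got
 * back in buf, or -1 on error. */
ssize_t echo_via_child_stdin(const char *msg, size_t len,
                             char *buf, size_t buflen)
{
    int sv[2];
    pid_t pid;
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: sv[1] is still open because it survived fork(). */
        char tmp[256];
        ssize_t got;

        close(sv[0]);
        if (dup2(sv[1], STDIN_FILENO) < 0)
            _exit(1);
        got = read(STDIN_FILENO, tmp, sizeof(tmp));
        /* A socket is bidirectional, so the reply goes back the same way. */
        if (got > 0 && write(STDIN_FILENO, tmp, (size_t)got) != got)
            _exit(1);
        _exit(got > 0 ? 0 : 1);
    }

    /* Parent: talk to the child over the other end. */
    close(sv[1]);
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[0], buf, buflen);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

A descriptor from socketpair is not close-on-exec by default, so closes_on_exec reports 0 for it until fcntl(fd, F_SETFD, FD_CLOEXEC) is called. At no point does the child open a /dev/fd or /proc path; it only uses the descriptor it inherited.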
