Yes, Linux automatically "cleans up" abstract sockets, to the extent that cleaning up even makes sense. Here's a minimal working example with which you can verify this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int
main(int argc, char **argv)
{
    int s;
    struct sockaddr_un sun;

    if (argc != 2 || strlen(argv[1]) + 1 > sizeof(sun.sun_path)) {
        fprintf(stderr, "usage: %s abstract-path\n", argv[0]);
        exit(1);
    }
    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        exit(1);
    }
    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path + 1, argv[1]);
    if (bind(s, (struct sockaddr *) &sun, sizeof(sun))) {
        perror("bind");
        exit(1);
    }
    pause();
}
Run this program as ./a.out /test-socket &, then run ss -ax | grep test-socket, and you will see the socket in use. Then kill %./a.out, and ss -ax will show the socket is gone.
However, the reason you can't find this clean-up in any documentation is that it isn't really cleaning up in the same sense that non-abstract unix-domain sockets need cleaning up. A non-abstract socket actually allocates an inode and creates an entry in a directory, which needs to be cleaned up in the underlying file system. By contrast, think of an abstract socket more like a TCP or UDP port number. Sure, if you bind a TCP port and then exit, that TCP port will be free again. But whatever 16-bit number you used still exists abstractly and always did. The namespace of port numbers is 1-65535 and never changes or needs cleaning.
So just think of the abstract socket name like a TCP or UDP port number, just picked from a much larger set of possible port numbers that happen to look like pathnames but are not. You can't bind the same port number twice (barring SO_REUSEADDR or SO_REUSEPORT). But closing the socket (explicitly, or implicitly by terminating) frees the port, with nothing left to clean up.
This is because reading from /dev/fd/ entries that represent sockets isn't implemented on Linux. You can find quite a good writeup on the reasoning here. So you can call stat on the link, which is why you see it with ls, but opening it is deliberately disallowed.
Now for the second part - why does bash -c 'ls -l /dev/fd/6; cat <&6' 6</dev/tcp/localhost/12345 work? That's because the socket is read through the socket/file API, not through the /proc filesystem. This is what I've observed happening:
- The bash instance running in your terminal creates a socket with fd 6.
- The child bash runs and calls dup2(6, 0) in order to attach your socket as cat's stdin.
- If the dup2 call didn't fail, cat reads from stdin.
You can reproduce and observe it with:
netcat -lp 12345 # in another terminal session (GNU netcat)
strace -f -e trace=open,read,write,dup2 bash -c 'ls -l /dev/fd/6; cat <&6' \
6</dev/tcp/localhost/12345
If you're wondering why the bash child process has access to fd 6: file descriptors survive fork, and if they aren't marked close-on-exec, they survive exec as well.
Best Answer
Let's quickly review device files: in Linux, application programs communicate read and write operations to the kernel through file descriptors. That works great for files, and it turned out that the same API could be used for character devices that produce and consume streams of characters, and for block devices that read and write fixed-size blocks at random-access addresses, just by pretending that these are also files.
But a way was needed to configure those devices (set baud rates etc.), and for that, the ioctl call was invented. It just passes a data structure that's specific to the device and the kind of I/O control used to the kernel, and gets back the results in the same data structure, so it's a very generic extensible API and can be used for lots of things.
Now, how do network operations fit in? A typical network server application wants to bind to some network address, listen on a certain port (e.g. 80 for HTTP, or 22 for ssh), and if a client connects, it wants to send data to and receive data from this client. And the dual operations for the client.
It's not obvious how to fit this in with file operations (though it can be done, see Plan 9), which is why the UNIX designers invented a new API: sockets. You can find details in the section 2 man pages for socket, bind, listen, connect, send and recv. Note that while it is distinct from the file I/O API, the socket call nevertheless also returns a file descriptor. There are numerous tutorials on how to use sockets on the web; google a bit.
So far this is all pure UNIX; nobody was talking about network interfaces at the time sockets were invented. And because this API is really old, it is defined for a variety of network protocols beyond the Internet protocol (look at the AF_* constants), though only a few of those are supported in Linux.
But as computers started to get multiple network cards, some abstraction for this was needed. In Linux, that is the network interface (NI). It's used not only for a piece of hardware, but also for various tunnels, user application endpoints that serve as tunnels like OpenVPN, etc. As explained, the socket API isn't based on (special) files and is independent of the filesystem. In the same way, network interfaces don't show up in the file system, either. However, the NIs are made available in the /proc and /sys filesystems (as well as other networking tunables).
A NI is simply a kernel abstraction of an endpoint where network packets enter and leave the kernel. Sockets, on the other hand, are used to communicate packets with applications. No socket needs to be involved in the processing of a packet. For example, when forwarding is enabled, a packet may enter on one NI and leave on another. In that sense, sockets and network interfaces are totally independent.
But there had to be a way to configure NIs, just like you needed a way to configure block and character devices. And since sockets already returned a file descriptor, it was somewhat logical to just allow an ioctl on that file descriptor. That's the netdevice interface you linked. There are quite a few other abuses of system calls in a similar way, for example for packet filtering, packet capture, etc.
All of this has grown piece by piece, and is not particularly logical in many places. If it had been designed all at once, one could probably have made a more orthogonal API.