I am not at all convinced of this, but let's suppose for the sake of argument that you could, if you're prepared to put in enough effort, parse the output of ls reliably, even in the face of an "adversary": someone who knows the code you wrote and is deliberately choosing filenames designed to break it.
Even if you could do that, it would still be a bad idea.
Bourne shell is not a good language. It should not be used for anything complicated, unless extreme portability is more important than any other factor (e.g. autoconf).
I claim that if you're faced with a problem where parsing the output of ls seems like the path of least resistance for a shell script, that's a strong indication that whatever you are doing is too complicated for shell and you should rewrite the entire thing in Perl or Python. Here's your last program in Python:
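    import os, sys

    # Walk the tree below the current directory and print inode, directory, name.
    for subdir, dirs, files in os.walk("."):
        for f in dirs + files:
            # lstat() reports on a symlink itself rather than its target.
            ino = os.lstat(os.path.join(subdir, f)).st_ino
            sys.stdout.write("%d %s %s\n" % (ino, subdir, f))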
This has no issues whatsoever with unusual characters in filenames -- the output is ambiguous in the same way the output of ls is ambiguous, but that wouldn't matter in a "real" program (as opposed to a demo like this), which would use the result of os.path.join(subdir, f) directly.
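To make that last point concrete, here is a minimal sketch of the "real program" shape, where the joined path is handed around as a value instead of being printed and re-parsed. The function name iter_entries is my own invention for illustration, not part of any library.

```python
import os

def iter_entries(root="."):
    """Yield (inode, path) for every file and directory below root."""
    for subdir, dirs, files in os.walk(root):
        for name in dirs + files:
            path = os.path.join(subdir, name)
            # lstat() does not follow symlinks, matching the demo above.
            yield os.lstat(path).st_ino, path

if __name__ == "__main__":
    for ino, path in iter_entries("."):
        # repr() only matters for display; the path itself is already exact.
        print(ino, repr(path))
```

Because the path never passes through a text stream, a filename containing spaces or newlines reaches the caller unchanged.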
Equally important, and in stark contrast to the thing you wrote, it will still make sense six months from now, and it will be easy to modify when you need it to do something slightly different. By way of illustration, suppose you discover a need to exclude dotfiles and editor backups, and to process everything in alphabetical order by basename:
    import os, sys

    filelist = []
    for subdir, dirs, files in os.walk("."):
        for f in dirs + files:
            # Skip dotfiles and editor backups (names ending in '~').
            if f[0] == '.' or f[-1] == '~':
                continue
            lstat = os.lstat(os.path.join(subdir, f))
            filelist.append((f, subdir, lstat.st_ino))

    # Sort by basename, the first element of each tuple.
    filelist.sort(key = lambda x: x[0])
    for f, subdir, ino in filelist:
        sys.stdout.write("%d %s %s\n" % (ino, subdir, f))
ls -F will:
Write a slash ( '/' ) immediately after each pathname that is a directory, an asterisk ( '*' ) after each that is executable, a vertical line ( '|' ) after each that is a FIFO, and an at-sign ( '@' ) after each that is a symbolic link.
GNU ls includes additional indicators:
... ‘=’ for sockets, ‘>’ for doors
The = indicator is also present in the major BSDs (FreeBSD, OpenBSD, NetBSD, OS X). All of those but OpenBSD also include % for whiteouts. Most of the commercial Unices include =, but it is non-standard.
A * will appear after a file if it is marked as executable, that is, if the executable bit is set. It doesn't necessarily mean you could actually run the file. You can unset the executable bit with chmod -x; generally you won't want text files and PDFs to be executable, so you could do that. Executable files will also have the x in ls -l output.
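If you want to check or clear that bit from a program rather than by eye, something like the following sketch should do. The helper names has_exec_bit and clear_exec_bits are made up for this example; the stat constants and os.chmod are standard library.

```python
import os
import stat

# The three execute bits: user, group, other.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

def has_exec_bit(path):
    """True if any of the user/group/other execute bits is set."""
    return bool(os.lstat(path).st_mode & EXEC_BITS)

def clear_exec_bits(path):
    """Roughly what `chmod -x` does, ignoring the umask."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~EXEC_BITS)
```

Note that this only looks at the permission bits. os.access(path, os.X_OK) asks the different question of whether the current process has execute permission, and neither check tells you whether the file's contents are actually runnable.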
For the others:
/ indicates a directory, which is pretty straightforward.
| indicates a FIFO, which is a named pipe made with mkfifo (data can be written into it, and read back exactly once).
@ indicates a symbolic link made with ln -s, which is an alias for another path.
= indicates a socket, a special file for communicating with other processes.
> indicates a door, another inter-process communication feature found on some systems, such as Solaris.
% indicates a whiteout, a special file used to mark deletions made in upper layers of a union filesystem stack.
A "regular file" is what you'd conventionally think of as a file, one that you can write data into and read it back later. Alternatively, you can think of it as anything that isn't in one of the above categories.
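As a rough illustration of how those categories map onto file types, here is a sketch that computes the same suffix from the file's mode. It is my own approximation of the behaviour described above, not the code ls uses, and classify_indicator is a hypothetical name. The stat.S_ISDOOR and stat.S_ISWHT tests need Python 3.4 or later and simply return False on platforms without those file types.

```python
import os
import stat

def classify_indicator(path):
    """Return an ls -F style suffix for path, or '' for a plain regular file."""
    mode = os.lstat(path).st_mode  # lstat: inspect a symlink itself, not its target
    if stat.S_ISDIR(mode):
        return "/"
    if stat.S_ISLNK(mode):
        return "@"
    if stat.S_ISFIFO(mode):
        return "|"
    if stat.S_ISSOCK(mode):
        return "="
    if stat.S_ISDOOR(mode):
        return ">"
    if stat.S_ISWHT(mode):
        return "%"
    # Only regular files get '*', and only if some execute bit is set.
    if stat.S_ISREG(mode) and mode & 0o111:
        return "*"
    return ""
```

Anything that falls through to the final return is a regular file without the executable bit, or a type such as a block or character device that has no -F indicator.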
Hard links are not distinguished from other files at all, either in ls -F output or otherwise. In fact, you can think of every file as a hard link to itself. You can look at the number of links to a given file in the ls -l output. The second field is the number of links:
-rw-r--r-- 3 root root 92766 Feb 20 11:42 test.txt
This file has three links. None of them is the "main" link, and you can't tell which one is the original in any way. If you delete one, the count will go down, but the others will still refer to the same file.
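You can watch the link count change with a small experiment. This is only a sketch: it assumes a POSIX filesystem that supports hard links, and link_demo and the file names in it are invented for the example.

```python
import os
import tempfile

def link_demo():
    """Give one file three names, delete the first, and report what happened."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "test.txt")
        second = os.path.join(d, "second")
        third = os.path.join(d, "third")
        with open(first, "w") as fh:
            fh.write("hello\n")
        os.link(first, second)  # second name for the same file
        os.link(first, third)   # third name for the same file
        same_inode = os.stat(first).st_ino == os.stat(third).st_ino
        before = os.stat(first).st_nlink
        os.remove(first)        # delete the name that was created first
        after = os.stat(second).st_nlink
        with open(third) as fh:
            content = fh.read()
        return same_inode, before, after, content

if __name__ == "__main__":
    print(link_demo())
```

All three names share one inode, and removing the name that happened to be created first leaves the data reachable through the other two.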
All of the -F indicators other than * do map onto one of the values of the first field of the mode output in ls -l, but there are additional values that can appear there as well, notably b for block devices, c for character devices, and other system-specific indicators.
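Python's standard library can produce that mode string for you, which makes the mapping easy to explore. A minimal sketch, with type_char as a hypothetical helper name (stat.filemode itself is standard, Python 3.3 and later):

```python
import os
import stat

def type_char(path):
    """First character of the ls -l style mode string for path."""
    # stat.filemode() renders a mode as text such as '-rw-r--r--'.
    return stat.filemode(os.lstat(path).st_mode)[0]

if __name__ == "__main__":
    for p in (".", os.devnull):
        print(type_char(p), p)
```

Directories come back as d, symbolic links as l, FIFOs as p, and regular files as -, alongside the b and c values mentioned above.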
? means that no SELinux context was found. SELinux isn’t installed or enabled by default everywhere; for example, Fedora and RHEL install and enable it by default, but Debian and Ubuntu don’t.
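For the curious, here is a hedged sketch of how a program might look up a file's label and fall back to ? when there is none. It assumes, as I understand to be the case on Linux, that the label is stored in the security.selinux extended attribute. selinux_context is a name I made up, and os.getxattr only exists on Linux.

```python
import os

def selinux_context(path):
    """Return the SELinux label stored on path, or '?' if none is found."""
    try:
        raw = os.getxattr(path, "security.selinux", follow_symlinks=False)
    except (AttributeError, OSError):
        # AttributeError: os.getxattr is not available on this platform.
        # OSError: no label stored, or no xattr support on this filesystem.
        return "?"
    # The stored value usually carries a trailing NUL byte.
    return raw.rstrip(b"\0").decode("utf-8", "replace")

if __name__ == "__main__":
    print(selinux_context("."))
```

On a system without SELinux this should print ?, while on Fedora or RHEL you would expect a colon-separated context string.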