How to use wildcards with ls to find files that are missing in a numeric sequence

ls wildcards

I'm trying to list the files that are missing from a numeric sequence in the terminal. Many answers already exist here, but none is generic enough for me to adapt to my situation. So if you can give a generic answer that will work for more people, please do.

I'm doing ls {369..422}.avi >/dev/null to list missing files but I can't use * to match anything that ends with .avi. How can I do this?

The numbers are not at the end of the file name but in the middle, so I need something like *numbers*.avi

Best Answer

ls *{369..422}*.avi >/dev/null

This will first generate patterns like

*369*.avi
*370*.avi
*371*.avi
*372*.avi
*373*.avi
*374*.avi

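through brace expansion, and then ls will be executed with these patterns. ls prints an error message for each pattern that can't be expanded to a name in the current directory. Because standard output is redirected to /dev/null, only those error messages (which go to standard error) are shown.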

Alternatively, if you have no files that contain * in their name:

for name in *{369..422}*.avi; do
    case "$name" in 
        '*'*) printf '"%s" not matched\n' "$name" ;;
    esac
done

This relies on the fact that the pattern remains unexpanded if it did not match a name in the current directory. This gives you a way of possibly doing something useful for the missing files, without resorting to parsing the error messages of ls.
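For comparison, the same check can be sketched in Python, the language used further down this page. This is only a sketch under stated assumptions: the function name missing_numbers and its parameters are made up for illustration, and it matches names with fnmatch rather than the shell, so unlike a shell glob its * also matches names that start with a dot.

```python
import fnmatch


def missing_numbers(names, start, stop, suffix=".avi"):
    """Return every n in start..stop for which no name matches *n*suffix.

    `names` is any iterable of file names, e.g. os.listdir(".").
    The name and signature are hypothetical, chosen for this example.
    """
    names = list(names)
    missing = []
    for n in range(start, stop + 1):
        # Same shape as the shell pattern *369*.avi: the number may sit
        # anywhere in the name, as long as the name ends with the suffix.
        pattern = "*%d*%s" % (n, suffix)
        if not any(fnmatch.fnmatchcase(name, pattern) for name in names):
            missing.append(n)
    return missing
```

Calling missing_numbers(os.listdir("."), 369, 422) after import os would return the missing numbers as a list, which can then be processed further without reading any error messages.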

UPDATE (2014-02-02)

Thanks to our very own @Anthon's determination in following up on the lack of this feature, we have a slightly more formal reason as to why it is missing, which reiterates what I explained earlier:

Re: [PATCH] ls: adding --zero/-z option, including tests

From:      Pádraig Brady
Subject:   Re: [PATCH] ls: adding --zero/-z option, including tests
Date:      Mon, 03 Feb 2014 15:27:31 +0000
Thanks a lot for the patch. If we were to do this then this is the interface we would use. However ls is really a tool for direct consumption by a human, and in that case further processing is less useful. For further processing, find(1) is more suited. That is well described in the first answer at the link above.

So I'd be 70:30 against adding this.

My original answer

This is partly my personal opinion, but I believe that leaving that switch out of ls was a deliberate design decision. Notice that the find command does have such a switch:

-print0
      True; print the full file name on the standard output, followed by a
      null character (instead of the newline character that -print uses).
      This allows file names that contain newlines or other types of white
      space to be correctly interpreted by programs that process the find
      output.  This option corresponds to the -0 option of xargs.

By leaving that switch out, the designers were implying that you should not be using ls output for anything other than human consumption. For downstream processing by other tools, you should be using find instead.

Ways to use find

If you're just looking for the alternative methods, you can find them in the linked section titled "Doing it correctly: A quick summary". From that link, these are likely the three most common patterns:

Simple find -exec; unwieldy if COMMAND is large, and creates 1 process/file:
```
find . -exec COMMAND... {} \;
```
Simple find -exec with +, faster if multiple files are okay for COMMAND:
```
find . -exec COMMAND... {} \+
```
Use find and xargs with \0 separators
(nonstandard common extensions -print0 and -0. Works on GNU, *BSDs, busybox)
```
find . -print0 | xargs -0 COMMAND
```
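To illustrate why the \0 separator is safe where a newline is not, here is a minimal Python sketch of a consumer for find -print0 output. The helper names split_nul and find_names are hypothetical, and find_names assumes a find that supports -print0 (GNU, the BSDs, busybox, as noted above).

```python
import os
import subprocess


def split_nul(data):
    """Split NUL-terminated records (the find -print0 format) into names."""
    # Every record ends with b"\0", so the last split piece is empty; skip it.
    return [os.fsdecode(record) for record in data.split(b"\0") if record]


def find_names(root="."):
    """Run `find ROOT -print0` and return the names it reports."""
    result = subprocess.run(
        ["find", root, "-print0"], stdout=subprocess.PIPE, check=True
    )
    return split_nul(result.stdout)
```

A name that contains a newline stays in one piece, because a NUL byte is the one character (besides /) that cannot appear inside a file name.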

Further evidence?

I found this blog post from Joey Hess' blog titled: "ls: the missing options". One of the interesting comments in this post:

The only obvious lack now is a -z option, which should make output filenames be NULL terminated for consumption by other programs. I think this would be easy to write, but I've been extremely busy IRL (moving lots of furniture) and didn't get to it. Any takers to write it?

Searching further, I found the following in the commit log for one of the additional switches that Joey's blog post mentions, "new output format -j", so it would seem that the blog post was poking fun at the notion of ever adding a -z switch to ls.

As to the other options, multiple people agree that -e is nearly almost useful, although none of us can quite find a reason to use it. My bug report neglected to mention that ls -eR is very buggy. -j is clearly a joke.

References

Why Not to Parse ls Command and What to Use Instead

I am not at all convinced of this, but let's suppose for the sake of argument that you could, if you're prepared to put in enough effort, parse the output of ls reliably, even in the face of an "adversary" — someone who knows the code you wrote and is deliberately choosing filenames designed to break it.

Even if you could do that, it would still be a bad idea.

Bourne shell is not a good language. It should not be used for anything complicated, unless extreme portability is more important than any other factor (e.g. autoconf).

I claim that if you're faced with a problem where parsing the output of ls seems like the path of least resistance for a shell script, that's a strong indication that whatever you are doing is too complicated for shell and you should rewrite the entire thing in Perl or Python. Here's your last program in Python:

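import os, sys
# Walk the tree below the current directory.
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        # lstat does not follow symlinks, so a link reports its own inode.
        ino = os.lstat(os.path.join(subdir, f)).st_ino
        sys.stdout.write("%d %s %s\n" % (ino, subdir, f))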

This has no issues whatsoever with unusual characters in filenames -- the output is ambiguous in the same way the output of ls is ambiguous, but that wouldn't matter in a "real" program (as opposed to a demo like this), which would use the result of os.path.join(subdir, f) directly.

Equally important, and in stark contrast to the thing you wrote, it will still make sense six months from now, and it will be easy to modify when you need it to do something slightly different. By way of illustration, suppose you discover a need to exclude dotfiles and editor backups, and to process everything in alphabetical order by basename:

import os, sys
filelist = []
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        if f[0] == '.' or f[-1] == '~': continue
        lstat = os.lstat(os.path.join(subdir, f))
        filelist.append((f, subdir, lstat.st_ino))

filelist.sort(key=lambda x: x[0])
for f, subdir, ino in filelist:
    sys.stdout.write("%d %s %s\n" % (ino, subdir, f))
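The earlier remark that a "real" program would use the result of os.path.join(subdir, f) directly could look roughly like this. It is only a sketch: iter_entries is a made-up name, and the caller is assumed to want (path, inode) pairs as data instead of printed text.

```python
import os


def iter_entries(root="."):
    """Yield (path, inode) for every file and directory below root."""
    for subdir, dirs, files in os.walk(root):
        for name in dirs + files:
            # Keep the joined path as a value; it is never turned into
            # text that would have to be parsed again.
            path = os.path.join(subdir, name)
            yield path, os.lstat(path).st_ino
```

Because the paths never pass through a text stream, names that contain spaces or newlines need no special handling.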