Unfortunately the find command's -name predicate only accepts a single pattern. If you want to search for multiple files by name, you'd need to chain them with the -o (logical OR) operator - something like:
find Documents/ \( -name "file2.txt" -o -name "file5.txt" -o -name "file9.txt" \) -print
This makes it tricky to construct the search programmatically from a list; the closest I can come to your attempted command is:
- read the list into a shell array:
  mapfile -t files < list.txt
- use the bash shell's printf to construct the predicate list:
  printf -- '-name "%s" -o ' "${files[@]}"
- use eval to evaluate the resulting command string
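For illustration, with the three names from list.txt, the printf call expands like this (note the dangling -o at the end, addressed below):

```shell
files=( file2.txt file5.txt file9.txt )
# printf re-uses its format string once per remaining argument:
printf -- '-name "%s" -o ' "${files[@]}"
# → -name "file2.txt" -o -name "file5.txt" -o -name "file9.txt" -o
```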
There's a wrinkle here: if we use printf's format re-use feature to construct the list in this way, we're left with a 'dangling' -o. We can work around this by terminating the list with a -false test (since -o -false is a Boolean no-op), so that our final predicate string becomes
"\( $(printf -- '-name "%s" -o ' "${files[@]}") -false \)"
Putting it all together - given
$ tree dir
dir
├── Folder_1
│   ├── file1.txt
│   ├── file2.txt
│   ├── file3.txt
│   └── file4.txt
├── Folder_2
│   ├── file5.txt
│   ├── file6.txt
│   └── file7.txt
└── Folder_3
    ├── file8.txt
    └── file9.txt

3 directories, 9 files
and
$ cat list.txt
file2.txt
file5.txt
file9.txt
then
$ mapfile -t files < list.txt
$ eval find dir/ "\( $(printf -- '-name "%s" -o ' "${files[@]}") -false \)" -print
dir/Folder_1/file2.txt
dir/Folder_2/file5.txt
dir/Folder_3/file9.txt
To copy files instead of just listing them, you could then do
$ mkdir newdir
$ eval find dir/ "\( $(printf -- '-name "%s" -o ' "${files[@]}") -false \)" -exec cp -t newdir/ -- {} +
resulting in
$ tree newdir
newdir
├── file2.txt
├── file5.txt
└── file9.txt
0 directories, 3 files
Note: the eval command is powerful and potentially open to abuse: use with care.
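If eval makes you nervous, the same predicate list can be built as a bash array instead of a string, which avoids eval entirely. A sketch, assuming the same dir/ and list.txt as above:

```shell
mapfile -t files < list.txt

# Build the find predicates as array elements, one -name test per line
# of list.txt, terminated with -false to absorb the trailing -o.
# The quoted "(" and ")" are passed to find as literal arguments.
predicates=( "(" )
for f in "${files[@]}"; do
    predicates+=( -name "$f" -o )
done
predicates+=( -false ")" )

find dir/ "${predicates[@]}" -print
```

Because each name is a separate array element, no shell re-parsing of the names ever happens, so the approach is safe even for names containing spaces or quotes.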
In practice, given that you appear to want to find only a small number of files, the KISS approach would be to accept the performance hit of multiple find calls and just use a loop:
while read -r f; do
    find dir/ -name "$f" -exec cp -v -- {} newdir/ \;
done < list.txt
or even using xargs:
xargs -a list.txt -n1 -I@ find dir/ -name @ -exec cp -v -- {} newdir/ \;
Some commands accept - in place of a filename, either:
- To write to standard output instead of to a named file. This is what the - argument passed to wget after -O is doing.
- To read from standard input instead of from a named file. This is what the - argument passed to tar after xzf is doing.
The command you showed downloads an archive file with wget and unpacks it with tar. To achieve this, the output of wget is piped (|) to the input of tar. This is why wget writes to standard output instead of a file, and tar reads from standard input instead of a file.
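The same convention is easy to demonstrate locally, without a network: in the pipeline below the first tar writes its archive to standard output (f -) and the second reads it from standard input (a sketch; the directory and file names are made up for illustration):

```shell
mkdir -p src out
echo hello > src/a.txt

# Left side: tar writes the compressed archive to stdout (-).
# Right side: tar reads the compressed archive from stdin (-)
# and unpacks it into out/.
tar czf - src | tar xzf - -C out

cat out/src/a.txt
# → hello
```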
Best Answer
tar.gz files do not have an index. Unlike zip or other archive formats, it is neither trivial nor cheap to obtain a list of the contained files or other metadata. In order to show you which files are contained in the archive, tar indeed needs to decompress the archive and extract the files, although in the case of the -t option it does so only in memory.
If a common pattern in your use case is to list the files contained in an archive, you might want to consider using an archive format that can add a file index to the compressed file, e.g. zip. Perhaps you also want to take a look at the HDF5 format for more complex scenarios.
Measurements
I just had to do some measurements to prove my answer, so I created some directories with many files in them and packed them with both
tar czf files#.tgz files#
and
zip -r files#.zip files#
For the tests I ran the listing command twice each time and took the result of the second run, to try to avoid measuring disk speed.
Test 1
Directory files1 containing 100,000 empty files. zip is slower here.
Test 2
Directory files2 containing 5,000 files with 512 bytes of random data each. Still not convincing, but zip is faster this time.
Test 3
Directory files3 containing 5,000 files with 5 kB of random data each. In this test it can be seen that the larger the files get, the harder it is for tar to list them.
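The benchmark setup described above can be sketched roughly like this (file counts reduced for brevity; the names are illustrative, and zip/unzip must be installed):

```shell
# Guard: skip gracefully if zip tooling is not available.
command -v zip >/dev/null && command -v unzip >/dev/null || exit 0

# Create a directory of small files with random contents.
mkdir files2
for i in $(seq 1 100); do
    head -c 512 /dev/urandom > "files2/f$i"
done

# Pack the same tree both ways.
tar czf files2.tgz files2
zip -qr files2.zip files2

# tar has to decompress the whole stream just to list the names;
# unzip only needs to read the central directory at the end of the file.
time tar tzf files2.tgz > /dev/null
time unzip -l files2.zip > /dev/null
```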
Conclusion
To me it looks like zip introduces a little overhead that you will notice only with many very small (almost empty) files, whereas for large numbers of larger files zip wins the contest when listing the files contained in the archive.