Xargs to extract filename

Tags: awk, echo, find, xargs

I would like to find all the .html files in a folder and append [file](./file.html) to another file called index.md. I tried the following command:

ls | awk "/\.html$/" | xargs -0 -I @@ -L 1 sh -c 'echo "[${@@%.*}](./@@)" >> index.md'

But it doesn't substitute @@ inside the command. What am I doing wrong?

Note: file names can contain any valid characters, including spaces.


Clarification:

index.md should contain one line of the form [file](./file.html) for each file, where file is the actual file name in the folder.
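For example, for a folder containing foo.html and bar baz.html (hypothetical names), index.md should end up as:

[foo](./foo.html)
[bar baz](./bar baz.html)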

Best Answer

Just do:

for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done > index.md

Use set -o nullglob (zsh, yash) or shopt -s nullglob (bash) for *.html to expand to nothing instead of *.html (or report an error in zsh) when there's no html file. With zsh, you can also use *.html(N) or in ksh93 ~(N)*.html.
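A minimal sketch with bash, for instance:

shopt -s nullglob
for f in *.html; do printf '%s\n' "[${f%.*}](./$f)"; done > index.md
# if there is no .html file, the loop body never runs and index.md stays empty;
# without nullglob, the loop would run once with f set to the literal string *.html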

Or with one printf call with zsh:

files=(*.html)
rootnames=(${files:r})
printf '[%s](./%s)\n' ${rootnames:^files} > index.md
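Here the :r modifier strips the extension from each array element, and ${a:^b} zips two arrays element by element, so printf consumes the pairs two at a time. A small illustration (the file names are hypothetical):

files=(foo.html 'bar baz.html')
rootnames=(${files:r})              # -> foo  'bar baz'
print -rl -- ${rootnames:^files}    # -> foo / foo.html / bar baz / bar baz.html, one per line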

Note that, depending on which markdown syntax you're using, you may have to HTML-encode the title part and URI-encode the URI part if the file names contain some problematic characters. Not doing so could even end up introducing a form of XSS vulnerability depending on context. With ksh93, you can do it with:

for file in *.html; do
  # ${ cmd; } is ksh93's form of command substitution, run without a subshell
  title=${ printf %H "${file%.*}"; }
  title=${title//$'\n'/"<br/>"}
  uri=${ printf '%#H' "$file"; }
  uri=${uri//$'\n'/%0A}
  printf '%s\n' "[$title]($uri)"
done > index.md

Where %H¹ does the HTML encoding and %#H the URI encoding, but we still need to address newline characters separately.

Or with perl:

perl -MURI::Encode=uri_encode -MHTML::Entities -CLSA -le '
  for (<*.html>) {
     $uri = uri_encode("./$_");
     s/\.html\z//;
     $_ = encode_entities $_;
     s:\n:<br/>:g;
     print "[$_]($uri)"
  }'

This uses <br/> for newline characters. You may want to use ␤ instead, or more generally decide on some alternative representation for non-printable characters.

There are a few things wrong in your code:

  • parsing the output of ls
  • using a $ that is meant to be literal inside double quotes
  • using awk for something grep can do (not wrong per se, just overkill)
  • using xargs -0 when the input is not NUL-delimited
  • -I conflicts with -L 1: -L 1 runs one command per line of input with each word of the line passed as a separate argument, while -I @@ runs one command per line of input with the whole line (minus trailing blanks, and with quotes still processed) substituted for @@
  • embedding the @@ replacement string inside the code argument of sh, which is a command injection vulnerability (see the sketch after this list)
  • in sh, the var in ${var%.*} must be a variable name; it doesn't work on arbitrary text such as @@
  • using echo for arbitrary data.
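To see why embedding the replacement string in the sh code is dangerous, consider a file whose name contains shell syntax. A minimal sketch of the injection (the file name here is hypothetical):

printf '%s\n' '$(date).html' |
  xargs -I @@ sh -c 'echo "[@@](./@@)"'
# sh is handed the code:  echo "[$(date).html](./$(date).html)"
# so the $(date) command substitution is executed by sh; with a malicious
# file name, arbitrary commands would run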

If you wanted to use xargs -0, you'd need something like:

printf '%s\0' * | grep -z '\.html$' | xargs -r0 sh -c '
  for file do
    printf "%s\n" "[${file%.*}](./$file)"
  done' sh > index.md

  • replacing ls with printf '%s\0' * to get NUL-delimited output
  • replacing awk with grep -z (a GNU extension) to process that NUL-delimited output
  • using xargs -r0 (-r and -0 are GNU extensions) without any -n/-L/-I, because since we're spawning a sh anyway, we might as well have it process as many files as possible
  • having xargs pass the words as extra arguments to sh (where they become the positional parameters inside the inline code), rather than embedding them inside the code argument
  • which means we can easily store them in variables (here with for file do, which loops over the positional parameters by default) and use the ${param%pattern} parameter expansion operator (see the sketch after this list)
  • using printf instead of echo.
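A minimal illustration of how the arguments after the inline code become positional parameters (the file names here are hypothetical):

sh -c 'for file do printf "%s\n" "[${file%.*}](./$file)"; done' sh 'a b.html' 'c.html'
# [a b](./a b.html)
# [c](./c.html)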

It goes without saying that it makes little sense to do this rather than running the for loop directly over the *.html files, as in the first example.


¹ It doesn't seem to work properly for multibyte characters in my version of ksh93 though (ksh93u+ on a GNU system)
