Delete the first n bytes of files

awkfilessedtext processing

I've got an extreme problem, and all of the solutions I can imagine are complicated. According to my UNIX/Linux experience there must be an easy way.

I want to delete the first 31 bytes of each file in /foo/. Each file is long enough. Well, I'm sure somebody will deliver me a suprisingly easy solution I just can't imagine. Maybe awk?

Best Answer

for file in /foo/*
do
  if [ -f "$file" ]
  then
    dd if="$file" of="$file.truncated" bs=31 skip=1 && mv "$file.truncated" "$file"
  fi
done

or the faster, thanks to Gilles' suggestion:

for file in /foo/*
    do
      if [ -f $file ]
      then
        tail +32c $file > $file.truncated && mv $file.truncated $file
      fi
    done

Note: Posix tail specify "-c +32" instead of "+32c" but Solaris default tail doesn't like it:

   $ /usr/bin/tail -c +32 /tmp/foo > /tmp/foo1
    tail: cannot open input

/usr/xpg4/bin/tail is fine with both syntaxes.

If you want to keep the original file permissions, replace

... && mv "$file.truncated" "$file"

... && cat "$file.truncated" "$file" && rm "$file.truncated"

Related Solutions

Bash – Replace multiple strings in a single pass

OK, a general solution. The following bash function requires 2k arguments; each pair consists of a placeholder and a replacement. It's up to you to quote the strings appropriately to pass them into the function. If the number of arguments is odd, an implicit empty argument will be added, which will effectively delete occurrences of the last placeholder.

Neither placeholders nor replacements may contain NUL characters, but you may use standard C \-escapes such as \0 if you need NULs (and consequently you are required to write \\ if you want a \).

It requires the standard build tools which should be present on a posix-like system (lex and cc).

replaceholder() {
  local dir=$(mktemp -d)
  ( cd "$dir"
    { printf %s\\n "%option 8bit noyywrap nounput" "%%"
      printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
      printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
    } | lex && cc lex.yy.c
  ) && "$dir"/a.out
  rm -fR "$dir"
}

We assume that \ is already escaped if necessary in the arguments but we need to escape double quotes, if present. That's what the second argument to the second printf does. Since the lex default action is ECHO, we don't need to worry about it.

Example run (with timings for the skeptical; it's just a cheap-o commodity laptop):

$ time echo AB | replaceholder A B B A
BA

real    0m0.128s
user    0m0.106s
sys     0m0.042s
$ time printf %s\\n AB{0000..9999} | replaceholder A B B A > /dev/null

real    0m0.118s
user    0m0.117s
sys     0m0.043s

For larger inputs it might be useful to provide an optimization flag to cc, and for current Posix compatibility, it would be better to use c99. An even more ambitious implementation might try to cache the generated executables instead of generating them each time, but they're not exactly expensive to generate.

Edit

If you have tcc, you can avoid the hassle of creating a temporary directory, and enjoy the faster compile time which will help on normal sized inputs:

treplaceholder () { 
  tcc -run <(
  {
    printf %s\\n "%option 8bit noyywrap nounput" "%%"
    printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
    printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
  } | lex -t)
}

$ time printf %s\\n AB{0000..9999} | treplaceholder A B B A > /dev/null

real    0m0.039s
user    0m0.041s
sys     0m0.031s

Awk or sed command to match regex at specific line, exit true if success, false otherwise

With GNU sed:

sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q}; $q1'

How it works

132 {/^#termcapinfo[[:space:]]*xterm Z0=/q}

On line 132, check for the regex ^#termcapinfo[[:space:]]*xterm Z0=. If found quit, q, with the default exit code of 0. The rest of the file is skipped.
$q1

If we reach the last line, $, then quit with exit code 1: q1.

Efficiency

Since it is not necessary to read past the 132nd line of the file, this version quits as soon as we reach the 132nd line or the end of the file, whichever occurs first:

sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q; q1}; $q1'

Handling empty files

The version above will return true for empty files. This is because, if the file empty, no commands are executed and the sed exits with the default exit code of 0. To avoid this:

! sed -n '132 {/^#termcapinfo[[:space:]]*xterm Z0=/q1; q}'

Here, the sed command exits with code 0 unless the the desired string is found in which case it exits with code 1 The preceding ! tells the shell to invert this code to get back to the code we want. The ! modifier is supported by all POSIX shells. This version will work even for empty files. (Hat tip: G-Man)