Bash – Reading Character by Character with Read


I've been trying to use bash to read a file character by character.

After much trial and error, I have discovered that this works:

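exec 4<file.txt                 # open file.txt for reading on fd 4
declare -i n
while read -r ch <&4; do        # read one line at a time from fd 4
     n=0
     # walk through the line one character at a time
     while [ "$n" -ne "${#ch}" ]; do
          echo -n "${ch:$n:1}"
          (( n++ ))
     done
     echo ""                    # restore the newline that read removed
done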

I.e., I can read it line by line and then loop through each line char by char.

Before doing this, I had tried:
exec 4<file.txt && while read -r -n1 ch <&4; do echo -n "$ch"; done
but it skipped all whitespace in the file.

Could you please explain why? Is there a way to make the second strategy (i.e. reading char by char with bash's read) work?

Best Answer

You need to remove the whitespace characters from $IFS so that read stops stripping leading and trailing whitespace (with -n1, a whitespace character is both leading and trailing, so it is stripped):

while IFS= read -rn1 a; do printf %s "$a"; done

But even then bash's read will skip newline characters, which you can work around with:

while IFS= read -rn1 a; do printf %s "${a:-$'\n'}"; done
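To make the difference between these variants visible, here is a small sketch that prints every value returned by read inside brackets. The show_* function names are made up for this illustration, and the outputs in the comments assume plain ASCII input:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print every value returned by read inside brackets.

# Default IFS: a lone space is stripped, so the variable comes back empty.
show_default() {
  local ch
  while read -rn1 ch; do printf '[%s]' "$ch"; done
}

# Empty IFS: spaces survive, but a newline still ends the read with an empty value.
show_ifs_empty() {
  local ch
  while IFS= read -rn1 ch; do printf '[%s]' "$ch"; done
}

# Same loop, substituting a newline whenever the value is empty.
show_newline_fix() {
  local ch
  while IFS= read -rn1 ch; do printf '[%s]' "${ch:-$'\n'}"; done
}

printf 'a b'  | show_default;   echo   # [a][][b]
printf 'a b'  | show_ifs_empty; echo   # [a][ ][b]
printf 'a\nb' | show_ifs_empty; echo   # [a][][b]
```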

Alternatively, you could use IFS= read -d '' -rn1 or, even better, IFS= read -N1 (added in bash 4.1, copied from ksh93, where it was added in version o), which is the command to read exactly one character.
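As a minimal sketch of the read -N1 approach, assuming bash 4.1 or later and input without NUL bytes, the loop below copies its input to its output one character at a time. The function name copy_chars is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout one character at a time.
# Needs bash >= 4.1 because of read -N.
copy_chars() {
  local ch
  # -r keeps backslashes literal; -N1 reads exactly one character and
  # does not treat newline as a delimiter, so newlines arrive in $ch.
  while IFS= read -r -N1 ch; do
    printf '%s' "$ch"
  done
}

# Leading spaces, a tab and an empty line should all come through unchanged.
printf '  a\tb\n\nc \n' | copy_chars
```

Because newlines are returned like any other character, no ${a:-$'\n'} substitution is needed here.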

Note that bash's read can't cope with NUL characters. And ksh93 has the same issues as bash.

With zsh:

while read -ku0 a; do print -rn -- "$a"; done

(zsh can cope with NUL characters).

Note that those read -k/n/N read a number of characters, not bytes. So for multibyte characters, they may have to read multiple bytes until a full character is read. If the input contains invalid characters, you may end up with a variable that contains a sequence of bytes that doesn't form valid characters and which the shell may end up counting as several characters. For instance in a UTF-8 locale:

$ printf '\375\200\200\200\200ABC' | bash -c '
    IFS= read  -rN1 a; echo "${#a}"'
6

That \375 would introduce a 6-byte UTF-8 character. However, the sixth byte above (A) is not valid inside a UTF-8 character. You still end up with \375\200\200\200\200A in $a, which bash counts as 6 characters even though the first 5 are not really characters, just 5 bytes that do not form part of any character.

Related Solutions

Bash – How to Read from Two Input Files Using While Loop

If you can guarantee that some character will never occur in the first file then you can use paste.

For example, if you know for sure that @ will never occur:

paste -d@ file1 file2 | while IFS="@" read -r f1 f2
do
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done

Note that it is enough if the character is guaranteed to not occur in the first file. This is because read will ignore IFS when filling the last variable. So even if @ occurs in the second file it will not be split.
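The behaviour of the last variable can be checked in isolation. In this sketch the function name split_two and the sample input are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line on @ into two variables.
split_two() {
  local f1 f2
  IFS=@ read -r f1 f2
  # f2 is the last variable, so it receives the rest of the line,
  # including any further @ characters.
  printf '%s|%s\n' "$f1" "$f2"
}

printf 'a@b@c\n' | split_two   # prints a|b@c
```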

Example using some bash features for arguably cleaner code and paste using default delimiter tab:

while IFS=$'\t' read -r f1 f2
do
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done < <(paste file1 file2)

Bash features used: ANSI-C quoting ($'\t') and process substitution (<(...)), which avoids the problem of the while loop running in a subshell.
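The subshell problem is easiest to see with a counter. This sketch assumes bash with its default settings (in particular, the lastpipe option is off); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Piping into while: the loop runs in a subshell, so n is lost afterwards.
count_with_pipe() {
  local n=0 line
  printf 'x\ny\n' | while read -r line; do n=$((n + 1)); done
  echo "$n"
}

# Process substitution: the loop runs in the current shell, so n survives.
count_with_procsub() {
  local n=0 line
  while read -r line; do n=$((n + 1)); done < <(printf 'x\ny\n')
  echo "$n"
}

count_with_pipe      # prints 0
count_with_procsub   # prints 2
```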

If you cannot find a character that is guaranteed never to occur in the first file, you can use two file descriptors.

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  printf 'f1: %s\n' "$f1"
  printf 'f2: %s\n' "$f2"
done 3<file1 4<file2

Not tested much. Might break on empty lines.
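One way to check the empty-line concern is to run the same loop against small throwaway files. This is only a sketch: pair_lines is a hypothetical wrapper around the loop above, and the file contents are invented for the test:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-descriptor loop.
pair_lines() {
  local f1 f2
  while true; do
    read -r f1 <&3 || break
    read -r f2 <&4 || break
    printf '%s|%s\n' "$f1" "$f2"
  done 3<"$1" 4<"$2"
}

tmp=$(mktemp -d)
printf 'a\n\nc\n'  > "$tmp/file1"   # the second line is empty
printf '1\n2\n3\n' > "$tmp/file2"
pair_lines "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

With this input the empty line is paired like any other (the output is a|1, |2, c|3). Two limits remain: the loop stops as soon as either file runs out, and a final line without a trailing newline is dropped because read returns a non-zero status for it.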

File descriptors 0, 1, and 2 are already used for stdin, stdout, and stderr, respectively. File descriptors from 3 upwards are (usually) free. The bash manual warns against using file descriptors greater than 9, because they are "used internally".
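If choosing descriptor numbers by hand feels fragile, bash 4.1 and later can pick a free descriptor itself with the {varname} redirection form. This is an alternative to the fixed numbers used in this answer, not something the answer relies on; the function name first_line is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of a file using a
# descriptor chosen by bash (needs bash >= 4.1).
first_line() {
  local fd line
  exec {fd}<"$1"               # bash allocates a free descriptor and stores its number in fd
  IFS= read -r line <&"$fd"
  exec {fd}<&-                 # close the descriptor again
  printf '%s\n' "$line"
}

tmp=$(mktemp)
printf 'hello\nworld\n' > "$tmp"
first_line "$tmp"              # prints hello
rm "$tmp"
```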

Note that open file descriptors are inherited by shell functions and external programs. Functions and programs that inherit an open file descriptor can read from (and write to) it. You should take care to close all file descriptors that are not required before calling a function or external program.

Here is the same program as above with the actual work (the printing) separated from the meta-work (reading line by line from two files in parallel).

work() {
  printf 'f1: %s\n' "$1"
  printf 'f2: %s\n' "$2"
}

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  work "$f1" "$f2"
done 3<file1 4<file2

Now we pretend that we have no control over the work code and that code, for whatever reason, tries to read from file descriptor 3.

unknowncode() {
  printf 'f1: %s\n' "$1"
  printf 'f2: %s\n' "$2"
  read -r yoink <&3 && printf 'yoink: %s\n' "$yoink"
}

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  unknowncode "$f1" "$f2"
done 3<file1 4<file2

Here is an example output. Note that the second line from the first file is "stolen" from the loop.

f1: file1 line1
f2: file2 line1
yoink: file1 line2
f1: file1 line3
f2: file2 line2

Here is how you should close the file descriptors before calling external code (or any code for that matter).

while true
do
  read -r f1 <&3 || break
  read -r f2 <&4 || break
  # this will close fd3 and fd4 before executing anycode
  anycode "$f1" "$f2" 3<&- 4<&-
  # note that fd3 and fd4 are still open in the loop
done 3<file1 4<file2
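The effect of closing the descriptor can be reproduced with a self-contained sketch. The names greedy, unguarded and guarded are made up here, and greedy stands in for code you do not control:

```shell
#!/usr/bin/env bash
# Stand-in for foreign code that reads from fd 3 if it can.
greedy() {
  local stolen
  read -r stolen <&3 && printf 'stolen: %s\n' "$stolen"
}

# fd 3 stays open for greedy, so it consumes lines meant for the loop.
unguarded() {
  local f1
  while read -r f1 <&3; do
    printf 'f1: %s\n' "$f1"
    greedy || true
  done 3<"$1"
}

# fd 3 is closed for the duration of the call; the failed read inside
# greedy reports a bad descriptor on stderr, which is discarded here.
guarded() {
  local f1
  while read -r f1 <&3; do
    printf 'f1: %s\n' "$f1"
    greedy 3<&- 2>/dev/null || true
  done 3<"$1"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"
unguarded "$tmp"   # f1: a / stolen: b / f1: c
guarded "$tmp"     # f1: a / f1: b / f1: c
rm "$tmp"
```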

Bash – How to Name a File in the Deepest Level of a Directory Tree

That's an odd request!

I'd use find + awk to grab a file in the deepest directory:

bash-3.2$ deepest=$(find / -type f | awk -F'/' 'NF > depth {
>     depth = NF;
>     deepest = $0;
> }
>
> END {
>     print deepest;
> }')
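The awk part can be tried on a handful of sample paths without scanning the whole filesystem. In this sketch the function name deepest_path and the paths are invented for illustration, and it assumes that no path contains a newline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the input path with the most /-separated fields.
deepest_path() {
  awk -F'/' 'NF > depth { depth = NF; deepest = $0 } END { print deepest }'
}

# /a/b/c/d and /x/y/z/w are equally deep; the first one seen wins
# because the comparison is strictly greater-than.
printf '%s\n' /a/b /a/b/c/d /a /x/y/z/w | deepest_path   # prints /a/b/c/d
```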

Using ${deepest} in your mv command is left as an exercise but the following five lines may help you further:

bash-3.2$ echo "${deepest}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina/Buenos_Aires.rb

bash-3.2$ echo "${deepest%.*}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina/Buenos_Aires

bash-3.2$ echo "${deepest%/*}"
/Developer/SDKs/MacOSX10.6.sdk/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8/gems/activesupport-2.3.5/lib/active_support/vendor/tzinfo-0.3.12/tzinfo/definitions/America/Argentina

bash-3.2$ echo "${deepest##*/}"
Buenos_Aires.rb

bash-3.2$ echo "${deepest##*.}"
rb

Following update to question:

find -type d [...] "This would only find the directory. [...] How could this be solved in the most simple way?".

By supplying -type f to find to find all files (f), not all directories (d).
