The reason sed 's/[[:space:]]//g' leaves a newline in the output is that the data is presented to sed one line at a time. The substitution therefore cannot replace newlines in the data (they are simply not part of the data that sed sees).
Instead, you may use tr:
tr -d '[:space:]'
which will remove space characters, form feeds, newlines, carriage returns, horizontal tabs, and vertical tabs.
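To see the difference side by side, here is a small sketch. The two-line sample input is made up for illustration, and the variable names are my own.

```shell
# sed works on one line at a time, so the newline between the lines survives.
sed_out=$(printf 'a b\tc\nd e\n' | sed 's/[[:space:]]//g')

# tr works on the whole byte stream, so the newlines are deleted as well.
tr_out=$(printf 'a b\tc\nd e\n' | tr -d '[:space:]')

printf '[%s]\n' "$sed_out"   # still two lines: abc and de
printf '[%s]\n' "$tr_out"    # a single line: abcde
```

Note that the command substitution itself strips the trailing newline, so only the newline between the two lines is visible in the sed result.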
I think you're looking for expand and/or unexpand. It seems you're trying to ensure that a tab counts as 8 characters wide rather than as a single character. fold will do that as well, but it will wrap its input to the next line rather than truncating it. I think you want:
expand < input | cut -c -80
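A scaled-down sketch of that difference, using a width of 10 instead of 80 so it is easy to check by eye. The sample line is hypothetical.

```shell
# Hypothetical sample line: one tab followed by ten letters.
line=$(printf 'a\tbcdefghij')

# cut alone counts the tab as a single character, so it keeps ten bytes.
raw_cut=$(printf '%s\n' "$line" | cut -c -10)

# expand first pads the tab out to column 8, then cut truncates at column 10.
expanded_cut=$(printf '%s\n' "$line" | expand | cut -c -10)

# fold also accounts for tab stops, but wraps the excess instead of dropping it.
folded=$(printf '%s\n' "$line" | fold -w 10)

printf '[%s]\n' "$raw_cut" "$expanded_cut" "$folded"
```

Only the expand | cut pipeline yields a line that is at most 10 columns wide on screen with the rest discarded.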
expand and unexpand are both POSIX specified:
- The expand utility shall write files or the standard input to the standard output with tab characters replaced with one or more space characters needed to pad to the next tab stop. Any backspace characters shall be copied to the output and cause the column position count for tab stop calculations to be decremented; the column position count shall not be decremented below zero.
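As a quick illustration of "pad to the next tab stop", here is a minimal sketch with a made-up input. The default stops are every 8 columns; -t is the POSIX option for choosing a different interval.

```shell
# Default tab stops every 8 columns: "ab" is followed by six spaces.
default_stops=$(printf 'ab\tc\n' | expand)

# -t 4 puts a tab stop every four columns, so only two spaces are needed.
four_stops=$(printf 'ab\tc\n' | expand -t 4)

printf '[%s]\n' "$default_stops" "$four_stops"
```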
Pretty simple. So, here's a look at what this does:
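unset c i; set --;
# Build the argument list 1 1 2 2 ... 10 10 (a width and a value for each field).
until [ "$((i+=1))" -gt 10 ]; do set -- "$@" "$i" "$i"; done
# Run the same pipeline twice: once through tr (tab to one space), once through expand.
for c in 'tr \\t \ ' expand; do eval '
{ printf "%*s\t" "$@"; echo; } |
tee /dev/fd/2 |'"$c"'| {
tee /dev/fd/3 | wc -c >&2; } 3>&1 |
tee /dev/fd/2 | cut -c -80'
done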
The until loop at top builds a set of arguments like...
1 1 2 2 3 3 ...
It printfs these with the %*s format, which takes its field width from an argument. Each pair therefore supplies a width and a value, and printf right-aligns each number in a field as many columns wide as the number itself. To each field it appends a tab character.
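Here is a cut-down sketch of the same trick with just three pairs and brackets in place of the tab, so the padding is visible. It assumes a printf that accepts * as a field width, as the bash builtin and GNU coreutils printf do.

```shell
# Each pass through the format consumes two arguments: a width, then a value.
padded=$(printf '[%*s]' 1 1 2 2 3 3)

printf '%s\n' "$padded"   # [1][ 2][  3]
```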
All of the tees are used to show the effects of each filter as it is applied.
And the effects are these:
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
66
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8 9 10
1 2 3 4 5 6 7 8
105
Those rows are lined up in two sets of four, in this order:
- output of printf ...; echo
- output of tr ... or expand
- output of cut
- output of wc
The top four rows are the results of the tr filter, in which each tab is converted to a single space.
And the bottom four are the results of the expand chain.
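The byte counts are the easiest way to see the two filters diverge. A minimal sketch on a made-up one-tab input:

```shell
# tr swaps the tab for exactly one space: "a b" plus a newline.
tr_bytes=$(printf 'a\tb\n' | tr '\t' ' ' | wc -c)

# expand pads to column 8: "a", seven spaces, "b", plus a newline.
expand_bytes=$(printf 'a\tb\n' | expand | wc -c)

# Arithmetic expansion strips the leading blanks some wc implementations print.
echo "tr: $((tr_bytes)) bytes, expand: $((expand_bytes)) bytes"
```

On this input tr yields 4 bytes and expand yields 10, the same effect as the 66 versus 105 above.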
Under Linux, cat -T shows tabs as ^I. There are other options to make trailing whitespace apparent, to display control characters in a printable form, etc.
If you want to compare the result of your program with the original, you can use diff:
You may also want to compare the input of your program with the standard utility expand.
If you want exactly the transformation of spaces into \s and tabs into \t, you can use sed:
(The first expression doubles backslashes, which makes the transformation unambiguous.)
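One way such a sed command might look is sketched below. This is my own reconstruction under the description above, not necessarily the original command; the sample input and the variable names are made up. The tab is passed in through a shell variable because not every sed understands \t in a pattern.

```shell
# A literal tab character, for seds that do not understand \t in a regex.
tab=$(printf '\t')

# Sample input: a space, a tab, and a literal backslash.
visible=$(printf 'a b\tc\\d\n' |
  sed -e 's/\\/\\\\/g' \
      -e "s/$tab/\\\\t/g" \
      -e 's/ /\\s/g')

printf '%s\n' "$visible"   # a\sb\tc\\d
```

Doubling the backslashes first matters: a backslash already present in the input comes out as \\, so it cannot be mistaken for the start of a generated \s or \t.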