You may be able to do what you want by piping awk's output into a while read loop. For example:
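#!/bin/bash
# awk reads the script's stdin: it skips comments and blank lines,
# prints valid 4-field lines, and exits with status 1 on any other line
awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1} ; {print}' |
while read -r NAME METHOD URL TAG ; do
: # do stuff with $NAME, $METHOD, $URL, $TAG
echo "$NAME:$METHOD:$URL:$TAG"
done
# ${PIPESTATUS[0]} is awk's exit status (bash only; check it right after the pipeline)
if [ "${PIPESTATUS[0]}" -eq 1 ] ; then
: # do something to handle awk's exit code
fi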
Tested with:
$ cat input.txt
# comment
NAME METHOD URL TAG
a b c d
1 2 3 4
x y z
a b c d
$ ./testawk.sh <input.txt
NAME:METHOD:URL:TAG
a:b:c:d
1:2:3:4
Note that it correctly exits on the fifth input line, x y z, which has only three fields, so the final a b c d line is never processed.
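If you want to try the pipeline without creating input.txt by hand, here is a self-contained sketch that writes the same sample input to a temporary file and then checks awk's exit status. It assumes bash (for PIPESTATUS) and that mktemp is available, which is true on typical Linux, BSD and macOS systems; the variable names input and awk_status are just for illustration.

```shell
#!/bin/bash
# Sketch: reproduce the test above without a separate input.txt.
# Assumption: mktemp is available.
input=$(mktemp) || exit 1
trap 'rm -f "$input"' EXIT
printf '%s\n' '# comment' 'NAME METHOD URL TAG' 'a b c d' '1 2 3 4' 'x y z' 'a b c d' > "$input"

# awk skips comments and blank lines, and exits with status 1 on the
# first line that does not have exactly four fields.
awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1}; {print}' < "$input" |
while read -r NAME METHOD URL TAG; do
    echo "$NAME:$METHOD:$URL:$TAG"
done
# Copy PIPESTATUS right away: the next command overwrites it.
# Element 0 is awk's status, element 1 is the while loop's.
awk_status=${PIPESTATUS[0]}
if [ "$awk_status" -eq 1 ]; then
    echo "awk stopped on a malformed line" >&2
fi
```

This should print the same three lines as the test above, followed by the message on stderr.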
$ cat tst.awk
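{ cnt[$0]++ }
END {
    n = sort(cnt,idxs)
    for (i=1; i<=n; i++) {
        idx = idxs[i]
        printf "%s:%d%s", idx, cnt[idx], (i<n ? OFS : ORS)
    }
}
function sort(arr, idxs, args,    i, str, cmd, idx) {
    # build a newline-separated list of the indices, escaping single quotes for sh
    for (i in arr) {
        gsub(/\047/, "\047\\\047\047", i)
        str = str i ORS
    }
    # pipe the list to Unix sort, passing any sort options in args
    cmd = "printf \047%s\047 \047" str "\047 |sort " args
    # read the sorted indices back into idxs[1..i]
    i = 0
    while ( (cmd | getline idx) > 0 ) {
        idxs[++i] = idx
    }
    close(cmd)
    return i
}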
# create the 2 sample input files for the awk script to parse:
printf 'a b a a a c c d e s s s s e f s a e r r f\ng f r e d e z z c s d r\n' >fileA
printf 's f g r e d f g e z s d v f e z a d d g r f e a\ns d f e r\n'>fileB
for f in fileA fileB ; do
printf 'for file: %s: ' "$f"
tr ' ' '\n' < "$f" |
awk -f tst.awk
done
for file: fileA: a:5 b:1 c:3 d:3 e:5 f:3 g:1 r:4 s:6 z:2
for file: fileB: a:2 d:5 e:5 f:5 g:3 r:3 s:3 v:1 z:2
The above just builds a newline-separated string from the array indices (quoting it appropriately for sh), creates a shell command that pipes that string to sort, and then loops on sort's output. If you want to modify sort's behavior, just pass a string of Unix sort arguments as the third argument of the sort() function call, e.g. sort(seen,idxs,"-fu"). It could obviously be modified to print or do whatever else you want inside the sort() function instead of populating an array of indices for you to loop on when it returns, if that's what you prefer, but then the function isn't as cohesive.
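As a rough illustration of passing sort options through the third parameter, args, here is a self-contained sketch that counts some made-up words and asks sort for reverse order with "-r". The sort() function is a lightly tidied copy of the one above (it escapes a copy of each index instead of the loop variable itself); the sample words are invented for the example.

```shell
#!/bin/bash
# Sketch: call sort() with args = "-r" to get the indices in reverse order.
# The input words are hypothetical sample data.
sorted_desc=$(printf '%s\n' pear apple fig apple pear pear |
awk '
{ cnt[$0]++ }
END {
    n = sort(cnt, idxs, "-r")          # args = "-r" -> reverse order
    for (i = 1; i <= n; i++)
        printf "%s:%d%s", idxs[i], cnt[idxs[i]], (i < n ? OFS : ORS)
}
function sort(arr, idxs, args,    k, key, str, cmd, i, idx) {
    for (k in arr) {
        key = k
        gsub(/\047/, "\047\\\047\047", key)   # escape single quotes for sh
        str = str key ORS
    }
    cmd = "printf \047%s\047 \047" str "\047 | sort " args
    i = 0
    while ((cmd | getline idx) > 0)
        idxs[++i] = idx
    close(cmd)
    return i
}')
echo "$sorted_desc"    # pear:3 fig:1 apple:2
```

Any other sort flags (for example "-f" to fold case) would go in the same string.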
Note, however, that since all of the indices become part of a single shell command, their combined size is limited by the maximum command line length on your system.
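If that limit is a concern, one possible workaround (a different technique from the code above, sketched here under the assumption that a temporary file is acceptable) is to write the indices to sort's standard input through an awk output pipe, let sort write to a temp file, and read that file back with getline. Because no index text ever appears in the command string, there is nothing to escape except the file name, which here comes from mktemp. The function name sort_via_file and the sample data are hypothetical. Like the original, it still assumes the indices contain no newlines.

```shell
#!/bin/bash
# Sketch of a variant that avoids the command-line length limit.
# Assumptions: mktemp is available; the temp file name has no single quotes.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

counts=$(printf '%s\n' b a c a b a |
awk -v tmp="$tmp" '
{ cnt[$0]++ }
END {
    n = sort_via_file(cnt, idxs, "")
    for (i = 1; i <= n; i++)
        printf "%s:%d%s", idxs[i], cnt[idxs[i]], (i < n ? OFS : ORS)
}
function sort_via_file(arr, idxs, args,    k, cmd, n, line) {
    # tmp is from mktemp, so it contains no single quotes to escape
    cmd = "sort " args " > \047" tmp "\047"
    for (k in arr)
        print k | cmd          # keys go to stdin, never into cmd itself
    close(cmd)                 # wait for sort to finish writing tmp
    n = 0
    while ((getline line < tmp) > 0)
        idxs[++n] = line
    close(tmp)
    return n
}')
echo "$counts"    # a:3 b:2 c:1
```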
The \047s in the code represent 's. The shell does not allow a ' to be included in a '-delimited string or script, so while we could use ' directly in an awk script that is read from a file, as I'm doing above, if you were to run that script on the command line as awk 'script' file you'd need to use something instead of '. \047 works both when the script is read from a file and when it is given on the command line, so it's the most portable replacement for '.
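A quick way to see that portability is to run the same awk statement both ways. This small sketch (the file is a throwaway temp file from mktemp, assumed available) prints it's from an inline script and from a script file; printf '%s' writes the \047 into the file literally, and awk turns it into a quote in both cases.

```shell
#!/bin/bash
# Sketch: \047 means a single quote to awk whether the program is
# given inline on the command line or read from a file with -f.
inline=$(awk 'BEGIN { print "it\047s" }')

script=$(mktemp) || exit 1
trap 'rm -f "$script"' EXIT
printf '%s\n' 'BEGIN { print "it\047s" }' > "$script"
from_file=$(awk -f "$script")

echo "$inline"      # it's
echo "$from_file"   # it's
```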
The 's (\047s) are present to quote str in a way that ensures the shell doesn't expand variables or command substitutions, trip over mismatched quotes, etc. when the string is being piped to sort, i.e. they do this:
$ echo 'foo'\''bar $(ls) $HOME' | awk '{
str=$0; gsub(/\047/, "\047\\\047\047", str); print "str="str
cmd="printf \047%s\047 \047" str "\047"; print "cmd="cmd
}'
str=foo'\''bar $(ls) $HOME
cmd=printf '%s' 'foo'\''bar $(ls) $HOME'
so that we don't instead get something like this, which is buggy and vulnerable to shell injection:
$ echo 'foo'\''bar $(ls) $HOME' | awk '{
str=$0; print "str="str
cmd="printf \"%s\" \"" str "\""; print "cmd="cmd
}'
str=foo'bar $(ls) $HOME
cmd=printf "%s" "foo'bar $(ls) $HOME"
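To confirm that the escaped version really round-trips, you can execute the command it builds and read the result back with getline. In this sketch the output should be exactly the original text: the embedded quote survives, and sh expands neither $(ls) nor $HOME. The variable name roundtrip is just for illustration.

```shell
#!/bin/bash
# Sketch: build the safely quoted printf command as above, run it via
# getline, and print what the shell hands back.
roundtrip=$(echo 'foo'\''bar $(ls) $HOME' | awk '{
    str = $0
    gsub(/\047/, "\047\\\047\047", str)   # escape single quotes for sh
    cmd = "printf \047%s\047 \047" str "\047"
    cmd | getline out
    close(cmd)
    print out
}')
echo "$roundtrip"    # foo'bar $(ls) $HOME
```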
See "How does awk '!a[$0]++' work?". Basically, use
a=!a
This negates a, turning 0 into 1 and 1 into 0. You can test with