Bash – Does Bash or AWK have an IN operator like the R programming language?

awk bash r shell

In R, we have the %in% operator to check whether an element is present in a specific column.

For example, suppose we have fruits and market data frames with fruit_name and products as their respective column names, and we want to check which fruits are present in the market.

In R,

available_fruit <- fruits$fruit_name %in% market$products

Is there any operator in bash or AWK that does something similar to %in% in R?

Best Answer

awk has an in operator. It tests whether an index (key) exists in an array (all arrays in awk are associative arrays, i.e. hashes).

If the names of the fruits are keys in the array market then you may use

if (fruit_name in market) { ... }

to check whether the string in fruit_name is a key in market.

For example

BEGIN { FS = "\t" }

NR == FNR { market[$1] = $2; next }

!($1 in market) { printf("No %s in the market\n", $1 ); next }

{ sum += market[$1] }

END { printf("Total sum is %.2f\n", sum ) }

Running this on two files:

$ awk -f script.awk market_prices mylist

where market_prices is a two-column tab-delimited file with items and prices, and mylist is a list of items. The script reads the items and their prices from the first file and populates market with them. It then calculates the total cost of the items in the second file that exist in the market, and reports the items that can't be found.
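As a rough illustration of such a run, the sketch below builds two small sample files in a scratch directory and runs the script on them. The items and prices (apple, banana, cherry) are made up for the example, and the script is written out to a temporary file rather than assumed to exist.

```shell
# Scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)

# Hypothetical sample data: item<TAB>price, and a list of wanted items.
printf 'apple\t1.50\nbanana\t0.25\n' > "$tmpdir/market_prices"
printf 'apple\ncherry\nbanana\n' > "$tmpdir/mylist"

# The script from the answer, saved as script.awk.
cat > "$tmpdir/script.awk" <<'EOF'
BEGIN { FS = "\t" }
NR == FNR { market[$1] = $2; next }
!($1 in market) { printf("No %s in the market\n", $1); next }
{ sum += market[$1] }
END { printf("Total sum is %.2f\n", sum) }
EOF

result=$(awk -f "$tmpdir/script.awk" "$tmpdir/market_prices" "$tmpdir/mylist")
printf '%s\n' "$result"
```

With this sample data the run should report that cherry is missing and print a total of 1.75 for apple and banana.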

The in operator may also be used to loop over the indexes of an array:

for (i in array) {
    print i, array[i]
}

The order in which the indexes are visited is unspecified, so do not rely on them being sorted.
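If a predictable order is needed, one portable option is to pipe the loop's output through sort. A minimal sketch, with array contents invented for the example:

```shell
# Print the key/value pairs of a small array, then sort the lines by key.
# The keys and values here are illustrative only.
sorted=$(awk 'BEGIN {
    price["pear"] = 3; price["apple"] = 1; price["fig"] = 2
    for (k in price) print k, price[k]
}' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

GNU awk also offers PROCINFO["sorted_in"] to control traversal order inside the loop, but that is a gawk extension and not available in other awk implementations.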

Related Solutions

Shell – How to add column in the beginning of file using perl

Here's your perl one-liner: it works with multiple file arguments

perl -i -pe '/^$ARGV,/ or print "$ARGV,"' file1 file2 ...

$ARGV is the magic variable that holds the filename of the current file.
See http://perldoc.perl.org/perlvar.html#Variables-related-to-filehandles

The field separator (comma) is hardcoded. You can decide if that's a problem.

Small performance improvement:

perl -i -pe 'index($_, "$ARGV,") == 0 or print "$ARGV,"' file1 file2 ...

Escape sequences needed when using tilde ~ operator in awk

The ~ operator does pattern matching, treating the right hand operand as an (extended) regular expression, and the left hand one as a string. POSIX says:

A regular expression can be matched against a specific field or string by using one of the two regular expression matching operators, '~' and "!~". These operators shall interpret their right-hand operand as a regular expression and their left-hand operand as a string.

So ENVIRON["patt"] is treated as a regular expression, and all characters that are special in EREs need to be escaped if you don't want them to have their regular ERE meanings.

Note that it's not about using $0 or ENVIRON["name"], but the left and right sides of the tilde. This would take the input lines (in $0) as the regular expression to match against:

str=foobar awk 'ENVIRON["str"] ~ $0 { 
     printf "pattern /%s/ matches string \"%s\"\n", $0, ENVIRON["str"] }'
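To make the escaping point concrete, the following sketch compares a regex match with a literal substring test. The variable name patt follows the text above; the value a.c and the two input lines are made up. index() is a standard awk function that returns the position of a substring (0 if absent), so it is one way to avoid escaping when a literal match is what you want.

```shell
# "." is a wildcard when the pattern is used with ~,
# but an ordinary character when the same string is passed to index().
regex_hits=$(printf 'abc\na.c\n' | patt='a.c' awk '$0 ~ ENVIRON["patt"]')
literal_hits=$(printf 'abc\na.c\n' | patt='a.c' awk 'index($0, ENVIRON["patt"])')
printf 'regex:\n%s\nliteral:\n%s\n' "$regex_hits" "$literal_hits"
```

The regex form matches both input lines, while the index() form matches only the line that contains a.c literally.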
