Text Processing with AWK – Filter Lines Based on Specific Column Words


Given input.csv as follows:

field_name,field_friendly_name
LastNm,Last_Name
cntn_last_mod_wrkr_full_nm,Last_Name
contact_last_nm,Last_Name
contact_first_last_nm,Last_Name
last_english_nm,Last_Name
last_pronunciation_nm,Last_Name
last_nm,Last_Name
lead_space_last_nm,Last_Name
last_mod_usr_nm,Last_Name
lcl_last_nm,Last_Name
adobe_last_topic_nm,Last_Name
last_changed_user_nm,Last_Name
last_purchased_product_service_nm,Last_Name
last_imported_source_nm,Last_Name
submt_last_nm,Last_Name
cntct_last_nm,Last_Name
cust_submt_last_nm,Last_Name
cust_cntct_last_nm,Last_Name
last_mod_by_nm,Last_Name
last_mod_als_nm,Last_Name
last_mod_nm,Last_Name
ship_last_nm,Last_Name
billing_last_nm,Last_Name
last_upd_by_nm,Last_Name
wrkr_last_nm,Last_Name
trns_line_itm_last_chg_psn_nm,Last_Name
trns_line_itm_last_cre_psn_nm,Last_Name
trns_hdr_last_chg_psn_nm,Last_Name
altr_last_nm,Last_Name
trns_last_chg_nm,Last_Name
lastrepaction_nm,Last_Name
last_build_nm,Last_Name
LegalLastNm,Last_Name
ManagerLastNm,Last_Name
4-LastNm,Last_Name
NextLevelManagerLastNm,Last_Name
ManagerLegalLastNm,Last_Name

From this file I would like to filter on column 1. A row should be kept only if its column 1 value is made up entirely of a given set of words, in this case (last, name, nm, lst, -, _, [0-9]), and excluded if it contains any other word.
Column 2 of the matching rows should also be set to "Found".
The search should be case-insensitive. The expected output is:

LastNm,Found
last_nm,Found
4-LastNm,Found

I'm using this command, which doesn't work:

awk -v q="'" --field-separator ',' '((tolower($1) ~ /last/) && (tolower($1) ~ /name/)) || ((tolower($1) ~ /last/) && (tolower($1) ~ /nm/)) && ($2="found") {print $1 "," $2  }' raw.csv

Best Answer

With GNU awk, gensub can be used to remove all of those words from a lowercased copy of the first field; print the row if nothing is left:

awk -F , -v OFS=, 'gensub(/last|lst|name|nm|[0-9_-]+/,"","g",tolower($1))=="" {
    $2="Found";
    print $1, $2
}' file

Unlike sub/gsub, gensub leaves the original record intact and returns the resulting string instead. The same approach works with standard (POSIX) awk by copying the field into a variable and running gsub on that copy.
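A minimal sketch of that portable variant, assuming a POSIX awk without gensub: the lowercased first field is copied into a variable `s`, gsub deletes the allowed words from the copy, and the row is kept only if nothing is left. The function name `filter_names` and the sample rows are hypothetical, chosen for illustration.

```shell
# filter_names (hypothetical helper): read CSV on stdin and print rows whose
# first field consists only of last, lst, name, nm, digits, underscores and hyphens.
# Uses only POSIX awk features (tolower, gsub), so no gensub is needed.
filter_names() {
    awk -F, -v OFS=, '{
        s = tolower($1)                             # lowercase copy; $1 stays intact
        gsub(/last|lst|name|nm|[0-9_-]+/, "", s)    # delete every allowed token
        if (s == "") {                              # nothing else left: keep the row
            $2 = "Found"
            print $1, $2
        }
    }'
}

printf '%s\n' \
    'field_name,field_friendly_name' \
    'LastNm,Last_Name' \
    'lcl_last_nm,Last_Name' \
    'last_nm,Last_Name' \
    'LegalLastNm,Last_Name' \
    '4-LastNm,Last_Name' | filter_names
# prints:
# LastNm,Found
# last_nm,Found
# 4-LastNm,Found
```

The header row drops out on its own, because "field_name" reduces to "field" rather than to an empty string.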

To include more characters than [0-9_-], you could use [^[:alpha:]] (i.e. anything that isn't a letter):

last|lst|name|nm|[^[:alpha:]]
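A sketch of the effect of that wider class, again using gsub on a copy so it runs in a POSIX awk that supports bracket character classes. The helper name `filter_names_wide` and the sample rows (a dot and a space as separators) are hypothetical.

```shell
# filter_names_wide (hypothetical helper): same filter, but any non-letter
# (digit, underscore, hyphen, dot, space, ...) counts as an allowed separator.
filter_names_wide() {
    awk -F, -v OFS=, '{
        s = tolower($1)
        gsub(/last|lst|name|nm|[^[:alpha:]]/, "", s)   # drop allowed words and every non-letter
        if (s == "") {
            $2 = "Found"
            print $1, $2
        }
    }'
}

printf '%s\n' \
    'last.nm,Last_Name' \
    'Last Name,Last_Name' \
    'lcl_last_nm,Last_Name' | filter_names_wide
# prints:
# last.nm,Found
# Last Name,Found
```

Rows with extra letters, such as "lcl_last_nm", are still rejected; only the set of accepted separators grows.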

Related Solutions

Patterns and file processing

Explanation

-v n=2 defines the field number to copy when the pattern is found.
/^name/ {a=$(n); print; next} if the line starts with the given pattern, store field n in a, print the line, and skip to the next input line.
{print a, $0} otherwise, print the current line with the stored value first.
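The command this explanation describes is not shown on this page; assembling the three pieces above gives something along these lines. This is an assumed reconstruction, and both the helper name `tag_with_header` and the sample input are hypothetical, since the data from that question is not reproduced here.

```shell
# tag_with_header (hypothetical helper): lines starting with "name" act as
# headers; field n of the latest header is prefixed to every following line.
tag_with_header() {
    awk -v n=2 '/^name/ {a=$(n); print; next} {print a, $0}'
}

printf '%s\n' \
    'name alpha' \
    'x 1' \
    'y 2' \
    'name beta' \
    'z 3' | tag_with_header
# prints:
# name alpha
# alpha x 1
# alpha y 2
# name beta
# beta z 3
```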

You can generalize the pattern part into something like:

awk -v n=2 -v pat="name" '$1==pat {a=$(n); print; next} {print a, $0}' file
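Note that $1==pat is an exact comparison on the first field, whereas /^name/ matches any line that merely starts with "name". A sketch contrasting the two, assuming the "starts with" behaviour is wanted: the second function builds the regex from the variable, so pat is then interpreted as a regular expression. Function names and sample input are hypothetical.

```shell
# exact_header: a header is a line whose first field equals pat exactly.
exact_header() {
    awk -v n=2 -v pat="name" '$1==pat {a=$(n); print; next} {print a, $0}'
}

# prefix_header: keeps the /^name/ behaviour using a dynamic regex built from pat.
prefix_header() {
    awk -v n=2 -v pat="name" '$0 ~ ("^" pat) {a=$(n); print; next} {print a, $0}'
}

input='name alpha
names beta
x 1'

printf '%s\n' "$input" | exact_header
# prints:
# name alpha
# alpha names beta
# alpha x 1

printf '%s\n' "$input" | prefix_header
# prints:
# name alpha
# names beta
# beta x 1
```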

awk Regular Expression – How to Perform Case-Insensitive Search in awk

Replace your bare pattern (e.g. /&&&key word&&&/) with an expression that explicitly matches against $0, the current line:

tolower($0) ~ /&&&key word&&&/

or

toupper($0) ~ /&&&KEY WORD&&&/

so you have:

awk 'tolower($0) ~ /&&&key word&&&/ { p = ! p ; next }; p' text.txt

You need single quotes because of the $0 (inside double quotes the shell would expand it). The BEGIN block can be removed because variables are initialised to "" or 0 on first use, and {print} is the default action.
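A short demonstration of the toggle, with hypothetical sample input: each marker line flips p and is skipped by next, so only the lines between the markers are printed, whatever the case of the markers.

```shell
# print_between_markers (hypothetical helper): print the lines between two
# marker lines, matching the marker case-insensitively by lowercasing $0.
print_between_markers() {
    awk 'tolower($0) ~ /&&&key word&&&/ { p = ! p ; next }; p'
}

printf '%s\n' \
    'before' \
    '&&&KEY WORD&&&' \
    'inside 1' \
    'inside 2' \
    '&&&Key Word&&&' \
    'after' | print_between_markers
# prints:
# inside 1
# inside 2
```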
