With the following command we can print the duplicate lines from a file:
sort file.txt | uniq -d
But how can we do this on a CSV file?
We need to print only the lines that are duplicated on fields 1 and 2 of the CSV file, not including field 3.
The field separator is ",".
For example:
spark2-thrift-sparkconf,spark.history.fs.logDirectory,{{spark_history_dir}}
spark2-thrift-sparkconf,spark.history.fs.logDirectory,true
spark2-thrift-sparkconf,spark.history.Log.logDirectory,true
spark2-thrift-sparkconf,spark.history.DF.logDirectory,true
Expected results:
spark2-thrift-sparkconf,spark.history.fs.logDirectory,{{spark_history_dir}}
spark2-thrift-sparkconf,spark.history.fs.logDirectory,true
Second: how do we exclude the duplicate lines from the CSV file? (I mean, delete only the lines that are duplicated on fields 1 and 2.)
Expected output:
spark2-thrift-sparkconf,spark.history.Log.logDirectory,true
spark2-thrift-sparkconf,spark.history.DF.logDirectory,true
Best Answer
Two-file processing, using the same input file twice:

awk -F, 'NR==FNR{a[$1,$2]++; next} a[$1,$2]>1' file.txt file.txt

NR==FNR{a[$1,$2]++; next} runs during the first pass: it uses the first two fields as the key and saves the number of occurrences.
a[$1,$2]>1 runs during the second pass: it prints a line only if its key's count is greater than 1.

For the opposite case (keep only the lines that are not duplicated on fields 1 and 2), it is a simple matter of changing the condition check:

awk -F, 'NR==FNR{a[$1,$2]++; next} a[$1,$2]==1' file.txt file.txt
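If the input cannot be read twice (for example, when it arrives on a pipe), a single-pass variant can buffer the records in memory instead. This is a minimal sketch under the assumption that the whole file fits in memory; the cnt, keys, and lines array names are illustrative:

awk -F, '
    # single pass: count each key and remember every record in input order
    {cnt[$1,$2]++; keys[NR] = $1 SUBSEP $2; lines[NR] = $0}
    # after reading everything, print the records whose key occurred more than once
    END {for (i = 1; i <= NR; i++) if (cnt[keys[i]] > 1) print lines[i]}
' file.txt

As before, changing the final condition to cnt[keys[i]] == 1 keeps only the non-duplicated lines. The two-pass version above is simpler and uses less memory, so prefer it whenever the input is a regular file.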