CSV Duplicate Lines – Print Duplicate Lines Based on Fields 1 and 2

bash, csv, linux, shell-script, uniq

With the following command we can print the duplicate lines from a file (uniq only detects adjacent duplicates, so the input is sorted first):

sort file.txt | uniq -d

But how can we do this on a CSV file?

We need to print only the lines that are duplicates on fields 1 and 2 of the CSV file, ignoring field 3.

The field separator (FS) is ","

For example:

 spark2-thrift-sparkconf,spark.history.fs.logDirectory,{{spark_history_dir}}
 spark2-thrift-sparkconf,spark.history.fs.logDirectory,true
 spark2-thrift-sparkconf,spark.history.Log.logDirectory,true
 spark2-thrift-sparkconf,spark.history.DF.logDirectory,true

Expected results:

 spark2-thrift-sparkconf,spark.history.fs.logDirectory,{{spark_history_dir}}
 spark2-thrift-sparkconf,spark.history.fs.logDirectory,true

Second:

How can we exclude the duplicate lines from the CSV file? (I.e., delete only the lines that are duplicates on fields 1 and 2.)

Expected output:

 spark2-thrift-sparkconf,spark.history.Log.logDirectory,true
 spark2-thrift-sparkconf,spark.history.DF.logDirectory,true

Best Answer

$ awk -F, 'NR==FNR{a[$1,$2]++; next} a[$1,$2]>1' file.txt file.txt 
spark2-thrift-sparkconf,spark.history.fs.logDirectory,{{spark_history_dir}}
spark2-thrift-sparkconf,spark.history.fs.logDirectory,true

Two-file processing, using the same input file twice:

  • NR==FNR{a[$1,$2]++; next} on the first pass (NR==FNR holds only while the first file is being read), use the first two fields as the key and count each key's occurrences
  • a[$1,$2]>1 on the second pass, print a line only if its key's count is greater than 1
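
As an aside (a sketch, not part of the original answer), the same duplicates can be printed in a single pass by buffering the first line seen for each key and flushing it when the key recurs:

$ awk -F, '{
      k = $1 SUBSEP $2                          # same composite key as a[$1,$2]
      if (++cnt[k] == 1) { buf[k] = $0; next }  # first sighting: buffer the line
      if (cnt[k] == 2) print buf[k]             # second sighting: flush the buffered first line
      print                                     # print the current duplicate
  }' file.txt

This reads the file once, at the cost of holding one buffered line per distinct key in memory.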


For the opposite case (keeping only the lines that are unique on fields 1 and 2), it is a simple matter of changing the condition check:

$ awk -F, 'NR==FNR{a[$1,$2]++; next} a[$1,$2]==1' file.txt file.txt 
spark2-thrift-sparkconf,spark.history.Log.logDirectory,true
spark2-thrift-sparkconf,spark.history.DF.logDirectory,true
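
A single-pass variant exists here too (again a sketch, not from the original answer), but every line must be held until the end of input, since a line can only be declared unique once the whole file has been read:

$ awk -F, '{ cnt[$1,$2]++; key[NR] = $1 SUBSEP $2; line[NR] = $0 }
      END { for (i = 1; i <= NR; i++) if (cnt[key[i]] == 1) print line[i] }' file.txt

For large files the two-pass version is usually preferable, since it keeps only the per-key counts in memory rather than the entire file.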