Ubuntu – Deleting duplicate lines in text file…

command-line, text-processing

How can I delete duplicate lines in a text file via command prompt?

For example:
I have a 10 MB text file in which the line "My line" should appear only once, but somewhere in the file it appears twice.

Best Answer

Using awk

```
awk '!x[$0]++' infile.txt > outfile.txt
```

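It keeps a count of each line in the associative array x, keyed on the whole line ($0). For every input line, !x[$0]++ tests the current count and then increments it. When the count is zero (the first occurrence), the test is true and awk prints the line. Otherwise it moves on to the next line. Unlike sort -u, this keeps the surviving lines in their original order.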
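Because awk cannot safely write to the file it is still reading, deduplicating "in place" means going through a temporary file. The sketch below shows the one-liner on a small made-up sample (the file name sample.txt and its contents are hypothetical) and then replaces the original with the result:

```shell
#!/bin/sh
# Sketch: remove duplicate lines, keeping the first copy of each line in its
# original position. sample.txt and its contents are hypothetical.
set -e

workdir=$(mktemp -d)
f="$workdir/sample.txt"

# Sample input in which "My line" appears twice.
printf 'My line\nanother line\nMy line\nlast line\n' > "$f"

# Write the deduplicated output to a temporary file, then replace the original.
awk '!x[$0]++' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

result=$(cat "$f")
rm -rf "$workdir"
printf '%s\n' "$result"
```

The second "My line" is dropped, while "another line" and "last line" keep their positions. If GNU awk 4.1 or later is installed, gawk -i inplace '!x[$0]++' file does the same without a manual temporary file.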

Related Solutions

Ubuntu – Identify duplicate lines in a file without deleting them

If I understand your question, I think that you need something like:

```
for dup in $(sort -k1,1 file.txt | cut -d ' ' -f1 | uniq -d); do grep -nE -- "^$dup( |\$)" file.txt; done
```

or:

```
for dup in $(cut -d ' ' -f1 file.txt | sort | uniq -d); do grep -nE -- "^$dup( |\$)" file.txt; done
```

where file.txt is the file containing the data you are interested in.

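The output lists, with their line numbers, all lines whose first field occurs two or more times. Note that uniq -d only reports repeats on adjacent lines, so its input has to be sorted first, and that each first field is passed to grep as a regular expression, so values containing characters such as . or * may match more than intended.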
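An alternative sketch, not part of the original answer, does the same job with a single awk command that reads the file twice, so no field value is ever treated as a regular expression. One difference to keep in mind: awk splits fields on runs of blanks (spaces and tabs), whereas cut -d ' ' splits on single spaces. The sample data is made up:

```shell
#!/bin/sh
# Sketch: print "line number:line" for every line whose first field occurs
# two or more times. The sample data is hypothetical.
set -e

tmp=$(mktemp)
printf 'a 1\nb 2\na 3\nc 4\n' > "$tmp"

# The file is passed twice. During the first pass (NR == FNR) awk only counts
# each first field; during the second pass it prints the lines whose first
# field was counted more than once, prefixed with their line number (FNR).
dups=$(awk 'NR == FNR { count[$1]++; next } count[$1] > 1 { print FNR ":" $0 }' "$tmp" "$tmp")

rm -f "$tmp"
printf '%s\n' "$dups"
```

The output uses the same "number:line" format as grep -n, so here it is lines 1 and 3, whose first field is "a".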

Ubuntu – How to replace spaces with newlines/enter in a text-file

A few choices:

The classic approach, using tr:
```
tr ' ' '\n' < example
```

Use cut (--output-delimiter is a GNU cut option, and the $'\n' quoting needs a shell such as bash)
```
cut -d ' ' --output-delimiter=$'\n' -f 1- example
```

Use sed
```
sed 's/ /\n/g' example
```
Use perl
```
perl -pe 's/ /\n/g' example
```

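Use the shell (bash)
```
foo=$(cat example); nl=$'\n'; printf '%s\n' "${foo// /$nl}"
```
Quoting the expansion prevents word splitting and globbing, and printf, unlike echo -e, leaves any backslashes in the file untouched.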
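One detail worth checking with the tr one-liner: it maps each space independently, so a run of spaces becomes a run of empty lines. A minimal sketch on made-up input, contrasting plain tr with tr -s, which squeezes repeated newlines in the output:

```shell
#!/bin/sh
# Sketch: how tr treats consecutive spaces. The sample text is hypothetical.

# Plain tr turns every single space into a newline, so the double space
# produces an empty line.
plain=$(printf 'one  two three\n' | tr ' ' '\n')

# With -s, tr also squeezes each run of repeated newlines in its output into
# a single newline, so no empty lines are produced. This also collapses any
# blank lines that were already in the input.
squeezed=$(printf 'one  two three\n' | tr -s ' ' '\n')

printf '%s\n---\n%s\n' "$plain" "$squeezed"
```

Which variant is right depends on whether empty lines in the result carry meaning for the file at hand.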
