Text Processing – Remove Extra Header Lines from File Except for the First Line


I have a file that looks like this toy example. My actual file has 4 million lines, about 10 of which I need to delete.

ID  Data1  Data2
1    100    100
2    100    200
3    200    100
ID  Data1  Data2
4    100    100
ID  Data1  Data2
5    200    200

I want to delete the lines that look like the header, except for the first line.

Final file:

ID  Data1  Data2
1    100    100
2    100    200
3    200    100
4    100    100
5    200    200

How can I do this?

Best Answer

header=$(head -n 1 input)
(printf "%s\n" "$header";
 grep -vFxe "$header" input
) > output
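How it works:

  1. Grab the header line from the input file into a variable.
  2. Print the header once. This is needed because the grep in the next step removes every copy of the header, including the first.
  3. Process the file with grep, omitting every line that exactly matches the header: -F treats the header as a fixed string rather than a regular expression, -x requires the whole line to match, -v inverts the match, and -e keeps a header that begins with a dash from being read as an option.
  4. Capture the output of the previous two steps in the output file.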
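
To check the steps above before running them on the real 4-million-line file, they can be tried on the toy example from the question. The following is a minimal sketch, not a definitive implementation: it assumes it is run in an empty scratch directory, because it creates (and overwrites) files named input and output there, and it uses a brace group instead of the subshell in the answer, which should behave the same here.

```shell
# Recreate the toy example from the question.
# Assumption: the current directory is a scratch directory, since this
# overwrites any existing files named "input" and "output".
cat > input <<'EOF'
ID  Data1  Data2
1    100    100
2    100    200
3    200    100
ID  Data1  Data2
4    100    100
ID  Data1  Data2
5    200    200
EOF

# Step 1: grab the header line into a variable.
header=$(head -n 1 input)

# Steps 2-4: print the header once, then every line that is not an exact,
# whole-line copy of it (-F fixed string, -x whole line, -v invert match),
# and write both parts to the output file.
{
  printf '%s\n' "$header"
  grep -vFxe "$header" input
} > output

# Show the result for inspection.
cat output
```

The printed result should match the "Final file" shown in the question: one header line followed by rows 1 through 5. Note that only lines identical to the first line are dropped, so a repeated header with different spacing would be kept.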