Shell – Remove all lines except D

shell-script

I have a scenario where three huge files, Test.txt, Test2.txt and Test3.txt, have the following contents.

H|||||||||||||||||||||||
D||||||||||||||||||||||||
D|||||||||||||||||||||||
H|||||||||||||||||||||
D||||||||||||||||||||||||
D||||||||||||||||||||||||
T||||||||||||||||||||||||

I have to delete every line except the D lines.
All three of my files (more than 10 GB) should end up looking like this:

D||||||||||||||||||||||||
D|||||||||||||||||||||||
D||||||||||||||||||||||||
D||||||||||||||||||||||||

After retaining only the D lines in Test.txt, Test2.txt and Test3.txt,
I have to merge them into a new file.

I have done this with sed:

sed '/^\('D'\)|/!d' $Filename.txt >>  $NewFilename.txt

But because the files are huge, it takes a very long time.

Is there a more efficient way to do this with another command?

Best Answer

cat Test.txt Test2.txt Test3.txt | LC_ALL=C grep '^D' > newfile.txt

Or:

for file in Test.txt Test2.txt Test3.txt; do
  LC_ALL=C grep '^D' < "$file"
done > newfile.txt

Or, if your grep supports the -h option, as GNU grep does (it suppresses the file-name prefix on output lines):

LC_ALL=C grep -h '^D' Test.txt Test2.txt Test3.txt > newfile.txt

Setting LC_ALL=C stops grep from trying to decode the input as UTF-8, so it works on raw bytes. Because the pattern ^D is anchored to the start of the line, grep only has to look at the first character of each line. grep, especially GNU grep, is generally a lot faster than sed.
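
To try the -h variant safely before running it on the real data, here is a minimal sketch on three tiny sample files. The scratch directory, the sample records and the DX line are made up for illustration. It also uses the slightly stricter pattern '^D|', which mirrors the sed expression in the question by requiring the | separator right after the D; with plain '^D' the DX line would be kept as well.

```shell
# Scratch directory so that no real file is overwritten (illustrative only).
tmpdir=$(mktemp -d)

# Three tiny stand-ins for the real multi-gigabyte files.
printf 'H|a|b\nD|1|x\nD|2|y\nT|end\n' > "$tmpdir/Test.txt"
printf 'H|a|b\nD|3|z\nT|end\n' > "$tmpdir/Test2.txt"
printf 'H|a|b\nD|4|w\nDX|not a D record\nT|end\n' > "$tmpdir/Test3.txt"

# -h suppresses the file-name prefix (GNU and BSD grep; not required by POSIX).
# In a basic regular expression an unescaped | is a literal character,
# so '^D|' matches lines that start with "D|".
LC_ALL=C grep -h '^D|' \
  "$tmpdir/Test.txt" "$tmpdir/Test2.txt" "$tmpdir/Test3.txt" \
  > "$tmpdir/newfile.txt"

cat "$tmpdir/newfile.txt"
```

Only the four D| records should end up in newfile.txt, in the order the files were listed on the command line.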
