I have a 2958616-byte text file. When I run `sort < file.txt | uniq > sorted-file.txt`, I get a 3213965-byte text file. Why is my sorted text file bigger?
You can download the text files here.
Best Answer
While your original file has lines that end with `\n`, your sorted file has `\r\n`. The addition of the `\r` is what changes the size.
To illustrate, here's what happens when I run your command on my Linux system:
As you can see, the sorted de-duped file is a few lines shorter and, consequently, a few bytes smaller. Your file, however, is different:
The two files have exactly the same number of lines, but:
The `sorted-file.txt`, the one I downloaded from your link, is larger. If we now examine the first line, we can see the extra `\r`:
These are not present in the one I created on Linux:
If we now remove the `\r` from your file:
We get the expected result, a file that is smaller than the original, just like the one I created on my system:
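
For anyone who wants to reproduce the effect without the downloaded files, here is a minimal sketch on a tiny made-up file. The sample contents, the names `sorted-lf.txt`, `sorted-crlf.txt` and `fixed.txt`, and the use of `awk` to simulate CRLF output are all assumptions for the demo, not the commands or data used above.

```shell
#!/bin/sh
# Hypothetical demo in a scratch directory; none of this is the asker's data.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Four LF-terminated lines, one of them a duplicate.
printf 'b\na\nb\nc\n' > file.txt

# Same pipeline as in the question: sort, then drop adjacent duplicates.
sort < file.txt | uniq > sorted-lf.txt

# Simulate a CRLF result by writing every line back with \r\n (assumption:
# this stands in for whatever produced the \r in the downloaded file).
awk '{ printf "%s\r\n", $0 }' sorted-lf.txt > sorted-crlf.txt

# Byte counts; $(( )) normalises any padding that wc may print.
orig=$(($(wc -c < file.txt)))
lf=$(($(wc -c < sorted-lf.txt)))
crlf=$(($(wc -c < sorted-crlf.txt)))
echo "original: $orig  sorted (LF): $lf  sorted (CRLF): $crlf"

# Inspect the first line byte by byte; the CRLF file shows \r before \n.
head -n 1 sorted-crlf.txt | od -c

# Strip the carriage returns and confirm the result matches the LF version.
tr -d '\r' < sorted-crlf.txt > fixed.txt
cmp fixed.txt sorted-lf.txt && echo "identical after removing \\r"
```

Here the LF result is smaller than the input because one duplicate line is gone, while the CRLF result is larger because each remaining line carries one extra byte. In the question the difference is 3213965 - 2958616 = 255349 bytes, which is what one extra `\r` on each of 255349 lines would add, assuming no duplicates were removed.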