Shell – does head input > output copy all invisible characters to the new file

character encoding, head, shell, text processing

I need to grab the first lines of a long text file so that I can debug with a smaller file (a Python script does not digest the large text file as intended). For the debugging to make any sense, the lines must be perfect, byte-for-byte copies that preserve any potential problems with character encoding, end-of-line characters, invisible characters, or whatnot in the original file. Will the following simple solution accomplish that, or would I lose something by using the output of head?

head infile.txt > output.txt

A more general question on binary copying with head, dd, or other tools has been posted separately.

Best Answer

POSIX says that the input to head is a text file, and defines a text file:

3.397 Text File

A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

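So if the input is not a text file in this sense (for example, it contains NUL characters or a line longer than {LINE_MAX} bytes), the standard does not guarantee what head produces, and there is a possibility of losing information.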
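Rather than relying on the specification alone, the copy can be checked directly. The following is a minimal sketch, not a definitive test: it builds a small hypothetical sample file (a UTF-8 BOM, a tab, and CRLF line endings, all invented for illustration), takes its first two lines with head, and compares the result against the same number of leading bytes of the original using dd and cmp. The temporary directory, the file contents, and the variable names are assumptions; for a real case, substitute the actual infile.txt.

```shell
# Hypothetical sample data: UTF-8 BOM, a tab, and CRLF line endings.
tmpdir=$(mktemp -d)
printf '\357\273\277first\tline\r\nsecond line\r\nthird line\r\n' > "$tmpdir/infile.txt"

# Copy the first two lines with head, as in the question.
head -n 2 "$tmpdir/infile.txt" > "$tmpdir/output.txt"

# Size of the copy in bytes; the arithmetic expansion strips any
# padding that some wc implementations put around the number.
nbytes=$(( $(wc -c < "$tmpdir/output.txt") ))

# Read exactly that many leading bytes from the original and compare
# them with the copy. cmp -s is silent and reports via its exit status.
if dd if="$tmpdir/infile.txt" bs=1 count="$nbytes" 2>/dev/null |
        cmp -s - "$tmpdir/output.txt"; then
    result="identical"
else
    result="different"
fi
echo "$result"

# Remove the temporary files.
rm -rf "$tmpdir"
```

If the comparison reports a difference, od -c on both files shows where the bytes diverge. A sample like this one stays within the POSIX definition of a text file, so it says nothing about inputs that contain NUL characters or very long lines; those need to be tested against the head implementation actually in use.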
