POSIX defines a text file as:
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Source: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403
However, there are several things I find unclear:
- Must a text file be a regular file? The excerpt above does not explicitly say that the file must be a regular file.
- Can a file be considered a text file if it contains one character and one character only (i.e., a single character that isn't terminated with a newline)? I know this question may sound nitpicky, but they use the word "characters" instead of "one or more characters". Others may disagree, but if they mean "one or more characters", I think they should say so explicitly.
- The excerpt above refers to "lines". I found four definitions with "line" in their name: "Empty Line", "Display Line", "Incomplete Line" and "Line". Am I supposed to infer that they mean "Line" because they omit "Empty", "Display" and "Incomplete", or do all four of these definitions count as a line in the excerpt above?
All questions that come after this block of text depend on inferring that "characters" means "one or more characters":
- Can I safely infer that if a file is empty, it is not a text file because it does not contain one or more characters?
All questions that come after this block of text depend on inferring that in the above excerpt, a line is defined as a "Line", and that the other three definitions containing "Line" in their name should be excluded:
- Does the "zero" in "zero or more lines" mean that a file can still be considered a text file if it contains one or more characters that are not terminated with a newline?
- Does "zero or more lines" mean that once a single "Line" (0 or more characters plus a terminating newline) comes into play, it becomes illegal for the last line to be an "Incomplete Line" (one or more non-newline characters at the end of a file)?
- Does "none [no line] can exceed {LINE_MAX} bytes in length, including the newline character" mean that there is a limit on the number of characters allowed in any given "Line" in a text file? (As an aside, the value of LINE_MAX on Ubuntu 18.04 and FreeBSD 11.1 is "2048".)
Best Answer
No; the excerpt even specifically notes standard input as a potential text file. Other standard utilities, such as make, specifically use the character special file /dev/null as a text file.
That character must be a <newline>, or this isn't a line, and so the file it's in isn't a text file. A file containing exactly byte 0A is a single-line text file. An empty line is a valid line.
It's not really an inference, it's just what it says. The word "line" has been given a contextually-appropriate definition and so that's what it's talking about.
An empty file consists of zero (or more) lines and is thus a text file.
No, these characters are not organised into lines.
It's not illegal, it's just not a text file. A utility requiring a text file to be given to it may behave adversely if given that file instead.
Yes.
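To see what that limit is on a particular system, something like the following Python sketch may help. It assumes a Unix-like platform where os.sysconf knows the name "SC_LINE_MAX"; the helper name line_max and the fallback to 2048 (the POSIX minimum, {_POSIX2_LINE_MAX}) are choices made for this illustration, not anything POSIX prescribes.

```python
import os

# {_POSIX2_LINE_MAX}: the smallest value POSIX allows for LINE_MAX.
POSIX2_LINE_MAX = 2048


def line_max() -> int:
    """Return this system's LINE_MAX, or the POSIX minimum if it cannot be queried."""
    try:
        value = os.sysconf("SC_LINE_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is unavailable (non-Unix) or the name is not supported here.
        return POSIX2_LINE_MAX
    # sysconf reports -1 when the limit is indeterminate; fall back in that case.
    return value if value > 0 else POSIX2_LINE_MAX


if __name__ == "__main__":
    print(line_max())
```

On the command line, getconf LINE_MAX reports the same limit.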
This definition is just trying to set some bounds on what a text-based utility (for example, grep) will definitely accept, nothing more. Such utilities are also free to accept things more liberally, and quite often they do in practice. They are permitted to use a fixed-size buffer to process a line, to assume a newline appears before it's full, and so on. You may be reading too much into things.
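The answers above can be condensed into a small checker. This is only a sketch of one reading of the definition: the function name is_posix_text is made up for this example, it works on raw bytes (so it ignores whether the bytes are valid characters in the current locale), and it defaults line_max to 2048, the value the question reports, rather than querying the system.

```python
def is_posix_text(data: bytes, line_max: int = 2048) -> bool:
    """Check raw bytes against the POSIX text file definition quoted above."""
    if b"\0" in data:
        return False  # lines must not contain NUL characters
    if not data:
        return True  # an empty file is zero lines, which is allowed
    if not data.endswith(b"\n"):
        return False  # the trailing bytes form an incomplete line, not a line
    # Every line, counting its terminating newline, must fit in line_max bytes.
    lines = data[:-1].split(b"\n")
    return all(len(line) + 1 <= line_max for line in lines)


if __name__ == "__main__":
    samples = {
        "empty file": b"",
        "single newline (byte 0A)": b"\n",
        "one unterminated character": b"a",
        "incomplete last line": b"a\nb",
        "embedded NUL": b"a\0\n",
        "line longer than line_max": b"x" * 2048 + b"\n",
    }
    for label, data in samples.items():
        print(f"{label}: {is_posix_text(data)}")
```

Only the first two samples pass. Note that the check never asks what kind of file the bytes came from, which matches the first answer: a pipe, standard input, or /dev/null can supply a text file just as well as a regular file can.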