ASCII Pascal – Normal Text File Detected as Pascal Program

files

I have a "normal"-looking text file (it contains English sentences) that the file command detects as ASCII Pascal program text.

How does Pascal program text differentiate itself from normal ASCII English text?

I ran head -10 file > tmp

file tmp still reports Pascal.
Here is tmp opened in vi with :set list:

HELEN'S BABIES$
$
With some account of their ways, innocent, crafty, angelic, impish,$
witching and impulsive; also a partial record of their actions during$
ten days of their existence$
$
By JOHN HABBERTON$
$
$
$

Output of head file | od -c

0000000   H   E   L   E   N   '   S       B   A   B   I   E   S  \n  \n
0000020   W   i   t   h       s   o   m   e       a   c   c   o   u   n
0000040   t       o   f       t   h   e   i   r       w   a   y   s   ,
0000060       i   n   n   o   c   e   n   t   ,       c   r   a   f   t
0000100   y   ,       a   n   g   e   l   i   c   ,       i   m   p   i
0000120   s   h   ,  \n   w   i   t   c   h   i   n   g       a   n   d
0000140       i   m   p   u   l   s   i   v   e   ;       a   l   s   o
0000160       a       p   a   r   t   i   a   l       r   e   c   o   r
0000200   d       o   f       t   h   e   i   r       a   c   t   i   o
0000220   n   s       d   u   r   i   n   g  \n   t   e   n       d   a
0000240   y   s       o   f       t   h   e   i   r       e   x   i   s
0000260   t   e   n   c   e  \n  \n   B   y       J   O   H   N       H
0000300   A   B   B   E   R   T   O   N  \n  \n  \n  \n
0000314

File uploaded here: http://www.fileswap.com/dl/L0eCWJTvy/

I'm on CentOS release 6.5, file version 5.04

Something in the 4th line seems to trigger it: if I remove everything from the 4th line onwards, the file is detected as a plain text file.

Best Answer

I was able to reproduce this on both OS X 10.6.8 and OpenBSD 5.5-current.

The debug output from file -D tmp shows that your text file fails roughly 2000 tests before file(1) recognizes the Pascal keyword record and decides that the file must be Pascal program text.

A minimal working example can be obtained as follows:

$ echo record > test
$ file test
test: ASCII Pascal program text

After numerous heuristics, only the "third & last set of tests, based on hardwired assumptions" in ascmagic.c applies. These tests recognize "file types that we know based on keywords that can appear anywhere in the file". Therefore, minimal changes to your file result in the correct identification as ASCII English text, for example changing their to the in the third line.
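To see where the keyword sits in the sample, a whole-word search is enough. The sketch below recreates the first lines of the file shown in the question under a hypothetical name, sample.txt, and looks only for record; it does not reproduce the logic of file(1), and the actual keyword table in the file source contains more entries than this one word.

```shell
# Recreate the first lines of the sample from the question.
# sample.txt is a hypothetical file name used only for this illustration.
cat > sample.txt <<'EOF'
HELEN'S BABIES

With some account of their ways, innocent, crafty, angelic, impish,
witching and impulsive; also a partial record of their actions during
ten days of their existence
EOF

# -n prints the line number of each match, -w matches whole words only.
grep -nw 'record' sample.txt
```

The match is reported on line 4, which agrees with the observation in the question that removing everything from the 4th line onwards makes the Pascal detection go away.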
