I have a "normal" looking text file (containing English sentences) which the file command detects as "ASCII Pascal program text".
How does file distinguish Pascal program text from normal ASCII English text?
I ran
head -10 file > tmp
file tmp
and it still reports Pascal.
When tmp is opened in vi with :set list, it shows:
HELEN'S BABIES$
$
With some account of their ways, innocent, crafty, angelic, impish,$
witching and impulsive; also a partial record of their actions during$
ten days of their existence$
$
By JOHN HABBERTON$
$
$
$
Output of head file | od -c:
0000000   H   E   L   E   N   '   S       B   A   B   I   E   S  \n  \n
0000020   W   i   t   h       s   o   m   e       a   c   c   o   u   n
0000040   t       o   f       t   h   e   i   r       w   a   y   s   ,
0000060       i   n   n   o   c   e   n   t   ,       c   r   a   f   t
0000100   y   ,       a   n   g   e   l   i   c   ,       i   m   p   i
0000120   s   h   ,  \n   w   i   t   c   h   i   n   g       a   n   d
0000140       i   m   p   u   l   s   i   v   e   ;       a   l   s   o
0000160       a       p   a   r   t   i   a   l       r   e   c   o   r
0000200   d       o   f       t   h   e   i   r       a   c   t   i   o
0000220   n   s       d   u   r   i   n   g  \n   t   e   n       d   a
0000240   y   s       o   f       t   h   e   i   r       e   x   i   s
0000260   t   e   n   c   e  \n  \n   B   y       J   O   H   N       H
0000300   A   B   B   E   R   T   O   N  \n  \n  \n  \n
0000314
File uploaded here: http://www.fileswap.com/dl/L0eCWJTvy/
I'm on CentOS release 6.5 with file version 5.04.
There seems to be something in the 4th line: if I remove everything from the 4th line onwards, file identifies it as plain text.
Best Answer
I was able to reproduce this on both OS X 10.6.8 and OpenBSD 5.5-current.
Printing debug information with file -D tmp shows that your text file fails roughly 2000 tests before file(1) recognizes the Pascal keyword record and decides that it must be Pascal program text. A minimal working example can be obtained as follows:
After numerous heuristics, only the "third & last set of tests, based on hardwired assumptions" in ascmagic.c applies. These tests recognize "file types that we know based on keywords that can appear anywhere in the file". Therefore, even a minimal change to your file results in the correct identification as ASCII English text, for example changing their to the in the third line.
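As a rough illustration, the keyword stage can be modeled in a few lines of Python. This is a simplified sketch, not file's actual code: it assumes that the first whitespace-delimited token matching any keyword decides the type, and KEYWORDS is a hypothetical, heavily reduced stand-in for the real keyword table. Under that assumption it reproduces both observations: record on line 4 wins for the original text, while changing their to the on line 3 lets English win first.

```python
# Hypothetical, heavily reduced stand-in for file(1)'s keyword table;
# the real table is much longer and is not reproduced here.
KEYWORDS = [
    ("the", "English"),
    ("The", "English"),
    ("program", "Pascal"),
    ("record", "Pascal"),
    ("struct", "C"),
]


def classify(text):
    """Return (language, token, line_no) for the first whitespace-delimited
    token that exactly matches a keyword, or None if no token matches.

    Assumption: tokens are scanned in file order and the first hit decides.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            for keyword, language in KEYWORDS:
                if token == keyword:  # exact match: "their" != "the"
                    return language, token, line_no
    return None


SAMPLE = """HELEN'S BABIES

With some account of their ways, innocent, crafty, angelic, impish,
witching and impulsive; also a partial record of their actions during
ten days of their existence

By JOHN HABBERTON
"""

if __name__ == "__main__":
    print(classify(SAMPLE))  # ('Pascal', 'record', 4)
    # Replace only the first "their", which is on line 3.
    print(classify(SAMPLE.replace("their", "the", 1)))  # ('English', 'the', 3)
```

Because matching is on whole tokens, "their" never counts as "the", so in the original text the first hit is record on line 4. This is consistent with the truncation experiment in the question.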