Troubleshooting Text File Marked as Binary – Linux, Grep


I have an executable that generates a text file as its output. The problem is that the text file comes out with a binary file flag of some sort. The result is something like this:

$ grep "grep string" output_file.txt
Binary file output_file.txt matches.

$ grep -a "grep string" output_file.txt
[correct results]

Some reading has indicated that grep looks for a null character in the first thousand or so bytes, then determines from that whether or not a file is 'binary', so my question is two-fold:

Is there an easy way to strip null characters from my files (I can do this as part of my post-processing) to ensure that grep works correctly without the -a flag?
Is there something obvious I should look for in my code to prevent null characters from being written to the file? I've looked through the code quite thoroughly and I don't see any obvious culprits.


Best Answer

I can answer at least the first question. On Unix/Linux you can use tr:

tr -d '\000' < filein > fileout

where \000 is the octal escape for the null character. You can also strip all non-printable characters; see the examples in "Unix Text Editing: sed, tr, cut, od, awk".
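A minimal sketch of that post-processing step, assuming a POSIX shell with mktemp. The sample content and the name output_clean.txt are made up for illustration. It counts the NUL bytes before and after stripping, then checks that grep matches without -a:

```shell
# Hypothetical sample: a text file with one stray NUL byte in it.
tmpdir=$(mktemp -d)
printf 'first line\ngrep string\0 here\n' > "$tmpdir/output_file.txt"

# Count NUL bytes: -c complements the set, -d deletes, so only NULs survive.
nul_before=$(( $(tr -cd '\000' < "$tmpdir/output_file.txt" | wc -c) ))

# Strip the NULs into a new file, as in the tr command above.
tr -d '\000' < "$tmpdir/output_file.txt" > "$tmpdir/output_clean.txt"
nul_after=$(( $(tr -cd '\000' < "$tmpdir/output_clean.txt" | wc -c) ))

# Without NULs, grep treats the file as text and prints the matching line.
match=$(grep "grep string" "$tmpdir/output_clean.txt")

echo "NUL bytes before: $nul_before, after: $nul_after"
echo "match: $match"
```

A non-zero count from the tr -cd pipeline on the real output file would confirm that null bytes are what triggers the "Binary file" message.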

Regarding your second question, I don't know which programming language you are using, but I'd look for uninitialized variables that could end up being written to the output file.
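To narrow down which write produces the null bytes, it can help to see where they sit in the output. A rough sketch, assuming od is available; the sample file is made up. od -c renders each NUL as \0, so filtering the dump for that token shows the offsets and the neighbouring text:

```shell
# Hypothetical sample with a NUL right after "grep string".
sample=$(mktemp)
printf 'first line\ngrep string\0 here\n' > "$sample"

# Dump as characters with decimal offsets (-A d); od -c shows a NUL byte as \0.
# Keep only the dump rows that contain a NUL.
nul_rows=$(od -A d -c "$sample" | grep '\\0')

echo "$nul_rows"
```

The text printed next to the \0 often points to the output statement involved.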

Related Solutions

Linux – Using sed to replace string with special characters in XML file

Putting shell variables in single quotes disables their interpretation. That's why your command has no effect.

$ echo  's/"$OLD_STRING"/"$NEW_STRING"/g'
s/"$OLD_STRING"/"$NEW_STRING"/g

It should be written with double quotes instead, so that the shell expands the variables:

sed -i "s/$OLD_STRING/$NEW_STRING/g" jboss-beans.xml

But then the variables are expanded before sed is called, and their values contain characters that are special to sed, here the / in </property>, which collides with the s/// delimiter:

$ echo "s/$OLD_STRING/$NEW_STRING/g"
s/<property name="webServiceHost">${jboss.bind.address}</property>/<!--<property name="webServiceHost">${jboss.bind.address}</property>-->/g

For that reason sed lets you choose the delimiter of the s command: whatever character follows the s is used, e.g.:

sed -i "s#$OLD_STRING#$NEW_STRING#g" jboss-beans.xml
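A small sketch of the alternate-delimiter form, run on a scratch file rather than the real jboss-beans.xml. The variable values are taken from the echo output above; the scratch file and its surrounding bean lines are made up. It prints the result instead of editing in place, and it assumes that neither string contains #, and that the replacement contains no & or backslash:

```shell
# Values as shown in the echo output above; single quotes keep ${...} literal.
OLD_STRING='<property name="webServiceHost">${jboss.bind.address}</property>'
NEW_STRING='<!--<property name="webServiceHost">${jboss.bind.address}</property>-->'

# Hypothetical scratch file standing in for jboss-beans.xml.
scratch=$(mktemp)
printf '%s\n' '<bean name="example">' "$OLD_STRING" '</bean>' > "$scratch"

# "#" is the delimiter, so the "/" in </property> no longer ends the pattern.
result=$(sed "s#$OLD_STRING#$NEW_STRING#g" "$scratch")

echo "$result"
```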

Still, your search expression contains characters that are special in regular expressions, and using sed this way wastes its abilities. I would write the expression like this:

sed -i 's/\(<.*webServiceHost.*jboss.bind.address.*>\)/<!--\1-->/' jboss-beans.xml

Of course you can make the match string more or less specific according to your needs. Another useful feature: sed can restrict an editing command to the lines matching a given pattern. Your command could look like this:

sed -i '/webServiceHost/ s/^\(.*\)$/<!--\1-->/' jboss-beans.xml
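The two commands are not quite equivalent, which a quick run on a made-up, indented sample shows. This sketch drops -i and prints the result instead; the sample file and its second line are assumptions for illustration:

```shell
# Hypothetical two-line sample; only the first line mentions webServiceHost.
sample=$(mktemp)
printf '%s\n' \
  '  <property name="webServiceHost">${jboss.bind.address}</property>' \
  '  <property name="other">value</property>' > "$sample"

# Variant 1: wrap the matched <...> span; the indentation stays outside the comment.
by_match=$(sed 's/\(<.*webServiceHost.*jboss.bind.address.*>\)/<!--\1-->/' "$sample")

# Variant 2: the address /webServiceHost/ selects the line, then the whole line is wrapped.
by_address=$(sed '/webServiceHost/ s/^\(.*\)$/<!--\1-->/' "$sample")

printf 'variant 1:\n%s\nvariant 2:\n%s\n' "$by_match" "$by_address"
```

In both variants the second line is left untouched; the only difference is whether the leading whitespace ends up inside the comment.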

Linux – What are the exact reasons `grep` on /proc and raw disks is a bad idea

Yes, you can grep /dev/sda1 and /proc but you probably don't want to. In more detail:

Yes, you can grep the binary contents of /dev/sda1. But with modern large hard disks this will take a very long time, and the result is not likely to be useful.
Yes, you can grep the contents of /proc but be aware that your computer's memory is mapped in there as files. On a modern computer with gigabytes of RAM, this will take a long time to grep and, again, the result is not likely to be useful.

As an exception, if you are looking for data on a hard disk with a damaged file system, you might run grep something /dev/sda1 as part of an attempt to recover the file's data.
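As a rough sketch of that recovery use, the following searches a small scratch file standing in for the device (reading a real /dev/sda1 normally requires root). It assumes GNU grep; the file content and the search string secret-data are made up. -a treats the data as text, -o prints only the match, and -b prefixes the match with its byte offset:

```shell
# Hypothetical stand-in for a raw device: text fragments separated by NUL bytes.
image=$(mktemp)
printf 'AAAA\0\0BBBBsecret-data\0\0CCCC' > "$image"

# LC_ALL=C makes grep work on raw bytes instead of decoding multibyte characters.
hit=$(LC_ALL=C grep -a -o -b 'secret-data' "$image")

echo "$hit"
```

The reported offset could then be passed to a tool such as dd to extract the surrounding bytes.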

Infinite loops

"...links ... create infinite loops when traversed..."

Grep (at least the GNU version) is smart enough not to do that. Let's consider two cases:

With the -r option, grep does not follow symbolic links unless they are explicitly specified on the command line. Hence, infinite loops are not possible.
With the -R option, grep does follow symbolic links but it checks them and refuses to get caught in a loop. To illustrate:
```
$ mkdir a
$ ln -s ../ a/b
$ grep -R something .
grep: warning: ./a/b: recursive directory loop
```

Excluding problematic directories from `grep -r`

As an aside, grep provides a limited facility to stop grep from searching certain files or directories. For example, you can exclude all directories named proc, sys, and dev from grep's recursive search with:

grep --exclude-dir proc --exclude-dir sys --exclude-dir dev -r something /

Alternatively, we can exclude proc, sys, and dev using bash's extended globs:

shopt -s extglob
grep -r something /!(proc|sys|dev)
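One reason the facility is limited: per the GNU grep documentation, --exclude-dir matches a subdirectory's base name at any depth, not just at the top of the tree. A small sketch on a scratch tree (the directory layout is made up, and GNU grep is assumed):

```shell
# Hypothetical scratch tree: a top-level proc, a nested proc, and a normal directory.
root=$(mktemp -d)
mkdir -p "$root/proc" "$root/keep/proc"
echo something > "$root/proc/b.txt"
echo something > "$root/keep/a.txt"
echo something > "$root/keep/proc/c.txt"

# -l lists matching files; both directories named proc are skipped.
found=$(cd "$root" && grep -rl --exclude-dir=proc something .)

echo "$found"
```

The extglob form above, by contrast, only leaves out the top-level /proc, /sys, and /dev.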
