Linux – Diff: How to only compare the first n characters in each line


I have two log files that are generated from decoded binary data. The decoders are slightly different, and I am trying to isolate the differences in their output. To do this, I am diffing the two log files, which works pretty well except that the time stamps are different on each line. For certain reasons, the differences in the time stamps are not relevant, so I want diff to ignore them.

Because the log files follow a specific format, I can simply exclude the last ~40 characters of each line to ignore the time stamps. For example:

Line A:

[T9] | ENTRY NAME                       varA             = 0000012B  varB             = 00000000 | 000015.508.107.113s | file.cpp              :738

Line B:

[T9] | ENTRY NAME                       varA             = 0000012B  varB             = 00000000 | 000015.508.107.163s | file.cpp              :738

These lines should be treated as identical in my case.

How can I tell diff to only include the first n characters from each line, or exclude the last m characters from each line?

Best Answer

In bash, you can use process substitution.

To remove the last 40 characters of each line before comparing, you can use:

diff <(sed 's/.\{40\}$//' file1) \
     <(sed 's/.\{40\}$//' file2)
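In the example lines, the timestamp is followed by the file name and line number, so a fixed-width cut drops those columns too. A hedged alternative (a swapped-in technique, not part of the original answer) is to mask only the timestamp field with a regular expression. This sketch assumes every timestamp has the shape 000015.508.107.113s seen in the question; the helper name mask_ts is hypothetical and the sample lines are abbreviated versions of Line A and Line B:

```shell
#!/usr/bin/env bash
# Sketch: mask the timestamp field instead of cutting a fixed width.
# ASSUMPTION: timestamps always look like 000015.508.107.113s
# (digits, then three dot-separated 3-digit groups, then "s").

# mask_ts is a hypothetical helper: replace each timestamp with <TS>.
mask_ts() {
    sed -E 's/[0-9]+\.[0-9]{3}\.[0-9]{3}\.[0-9]{3}s/<TS>/g' "$1"
}

# Abbreviated versions of Line A and Line B from the question.
a=$(mktemp)
b=$(mktemp)
printf '%s\n' '[T9] | ENTRY NAME varA = 0000012B | 000015.508.107.113s | file.cpp :738' > "$a"
printf '%s\n' '[T9] | ENTRY NAME varA = 0000012B | 000015.508.107.163s | file.cpp :738' > "$b"

# The masked lines are identical, so diff prints nothing and exits 0.
diff <(mask_ts "$a") <(mask_ts "$b") && echo "no differences"

rm -f "$a" "$b"
```

If the timestamp format can vary between log versions, the pattern would need adjusting; the advantage over the fixed-width version is that columns after the timestamp (file name, line number) are still compared.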

To keep only the first 40 characters of each line instead, you can use cut:

diff <(cut -c1-40 file1) <(cut -c1-40 file2)
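The cut approach can be wrapped in a small function so the column count becomes a parameter. This is a minimal sketch; the function name diff_first_n and the log file names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: diff two files while looking only at the first N characters
# of every line. diff_first_n is a hypothetical helper name.
diff_first_n() {
    local n=$1 fileOne=$2 fileTwo=$3
    # cut -c "1-N" keeps columns 1..N of each line; anything after
    # column N is dropped, shorter lines pass through unchanged.
    diff <(cut -c "1-$n" "$fileOne") <(cut -c "1-$n" "$fileTwo")
}

# Example (hypothetical file names): ignore everything after column 100.
# diff_first_n 100 decoderA.log decoderB.log
```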

Related Solutions

Sorting XML files so that differences can then be found

I think you can use an XML-aware diff tool such as xmldiff or diffxml for this purpose:

http://diffxml.sourceforge.net/

The tool's webpage states:

The standard Unix tools diff and patch are used to find the differences between text files and to apply the differences. These tools operate on a line by line basis using well-studied methods for computing the longest common subsequence (LCS).

Using these tools on hierarchically structured data (XML etc) leads to sub-optimal results, as they are incapable of recognizing the tree-based structure of these files.

Linux Diff Command – How to Diff Only the First Line of Two Files

Here you go:

diff <(head -n 1 file1) <(head -n 1 file2)

(If the first lines of both files are identical, this prints nothing at all.)

diff <(head -n 2 file1) <(head -n 2 file2)

Returns:
2c2
< 1
---
> 3

You could incorporate that into a script to do the things you mention.

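#!/bin/bash
# Usage: nameofscript.sh file1 file2 [numLines]

fileOne=${1}
fileTwo=${2}
numLines=${3:-1}   # default to comparing only the first line

# Quote the expansions so paths containing spaces still work.
diff <(head -n "${numLines}" "${fileOne}") <(head -n "${numLines}" "${fileTwo}")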

To use it, make the script executable with chmod +x nameofscript.sh, then run ./nameofscript.sh ~/file1 ~/Docs/file2. That compares only the first line by default; to compare more lines, append a number to the end of the command (for example, ./nameofscript.sh ~/file1 ~/Docs/file2 5).

(Alternatively, you could parse switches such as -f1 file1 -f2 file2 -n 1 in your script, but I don't recall the case statement for that off the top of my head.)
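A minimal sketch of that switch-based variant, using the bash builtin getopts with a case statement. getopts only accepts single-letter options, so the -f1/-f2 switches mentioned above are replaced here by hypothetical -a and -b flags; the function name diff_head is also hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: parse switches with the bash builtin getopts.
# ASSUMPTION: getopts only supports single-letter options, so the
# hypothetical flags -a FILE, -b FILE and -n LINES stand in for -f1/-f2/-n.
diff_head() {
    local OPTIND=1 opt fileOne='' fileTwo='' numLines=1
    while getopts 'a:b:n:' opt; do
        case $opt in
            a) fileOne=$OPTARG ;;
            b) fileTwo=$OPTARG ;;
            n) numLines=$OPTARG ;;
            *) echo "usage: diff_head -a file1 -b file2 [-n lines]" >&2; return 2 ;;
        esac
    done
    # Both files are required; -n falls back to 1 line.
    if [[ -z $fileOne || -z $fileTwo ]]; then
        echo "usage: diff_head -a file1 -b file2 [-n lines]" >&2
        return 2
    fi
    diff <(head -n "$numLines" "$fileOne") <(head -n "$numLines" "$fileTwo")
}

# In a script, forward the command-line arguments:
# diff_head "$@"
```

Wrapping the parsing in a function with a local OPTIND means it can be called more than once in the same shell without getopts picking up where a previous call left off.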

head prints the first N lines of a file, where N is given by -n. To do the reverse and compare the ends of the files, use tail -n ${numLines} instead (tail prints the last N lines).
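A sketch of the tail-based counterpart to the script above, comparing the last lines instead of the first. The function name diff_tail is hypothetical; the default of one line mirrors the original script:

```shell
#!/usr/bin/env bash
# Sketch: compare the LAST numLines lines of two files.
# diff_tail is a hypothetical helper name.
diff_tail() {
    local fileOne=$1 fileTwo=$2 numLines=${3:-1}   # default: last line only
    # tail -n N prints the last N lines of each file.
    diff <(tail -n "$numLines" "$fileOne") <(tail -n "$numLines" "$fileTwo")
}

# Usage: compare the last 5 lines of two files.
# diff_tail ~/file1 ~/Docs/file2 5
```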

Edit 5/10/16:

This is specific to Bash (and compatible shells). If you need to use this from something else:

bash -c 'diff <(...) <(...)'
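When calling this from plain sh (or any program that can run a command), it is safer to pass the file names to bash -c as arguments than to paste them into the quoted command string. In this sketch, file1 and file2 are placeholder paths; the "_" becomes $0 inside the bash -c script, so the files arrive as $1 and $2:

```shell
#!/bin/sh
# Sketch: run the bash-only process substitution from plain sh.
# file1/file2 are placeholder paths passed as positional parameters;
# "_" fills $0 inside the bash -c script.
bash -c 'diff <(head -n 1 "$1") <(head -n 1 "$2")' _ file1 file2
```

Passing the names this way means paths containing spaces or quotes do not break the quoting of the command string.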
