In general, you can't. The only metadata that is guaranteed to be stored is that of the latest revision, and any older metadata can be overwritten at any moment.
If your environment is potentially hostile, consider using the kernel audit subsystem to audit and log the rename() and write() syscalls. This is fairly unwieldy, however, because you will log large volumes of data that you probably don't care about. You can also limit your auditing to the subset of files you care about.
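If the raw log turns out to be too noisy, another option is to post-filter it. This is a minimal Python sketch, assuming x86_64 syscall numbers (write = 1, rename = 82; other architectures, and the renameat/renameat2 variants, use different numbers) and the usual `type=SYSCALL ... syscall=NN` record layout. The name filter_syscall_records and the sample records are made up for illustration.

```python
import re

# Assumption: x86_64 syscall numbers. Other architectures, and the
# renameat()/renameat2() variants, use different numbers.
WATCHED_SYSCALLS = {"1": "write", "82": "rename"}

# Matches the syscall=NN field of a type=SYSCALL audit record.
SYSCALL_RE = re.compile(r"^type=SYSCALL .*?\bsyscall=(\d+)\b")


def filter_syscall_records(lines):
    """Yield (syscall_name, line) for the SYSCALL records we care about."""
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m and m.group(1) in WATCHED_SYSCALLS:
            yield WATCHED_SYSCALLS[m.group(1)], line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        "type=SYSCALL msg=audit(1.0:1): arch=c000003e syscall=82 success=yes exit=0\n",
        "type=SYSCALL msg=audit(1.0:2): arch=c000003e syscall=2 success=yes exit=3\n",
        'type=CWD msg=audit(1.0:1):  cwd="/root"\n',
    ]
    for name, rec in filter_syscall_records(sample):
        print(name, rec)
```

In practice you would pass an open file (for example `open("/var/log/audit/audit.log")`) instead of the sample list.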
If this is mostly for revision tracking, consider using a version control system like Git. It lets users keep track of file states through time, and is much more user-friendly than navigating backwards through an audit log. It can do everything you asked for, and much more.
I found a Perl script, parse-audit-log.pl, containing a function that can parse that string:
sub parse_saddr
{
    my $sockfd = $_[0];
    my $saddr = $_[1];
    # 0 - sys_bind(), 1 - sys_connect(), 2 - sys_accept()
    my $action = $_[2];
    ($f1, $f2, $p1, $p2, @addr) = unpack("A2A2A2A2A2A2A2A2", $saddr);
    $family = hex2dec($f1) + 256 * hex2dec($f2);
    $port = 256 * hex2dec($p1) + hex2dec($p2);
    $ip1 = hex2dec($addr[0]);
    $ip2 = hex2dec($addr[1]);
    $ip3 = hex2dec($addr[2]);
    $ip4 = hex2dec($addr[3]);
    #print "$saddr\n";
    if ($family eq 2) { #&& $ip1 ne 0) {
        my $dst_addr = "$ip1.$ip2.$ip3.$ip4:$port";
        # print "family=$family $dst_addr\n\n";
        # todo: avoid code duplication
        if ($action eq 0) {
            $sockfd_hash{ $sockfd } = $dst_addr;
        } elsif ($action eq 1) {
            my $src_addr;
            if (exists $sockfd_hash{ $sockfd }) {
                $src_addr = $sockfd_hash{ $sockfd };
            } else {
                $src_addr = "x.x.x.x:x";
            }
            print "$src_addr -> $dst_addr\n";
        } elsif ($action eq 2) {
            my $src_addr;
            if (exists $sockfd_hash{ $sockfd }) {
                $src_addr = $sockfd_hash{ $sockfd };
            } else {
                $src_addr = "x.x.x.x:x";
            }
            print "$dst_addr <- $src_addr\n";
        } else {
            print "unknown action\n";
        }
    } elsif ($family eq 1) {
        $tmp1 = 0;
        ($tmp1, $tmp2) = unpack("A4A*", $saddr);
        my $file = pack("H*", $tmp2);
        # print "family=$family file=$file\n";
    } else {
        # print "$saddr\n";
    }
}
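For comparison, here is a minimal Python sketch of the same bookkeeping: AF_INET addresses are decoded with the same arithmetic, bind() addresses are remembered per socket fd, and connect()/accept() print them in the same "src -> dst" / "dst <- src" form. The names hex_saddr_to_inet and SockaddrTracker are hypothetical, and the action codes 0/1/2 are taken from the comment in the Perl function; how the full script works out the action and sockfd from the log is not reproduced here.

```python
def hex_saddr_to_inet(saddr):
    """Decode an AF_INET saddr hex string to (family, "a.b.c.d:port").

    Mirrors the Perl arithmetic: family = f1 + 256*f2 (host byte order,
    little-endian on x86), port = 256*p1 + p2 (network byte order).
    """
    b = [int(saddr[i:i + 2], 16) for i in range(0, 16, 2)]
    family = b[0] + 256 * b[1]
    port = 256 * b[2] + b[3]
    return family, "%d.%d.%d.%d:%d" % (b[4], b[5], b[6], b[7], port)


class SockaddrTracker:
    """Remember bind() addresses per socket fd, like %sockfd_hash."""

    BIND, CONNECT, ACCEPT = 0, 1, 2

    def __init__(self):
        self.sockfd_addr = {}

    def handle(self, sockfd, saddr, action):
        """Return the line the Perl code would print, or None."""
        family, dst_addr = hex_saddr_to_inet(saddr)
        if family != 2:  # only AF_INET is handled here
            return None
        if action == self.BIND:
            self.sockfd_addr[sockfd] = dst_addr
            return None
        src_addr = self.sockfd_addr.get(sockfd, "x.x.x.x:x")
        if action == self.CONNECT:
            return f"{src_addr} -> {dst_addr}"
        if action == self.ACCEPT:
            return f"{dst_addr} <- {src_addr}"
        return "unknown action"
```

A connect() on a socket that was never seen in a bind() falls back to "x.x.x.x:x", which is why most lines in the sample output start that way.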
This script was part of a TWiki page on the CERN website, under LinuxSupport. The page, titled IDSNetConnectionLogger, contains two files of interest: the script mentioned above, parse-audit-log.pl, and a sample audit.log file.
Running the script
If you download those two files and run the script against the sample log, you'll see that it decodes exactly the kind of saddr values you're asking about.
Examples
$ ./parse-audit-log.pl -l audit.log
x.x.x.x:x -> 0.0.0.0:22
x.x.x.x:x -> 137.138.32.52:22
137.138.32.52:22 <- x.x.x.x:x
x.x.x.x:x -> 0.0.0.0:22
x.x.x.x:x -> 137.138.32.52:0
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.128.158:88
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.128.148:750
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.128.158:88
x.x.x.x:x -> 137.138.32.52:0
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.16.5:53
x.x.x.x:x -> 137.138.128.158:88
x.x.x.x:x -> 127.0.0.1:6010
Pulling out the parser logic
We can condense the above into a parser for saddr values only. Here's my stripped-down version.
$ cat parse_saddr.pl
#!/usr/bin/perl -w
use strict;
# Getopt::Std module from the perl package
use Getopt::Std;

my %Options;
getopt('s', \%Options);

my $saddr;
if (defined($Options{'s'})) {
    $saddr = $Options{'s'};
} else {
    print STDERR "saddr not given\n";
    exit(1);
}

sub hex2dec($) { return hex $_[0] }

sub parse_saddr
{
    my $saddr = $_[0];
    # split the hex string into 2-character (1-byte) chunks
    my ($f1, $f2, $p1, $p2, @addr) = unpack("A2A2A2A2A2A2A2A2", $saddr);
    # sa_family is in host byte order (little-endian on x86)
    my $family = hex2dec($f1) + 256 * hex2dec($f2);
    # the port is in network byte order (big-endian)
    my $port = 256 * hex2dec($p1) + hex2dec($p2);
    my ($ip1, $ip2, $ip3, $ip4) = map { hex2dec($_) } @addr[0..3];
    if ($family == 2) {        # AF_INET
        my $dst_addr = "$ip1.$ip2.$ip3.$ip4:$port";
        print "family=$family $dst_addr\n\n";
    } elsif ($family == 1) {   # AF_UNIX
        my (undef, $tmp2) = unpack("A4A*", $saddr);
        my $file = pack("H*", $tmp2);
        print "family=$family file=$file\n";
    } else {
        print "$saddr\n";
    }
}

parse_saddr($saddr);
Sample run of saddr parser script
We can run it like so:
$ ./parse_saddr.pl -s 02000035898A1005000000000000000030BED20858D83A0010000000
family=2 137.138.16.5:53
You could then use a command like this to parse all the saddr=... lines from the audit.log file mentioned above:
$ for i in $(grep saddr audit.log | cut -d"=" -f4);do echo $i; \
./parse_saddr.pl -s $i;done | less
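The cut -d"=" -f4 step only works while saddr= is the third "=" on the line, as it is in a type=SOCKADDR record. A slightly more robust alternative is to match the field by name. This is a minimal Python sketch; extract_saddrs is a hypothetical name, and it assumes saddr values are plain hex digits.

```python
import re

# Match "saddr=" followed by hex digits, wherever it appears on the line.
SADDR_RE = re.compile(r"\bsaddr=([0-9A-Fa-f]+)")


def extract_saddrs(path):
    """Return every saddr hex string found in the given audit log file."""
    saddrs = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            saddrs.extend(SADDR_RE.findall(line))
    return saddrs
```

Each returned string can then be handed to ./parse_saddr.pl -s, or decoded directly in Python.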
The parse_saddr.pl script above is hacked together, so it doesn't fully handle the family=1 type of saddr (its trailing NUL bytes show up as ^@ in the example output). You'd have to dig in more, but this gives you a rough start on how to deal with all this.
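One way to handle family=1 (AF_UNIX) entries is to convert the hex back into bytes and cut sun_path at the first NUL, which removes the ^@ padding. This is a minimal Python sketch, assuming a little-endian family field as on x86; decode_unix_saddr is a hypothetical name, and abstract-namespace sockets (whose path starts with a NUL) would come out as an empty string here.

```python
def decode_unix_saddr(saddr):
    """Decode an AF_UNIX (family=1) saddr hex string to its socket path.

    Assumed layout: 2-byte little-endian family, then sun_path padded
    with NUL bytes (the padding is what appears as ^@ in the output).
    """
    raw = bytes.fromhex(saddr)
    family = int.from_bytes(raw[0:2], "little")
    if family != 1:
        raise ValueError(f"not an AF_UNIX saddr (family={family})")
    # Keep everything up to the first NUL byte of sun_path.
    return raw[2:].split(b"\x00", 1)[0].decode("utf-8", "replace")
```

With the value from the sample log, decode_unix_saddr("01002F6465762F6C6F67000000000000") gives "/dev/log".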
Example output
$ for i in $(grep saddr audit.log | cut -d"=" -f4);do echo $i; \
./parse_saddr.pl -s $i;done | less
...
01002F6465762F6C6F67000000000000
family=1 file=/dev/log^@^@^@^@^@^@
...
02000035898A10050000000000000000726E2E6368009A0900000000
family=2 137.138.16.5:53
...
02000058898A809E0000000000000000
family=2 137.138.128.158:88
...
020002EE898A80940000000000000000
family=2 137.138.128.148:750
...
0200177A7F0000010000000000000000
family=2 127.0.0.1:6010
...
Perl's pack/unpack functions
These are very powerful functions once you understand how they work. If you've never used them before, take a look at the tutorial, perlpacktut.
The idea behind these functions is that they take data plus a template, and use the template as a description of how that data is laid out: unpack() splits a string into fields according to the template, and pack() assembles fields back into a string.
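Python's struct module works on the same idea, with a format string playing the role of the template. This sketch (unpack_sockaddr_in is a hypothetical name) decodes the hex string to bytes first, and then lets the template express the byte order that the Perl code handles by hand with 256 * x: "<H" reads the family as a little-endian 16-bit value (host order, assuming an x86 machine), ">H" reads the port in network (big-endian) order, and "4B" reads the four address bytes.

```python
import struct


def unpack_sockaddr_in(saddr_hex):
    """Unpack family, port and IPv4 address from an AF_INET saddr hex string."""
    raw = bytes.fromhex(saddr_hex)
    (family,) = struct.unpack_from("<H", raw, 0)  # host order (x86: little-endian)
    (port,) = struct.unpack_from(">H", raw, 2)    # network order (big-endian)
    a, b, c, d = struct.unpack_from("4B", raw, 4)  # four address bytes
    return family, port, f"{a}.{b}.{c}.{d}"
```

For the saddr used in the examples, this returns (2, 53, "137.138.16.5").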
Again, here's a simple Perl script that shows the unpacking of the saddr.
$ cat unpack.pl
#!/usr/bin/perl
use strict;
use warnings;

my $saddr = "02000035898A1005000000000000000030BED20858D83A0010000000";
# split the first 16 hex characters into eight 2-character fields
my ($f1, $f2, $p1, $p2, @addr) = unpack("A2A2A2A2A2A2A2A2", $saddr);
print "org string: $saddr\n";
printf "org values==> f1: %s f2: %s p1: %s p2: %s addr: %s\n",
    $f1, $f2, $p1, $p2, join("", @addr);
printf "new values==> f1: %2s f2: %2s p1: %2s p2: %2s addr: %s.%s.%s.%s\n\n",
    hex($f1), hex($f2), hex($p1), hex($p2),
    hex($addr[0]), hex($addr[1]), hex($addr[2]), hex($addr[3]);
Which produces this:
$ ./unpack.pl
org string: 02000035898A1005000000000000000030BED20858D83A0010000000
org values==> f1: 02 f2: 00 p1: 00 p2: 35 addr: 898A1005
new values==> f1: 2 f2: 0 p1: 0 p2: 53 addr: 137.138.16.5
Here we're taking the data contained in $saddr and calling unpack(), telling it to take the data 2 characters at a time (A2), 8 times in total. Each 2-character chunk is the hex encoding of one byte. The first 4 A2 blocks are stored in the variables $f1, $f2, $p1, and $p2, and the remaining 4 are stored in the array @addr. Everything after the 16th character is ignored.
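The same A2 x 8 split can be written in Python with string slicing. This is a minimal sketch; a2_chunks is a hypothetical helper, and it assumes the input has at least 16 characters.

```python
def a2_chunks(s, count=8):
    """Mimic unpack("A2" x count, $s): take 2 characters at a time.

    Anything after the first 2 * count characters is ignored.
    """
    return [s[i:i + 2] for i in range(0, 2 * count, 2)]


saddr = "02000035898A1005000000000000000030BED20858D83A0010000000"
# Same destructuring as ($f1, $f2, $p1, $p2, @addr) in the Perl script.
f1, f2, p1, p2, *addr = a2_chunks(saddr)
print(f1, f2, p1, p2, "".join(addr))
print([int(x, 16) for x in addr])
```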
Best Answer
This might not be the solution you are looking for, but having run into a similar scenario in the past, my go-to is to write a Python parser that takes the trace file and writes only the data I care about into a result file. As long as the "garbage text" you don't care about follows predictable patterns that can be distinguished from the xtrace output, the parsing should be straightforward.
For example, grabbing a trace of all I/Os across all submission queues for a 20-minute test of reads and writes produces a 40-70 GB file, but after parsing out the data I care about, my result file is only ~1-3 GB and actually usable for analysis and visualization. Granted, with files that large it takes a server-class system ~15-30 minutes to process it all. In a separate scenario using the same idea (multiple logs of only about 24 KB each), I can get the data from all logs into a single consumable file in under 2 seconds.
All that to say, it might be worth adding a post-processing step that costs a few seconds or minutes on top of the overall capture, rather than waiting for a one-stop shop.
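A minimal sketch of such a post-processing step in Python, assuming the records worth keeping can be recognized line by line with a regular expression. The pattern below (lines mentioning read or write) is purely a placeholder, since the actual xtrace format isn't shown here, and filter_trace is a hypothetical name.

```python
import re

# Placeholder pattern: replace with whatever identifies the records you
# want to keep in your trace output.
KEEP_RE = re.compile(r"\b(read|write)\b", re.IGNORECASE)


def filter_trace(in_lines, out, pattern=KEEP_RE):
    """Copy only the matching lines from in_lines to out.

    Returns (lines_read, lines_kept). Processing is line by line, so
    memory use stays flat even for multi-GB trace files.
    """
    read = kept = 0
    for line in in_lines:
        read += 1
        if pattern.search(line):
            out.write(line)
            kept += 1
    return read, kept
```

Usage would look like `with open("trace.txt") as src, open("filtered.txt", "w") as dst: filter_trace(src, dst)`, where both file names are just examples.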
Best of luck!