A good way to sort two log files as they are created

logs sort

I have two scripts running, both of which output a log file. I'd like to make a third script that can sort these logs by time stamp and merge them into one file as they are created. What is a good way to do this, ideally without overwriting the file constantly?

Best Answer

If you use tail -f to follow two or more files, it prints the data line by line and emits a filename header each time the source of the data changes. You can use this to write a script that merges the interleaved output from tail by timestamp, holding each line until a line with a later timestamp arrives from the other file.
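
To see the headers the script keys on, you can run tail on two small files without -f. This is only a sketch: the temporary directory and the names a.log and b.log are made up for the demonstration, and GNU tail is assumed.

```shell
# Hypothetical demo files; any two files would do.
demo_dir=$(mktemp -d)
printf 'Jun  9 02:55:01 first\n'  > "$demo_dir/a.log"
printf 'Jun  9 02:55:02 second\n' > "$demo_dir/b.log"

# With more than one file, tail prefixes each block with "==> name <==".
tail_output=$(cd "$demo_dir" && tail -n 1 a.log b.log)
printf '%s\n' "$tail_output"
# ==> a.log <==
# Jun  9 02:55:01 first
#
# ==> b.log <==
# Jun  9 02:55:02 second

rm -r "$demo_dir"
```

With -f, the same kind of header is printed again each time new data comes from a different file than the previous data did.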

For example, using two standard logfiles (/var/log/messages and /var/log/cron), which on my system use the same timestamp format at the start of each line (e.g. Jun 9 02:55:01), you can do the following:

tail -f /var/log/messages /var/log/cron |
awk '
BEGIN { num[0] = 0; num[1] = 0; }
/^==> /{
  file = $2; aa = file~/messages/?0:1; bb = 1-aa; 
  aanum = num[aa]; bbnum = num[bb];
  next }
/^$/{ next }
{ cmd = "date --date \"" $1 " " $2 " " $3 "\" +%s"
  cmd | getline date; close(cmd)  # close the pipe so a repeated timestamp runs date again
  lines[aa,aanum] = $0
  dates[aa,aanum++] = date
  maxes[aa] = date
  minmax = maxes[aa]
  if(maxes[bb]<minmax)minmax = maxes[bb]

  i = 0; j = 0;
  while(1){
    aaok = (i<aanum && dates[aa,i]<=minmax)
    bbok = (j<bbnum && dates[bb,j]<=minmax)
    if(aaok && bbok){
      if(dates[aa,i]<=dates[bb,j]){
           print lines[aa,i]; dates[aa,i++] = ""
      }else{
           print lines[bb,j]; dates[bb,j++] = ""
      }
    }else if(aaok){
           print lines[aa,i]; dates[aa,i++] = ""
    }else if(bbok){
           print lines[bb,j]; dates[bb,j++] = ""
    }else break
  }
  i = 0
  for(j = 0; j<aanum;j++)
    if(dates[aa,j]!=""){
      dates[aa,i] = dates[aa,j]; lines[aa,i++] = lines[aa,j]
    }
  aanum = num[aa] = i
  i = 0
  for(j = 0; j<bbnum;j++)
    if(dates[bb,j]!=""){
      dates[bb,i] = dates[bb,j]; lines[bb,i++] = lines[bb,j]
    }
  bbnum = num[bb] = i
}'

The awk script switches between the two files whenever it sees the ==> filename header from tail. It keeps its data in four arrays, tracked separately for each file; the two files are arbitrarily called aa and bb and numbered 0 and 1. dates holds the timestamps (in seconds since the epoch), lines holds the input log lines, num holds the count of held lines, and maxes holds the latest timestamp seen for each file. The first two arrays are two-dimensional, indexed by file (0 or 1) and by position among the held lines.
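
The conversion to seconds can be tried on its own. A minimal sketch, assuming GNU date; to_epoch is a made-up helper name and TZ=UTC is set only to keep the arithmetic predictable.

```shell
# to_epoch: hypothetical helper, not part of the answer's script.
# Converts a syslog-style timestamp to seconds since the epoch (GNU date).
to_epoch() {
  TZ=UTC date --date "$1" +%s
}

earlier=$(to_epoch "Jun 9 02:55:01")
later=$(to_epoch "Jun 9 02:56:01")
echo "$((later - earlier))"   # 60
```

Because this timestamp format has no year, date assumes the current year, so lines written either side of New Year would compare in the wrong order.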

As each log line is read, its timestamp is converted to seconds and saved in a new entry at the end of dates, and the line itself is saved in lines. minmax is set to the smaller of the two files' latest timestamps. All held data is then scanned and printed in timestamp order up to this minimum. Printed entries are cleared and, after the while loop, the arrays are compacted to remove the cleared entries.
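
The holding rule can be tried without live logs by feeding a canned stream in tail's format. This is a simplified sketch of the same idea, not the script above: merge_held and the sample data are made up, the timestamps are already plain numbers, and the held lines are kept in one pool instead of per-file arrays.

```shell
# merge_held: hypothetical name; reads tail-style interleaved output on stdin.
merge_held() {
  awk '
    /^==> / { cur = $2; next }   # header from tail: note which file follows
    /^$/    { next }             # skip the blank line tail prints before a header
    {
      n++; ts[n] = $1 + 0; line[n] = $0
      if (!(cur in max)) nfiles++
      max[cur] = ts[n]           # latest timestamp seen for this file
      if (nfiles < 2) next       # nothing is safe to print until both files are seen
      limit = ""                 # smallest of the per-file latest timestamps
      for (f in max) if (limit == "" || max[f] < limit) limit = max[f]
      while (1) {                # emit held lines up to the limit, oldest first
        best = ""
        for (k in ts) {
          if (ts[k] > limit) continue
          if (best == "" || ts[k] < ts[best] || (ts[k] == ts[best] && k + 0 < best + 0)) best = k
        }
        if (best == "") break
        print line[best]; delete ts[best]; delete line[best]
      }
    }'
}

sample='==> a.log <==
10 a1
20 a2

==> b.log <==
15 b1

==> a.log <==
30 a3

==> b.log <==
25 b2'

merged=$(printf '%s\n' "$sample" | merge_held)
printf '%s\n' "$merged"
# 10 a1
# 15 b1
# 20 a2
# 25 b2
```

Note that "30 a3" is never printed here: a line stays held until the other file produces something at least as late, which is also how the script above behaves when one log goes quiet.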

Related Solutions

How to sort access log efficiently in blocks

Try split --filter:

split --lines 1000 --filter 'sort ... | sed ... | uniq -c' access.log

This will split access.log into chunks of 1000 lines and pipe each chunk through the given filter.
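
As a small runnable illustration, here is a sketch with a stand-in filter; it is not the elided pipeline above. The sample data and variable names are made up, and GNU split (coreutils 8.13 or later) is assumed.

```shell
# Hypothetical six-line input, split into chunks of three lines.
chunk_demo=$(mktemp)
printf '%s\n' a b a c c c > "$chunk_demo"

# The filter is a stand-in for the demo; awk just normalises uniq -c spacing.
chunk_counts=$(split --lines 3 --filter "sort | uniq -c | awk '{print \$1, \$2}'" "$chunk_demo")
printf '%s\n' "$chunk_counts"
# 2 a
# 1 b
# 3 c

rm -f "$chunk_demo"
```

Each chunk is counted independently, so a value that appears in several chunks is reported once per chunk.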

If you want to save the results for each chunk separately, you can use $FILE in the filter command and possibly specify a prefix (default is x):

split --lines 1000 --filter '... | uniq -c >$FILE' access.log myanalysis-

This will generate a file myanalysis-aa containing the result of processing the first chunk, myanalysis-ab for the second chunk, etc.
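
A sketch of the per-chunk variant on made-up data, again assuming GNU split; the temporary directory and the filter are placeholders for the demonstration.

```shell
# Hypothetical input in a scratch directory.
out_dir=$(mktemp -d)
printf '%s\n' a b a c c c > "$out_dir/access.log"

# Single quotes keep $FILE unexpanded so that split can set it for each chunk.
(cd "$out_dir" && split --lines 3 --filter 'sort | uniq -c > "$FILE"' access.log myanalysis-)

chunk_files=$(cd "$out_dir" && ls myanalysis-*)
printf '%s\n' "$chunk_files"
# myanalysis-aa
# myanalysis-ab
```

If the filter were in double quotes, the calling shell would expand $FILE (most likely to an empty string) before split ever ran. Remove the scratch directory when you are done with it.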

The --filter option to split was introduced in GNU coreutils 8.13 (released in September 2011).

Bash – Back up logs to a new directory

You can also use logrotate for this; see the following example:

# Logrotate file for trace

/source/path/trace_*.log {
    missingok
    create
    compress
    rotate 1
    lastaction
        # After compressing logs, move to other location 
        Log_dir="/target/dir/old_log_$(date +%F)/$(date +%H_%S)/"
        /bin/mkdir -p "${Log_dir}"
        /bin/mv /source/path/*.gz "${Log_dir}"
    endscript
}

Save the above file as, say, /etc/logrotate_trace.conf, then set up a cron job to run it every hour:

00 * * * * /usr/sbin/logrotate  -f /etc/logrotate_trace.conf

For testing, you can run it from the command line:

/usr/sbin/logrotate  -f /etc/logrotate_trace.conf
