A good way to sort two log files as they are created


I have two scripts running, both of which output a log file. I'd like to make a third script that can sort these logs by timestamp and merge them into one file as lines are written. What is a good way to do this, ideally without overwriting the file constantly?

Best Answer

If you use tail -f on two or more files, it shows new data as it is appended and prints a heading of the form ==> filename <== each time the source of the data changes. Using this you can write a script that merges the interleaved output from tail by timestamp: hold on to each line until you have seen a line from the other file with the same or a later timestamp.
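To see the headings such a script keys on, here is tail run without -f on two throwaway files. The file names a.log and b.log are only for illustration; without -f, tail prints the same kind of headings and then exits instead of following the files.

```shell
# Two throwaway files stand in for the real logs (illustrative names).
dir=$(mktemp -d)
printf 'first line of a\n' > "$dir/a.log"
printf 'first line of b\n' > "$dir/b.log"
cd "$dir" || exit 1
# Print the last line of each file; a heading precedes each file's data.
tail -n 1 a.log b.log
```

With GNU tail this prints a ==> a.log <== heading, the line from a.log, a blank line, then a ==> b.log <== heading and the line from b.log. With -f the same headings appear whenever the output switches files, and the blank line before each later heading is why the awk script below skips empty lines.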

For example, using two standard log files (/var/log/messages and /var/log/cron), which on my system have the same timestamp format at the start of each line (e.g. Jun 9 02:55:01), you can do the following:

tail -f /var/log/messages /var/log/cron |
awk '
BEGIN { num[0] = 0; num[1] = 0; }
/^==> /{
  file = $2; aa = file~/messages/?0:1; bb = 1-aa; 
  aanum = num[aa]; bbnum = num[bb];
  next }
/^$/{ next }
{ cmd = "date --date \"" $1 " " $2 " " $3 "\" +%s"
  cmd | getline date
  close(cmd)   # close the pipe so a repeated timestamp runs date again
  lines[aa,aanum] = $0
  dates[aa,aanum++] = date
  maxes[aa] = date
  minmax = maxes[aa]
  if(maxes[bb]<minmax)minmax = maxes[bb]

  i = 0; j = 0;
  while(1){
    aaok = (i<aanum && dates[aa,i]<=minmax)
    bbok = (j<bbnum && dates[bb,j]<=minmax)
    if(aaok && bbok){
      if(dates[aa,i]<=dates[bb,j]){
           print lines[aa,i]; dates[aa,i++] = ""
      }else{
           print lines[bb,j]; dates[bb,j++] = ""
      }
    }else if(aaok){
           print lines[aa,i]; dates[aa,i++] = ""
    }else if(bbok){
           print lines[bb,j]; dates[bb,j++] = ""
    }else break
  }
  i = 0
  for(j = 0; j<aanum;j++)
    if(dates[aa,j]!=""){
      dates[aa,i] = dates[aa,j]; lines[aa,i++] = lines[aa,j]
    }
  aanum = num[aa] = i
  i = 0
  for(j = 0; j<bbnum;j++)
    if(dates[bb,j]!=""){
      dates[bb,i] = dates[bb,j]; lines[bb,i++] = lines[bb,j]
    }
  bbnum = num[bb] = i
  fflush()     # flush so a redirected merged file is updated as lines arrive
}'
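The timestamp conversion relies on awk's command | getline var form. awk keeps such a pipe open and reuses it whenever the same command string comes up again, so unless the pipe is closed, a repeated timestamp reads end-of-file and leaves date holding whatever the last successful conversion produced. This small check shows the behavior; echo stands in for date, and pipe_reuse_demo is a made-up name.

```shell
pipe_reuse_demo() {
  awk 'BEGIN {
    cmd = "echo 42"          # stand-in for the date command
    r1 = (cmd | getline v1)  # 1: first read from a new pipe
    r2 = (cmd | getline v2)  # 0: same pipe, already at end of file
    close(cmd)
    r3 = (cmd | getline v3)  # 1: after close(), the command runs again
    print r1, r2, r3, v1, v3
  }'
}
pipe_reuse_demo   # prints: 1 0 1 42 42
```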

The awk script switches between the two files whenever it sees the ==> file heading from tail. It keeps the data for each file separately in four arrays, using file number 0 for messages and 1 for cron; aa is the number of the file currently being read and bb the other one. dates holds each held line's timestamp (in seconds since the epoch), lines holds the log line itself, num holds the count of held lines, and maxes holds the latest timestamp seen in each file. dates and lines are two-dimensional, indexed by file number (0 or 1) and the position of the held line.
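awk has no true two-dimensional arrays: an element such as lines[aa,aanum] is stored under a single key made by joining the two subscripts with SUBSEP. This also shows why initializing num in the BEGIN block matters: an uninitialized counter becomes an empty string when used as a subscript, so its key would not match the 0 the counter produces once it has been incremented. A small check (subsep_demo is a made-up name):

```shell
subsep_demo() {
  awk 'BEGIN {
    a[0, 1] = "x"
    # The single stored key splits back into the two subscripts.
    for (k in a) { split(k, part, SUBSEP); print part[1], part[2] }
    r = ((0, 1) in a)        # the pair form tests the combined key
    print r                  # 1
    b[0, u] = "held"         # u is uninitialized: key is 0 SUBSEP ""
    r = ((0, 0) in b)        # key 0 SUBSEP 0 is a different element
    print r                  # 0
  }'
}
subsep_demo   # prints "0 1", then 1, then 0 on separate lines
```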

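As each log line is read, its timestamp is converted to seconds and appended to that file's entries in dates, and the line itself is appended to lines. minmax is then set to the smaller of the two files' latest timestamps (maxes). Assuming each log is written in time order, nothing still to come from either file can be earlier than minmax, so every held line up to that time can be printed safely; until both files have produced a line, the missing maxes entry counts as 0 and everything is held. The while loop repeatedly prints whichever file's oldest held line is earlier and stops once neither file has a held line at or before minmax. Printed entries are cleared, and after the loop the arrays are compacted to remove them. One consequence is that the newest lines are held back until the other log writes a line at least as late.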
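A minimal sketch of the same hold-and-release idea, written as a shell function around awk. To stay self-contained it assumes, hypothetically, that each log line starts with a timestamp already in epoch seconds, so no date conversion is needed; the function name merge_tail_output and the file names are illustrative. Instead of clearing and compacting entries, it keeps a read position (lo) and a write position (hi) per file, and an END rule prints whatever is still held when the input ends.

```shell
# Hypothetical helper: expects tail-style input for exactly two files whose
# lines start with an epoch-seconds timestamp, e.g. "1717901701 message".
merge_tail_output() {
  awk '
  BEGIN { cur = 0; lo[0] = lo[1] = hi[0] = hi[1] = 0 }

  # Print held lines oldest first while they are <= limit
  # (or all of them when drain is set), then flush stdout.
  function release(limit, drain,    ok0, ok1, f) {
    while (1) {
      ok0 = lo[0] < hi[0] && (drain || ts[0, lo[0]] <= limit)
      ok1 = lo[1] < hi[1] && (drain || ts[1, lo[1]] <= limit)
      if (ok0 && (!ok1 || ts[0, lo[0]] <= ts[1, lo[1]])) f = 0
      else if (ok1) f = 1
      else break
      print line[f, lo[f]]
      delete ts[f, lo[f]]
      delete line[f, lo[f]]
      lo[f]++
    }
    fflush()
  }

  /^==> / {                    # tail heading: the first name seen is file 0
    if (!($2 in id)) id[$2] = nfiles++
    cur = id[$2]
    next
  }
  /^$/ { next }                # blank separator lines from tail

  {
    n = hi[cur]                # append the line to the current file queue
    ts[cur, n] = $1 + 0
    line[cur, n] = $0
    hi[cur] = n + 1
    latest[cur] = $1 + 0
    seen[cur] = 1
    if (seen[0] && seen[1]) {  # nothing is safe until both files have data
      limit = latest[0] < latest[1] ? latest[0] : latest[1]
      release(limit, 0)
    }
  }

  END { release(0, 1) }        # input ended: print whatever is still held
  '
}
```

For two live logs in that format, something like tail -f a.log b.log | merge_tail_output >> merged.log would append lines as they become safe to print. As in the script above, the newest lines wait until the other log catches up; the END rule only matters when tail's output actually ends.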
