Print only lines that are completely numeric

awk, numeric data, text processing

I'd like to filter through a text file and only print the lines where each column is a valid floating point number. For example:

3 6 2 -4.2 21.2 
3 x 4.2 21.2 
3 2 2.2.2

Only the first line would pass, since neither x nor 2.2.2 is a valid float. I can write a Python script that simply calls .split() and runs a try/except block over each part, but this is slow for larger files. The input file has an unknown, variable number of columns, and no scientific notation will be used. Is there an awk solution?

Best Answer

awk '
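    # fast path: skip lines containing anything but digits, dots, spaces, or minus signs
    /[^0-9. -]/ {next}
    {
        # a field is numeric only if adding 0 leaves it unchanged;
        # strings such as 2.2.2 or a lone - fail this comparison
        for (i=1; i<=NF; i++)
            if ($i + 0 != $i)
                next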
        print
    }
'

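This will reject valid numbers written in scientific notation, such as 1.2e1 (which equals 12), because the first rule skips any line containing a letter. It also skips tab-separated lines, since the character class lists only the space as a separator.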
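One way to sidestep this limitation is to validate every field against a regular expression instead of relying on numeric comparison. The sketch below is an assumption rather than part of the original answer: the function name numeric_lines is hypothetical, and the pattern accepts an optional sign, plain decimals, and an optional exponent (drop the exponent group if scientific notation should be rejected).

```shell
# Hypothetical helper: print only lines whose fields are all numeric.
numeric_lines() {
    awk '
    {
        # reject the line as soon as one field fails the numeric pattern
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/)
                next
        print
    }
    ' "$@"
}

# Demo on the sample lines from the question, plus one exponent-notation line.
printf '3 6 2 -4.2 21.2\n3 x 4.2 21.2\n3 2 2.2.2\n1.2e1 5\n' | numeric_lines
```

Because awk's default field splitting is used, tab-separated columns are handled the same way as space-separated ones.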

Related Solutions

How to adjust numeric fields in a text file

Perl has a module called Scalar::Util (included with Perl since v5.8) that provides a useful function, looks_like_number(), for detecting whether a field is a number.

looks_like_number is not perfect, but is pretty good.

The bare outline of a simple perl program to do what you want might look something like this:

#! /usr/bin/perl

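use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

while(<>) {
  chomp;
  # split the line into tab-separated fields
  my @fields=split("\t");
  foreach my $f (0..$#fields) {
    # only touch fields that look numeric; everything else passes through unchanged
    if (looks_like_number($fields[$f])) {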
      $fields[$f] += 42;
      $fields[$f] *= 7;
      $fields[$f] = sprintf("%.2f",$fields[$f]);
    }
  }
  print join("\t",@fields),"\n";
}

If given your sample data above as input, it prints this:

file name   size    owner    
file1.txt   380.41  root
file2.txt   295.21  user1
file3.txt   2016.00 user2
file4.txt   86709.00    root
file5.txt   441.00  user3
file6.txt   2016.00 user1
file name   owner   last modified   last accessed
text4.txt   root    383.11  388.71
text5.txt   user3   401.33  532.00
file1.txt   root    455.00  511.02
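For comparison, roughly the same transformation can be sketched in awk. This is an assumption-laden sketch, not the answer's method: the function name adjust_numeric is hypothetical, the input is assumed to be tab-separated, and a plain-decimal regular expression stands in for looks_like_number (it is stricter, rejecting exponent notation).

```shell
# Hypothetical helper: add 42 to each numeric field, multiply by 7, format to 2 decimals.
adjust_numeric() {
    awk '
    BEGIN { FS = OFS = "\t" }   # read and write tab-separated fields
    {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/)
                $i = sprintf("%.2f", ($i + 42) * 7)
        print
    }
    ' "$@"
}

# Demo on one tab-separated record.
printf 'text4.txt\troot\t12.73\t13.53\n' | adjust_numeric
```

Like the first Perl script, this uses native floating point, so very large or very precise values can lose digits.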

Here's another version of the script that uses Math::BigFloat for all calculations, so very large or very precise values are not mangled by native floating point, and rounds the results to 2 decimal places.

#! /usr/bin/perl

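use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Math::BigFloat;

while(<>) {
  chomp;
  # split the line into tab-separated fields
  my @fields=split("\t");
  foreach my $f (0..$#fields) {
    if (looks_like_number($fields[$f])) {
      # do the arithmetic in arbitrary precision
      my $BF = Math::BigFloat->new($fields[$f]);
      $BF->badd(42);
      $BF->bmul(7);
      # round to 2 digits after the decimal point
      $BF->ffround(-2);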

      $fields[$f] = $BF->bstr();
    }
  }
  print join("\t",@fields),"\n";
}

example input:

file name   owner   last modified   last accessed
text4.txt   root    12.73   13.53
text5.txt   user3   15.3333 34
file6.txt   root    903709792518875002.42857142857142857142 903709792518875002
file7.txt   root    6659166111488656281486807152009765625   539422123247359763587428687890625

output:

file name   owner   last modified   last accessed
text4.txt   root    383.11  388.71
text5.txt   user3   401.33  532.00
file6.txt   root    6325968547632125311.00  6325968547632125308.00
file7.txt   root    46614162780420593970407650064068359669.00   3775954862731518345112000815234669.00
