Suppose I have the following <Tab>-separated text file:
file name	size	owner
file1.txt	12.345	root
file2.txt	0.172222	user1
file3.txt	2.46e2	user2
file4.txt	12345	root
file5.txt	21	user3
file6.txt	246.0	user1
file name	owner	last modified	last accessed
text4.txt	root	12.73	13.53
text5.txt	user3	15.3333	34
file1.txt	root	23	31.0032
This file consists of several "tables", each of which starts with a header line and then contains some data lines.
Some columns are numeric, but each table can have a different number of columns, as well as different column types. The column types are not known in advance and cannot be determined from the table header.
The numeric values in the table use various formats: there might be integers, decimal floating-point numbers, or numbers in scientific notation.
My question is how to convert all the numeric fields in this table into the same format. For example, I might want to have every numeric field formatted with a "%.2f" printf format specifier. Naturally, the other, non-numeric fields must remain unchanged.
Also, I would like to be able to arbitrarily adjust (e.g. add 42 and then multiply by 7) every numeric field contained in this file.
The solution I am looking for should be field-based.
It should scan the entire file and for each field it should determine whether it is numeric or not. If it is numeric, it should print its adjusted and formatted value. Otherwise, it should just print the original.
I know that something like that can be done with awk. But if I remember correctly, awk uses double for its internal representation of numbers, and therefore it might have problems with precision and larger values. So, ideally, I would like to use something else, something which should correctly handle at least 64-bit integers.
Is there any simple way to achieve this?
Best Answer
perl has a module called Scalar::Util (included with perl since v5.8) which has a useful function called looks_like_number(), which can be used to detect whether a field is a number or not. looks_like_number is not perfect, but is pretty good.
The bare outline of a simple perl program to do what you want might look something like this:
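A minimal sketch along those lines, assuming tab-separated input on standard input or in files named on the command line. The helper names adjust and convert_field are made up for illustration, and the adjustment is left as a no-op by default:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical hook for the "arbitrary adjustment" from the question.
# It returns the value unchanged by default.
sub adjust {
    my ($x) = @_;
    return $x;    # e.g. return 7 * ($x + 42);
}

# Return the field formatted with "%.2f" if it looks numeric,
# otherwise return it untouched.
sub convert_field {
    my ($field) = @_;
    return $field unless looks_like_number($field);
    return sprintf('%.2f', adjust($field));
}

while (my $line = <>) {
    chomp $line;
    # A limit of -1 keeps trailing empty fields.
    my @fields = split /\t/, $line, -1;
    print join("\t", map { convert_field($_) } @fields), "\n";
}
```

Note that sprintf with "%.2f" goes through perl's native floating-point type, so very large integers may still lose precision with this version.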
If given your sample data above as input, it prints this:
Here's another version of the script that uses Math::BigFloat for all calculations, rounding decimals to 2 digits.
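A sketch of what such a version could look like, under the same assumptions as before (tab-separated input; the helper name convert_field_big is made up for illustration). Fields that parse as infinity or NaN are passed through unchanged, which is a choice made here rather than anything the question requires:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use Math::BigFloat;

# Return the field rounded to 2 decimal places using arbitrary-precision
# arithmetic if it looks numeric, otherwise return it untouched.
sub convert_field_big {
    my ($field) = @_;
    return $field unless looks_like_number($field);

    my $n = Math::BigFloat->new($field);
    # looks_like_number() also accepts strings such as "Inf" and "NaN";
    # leave those fields as they are.
    return $field if $n->is_nan || $n->is_inf;

    # Optional adjustment, done without going through a double:
    # $n->badd(42)->bmul(7);

    # bfround(-2) rounds to the second digit after the decimal point.
    # Math::BigFloat's default rounding mode is 'even'.
    return $n->bfround(-2)->bstr;
}

while (my $line = <>) {
    chomp $line;
    # A limit of -1 keeps trailing empty fields.
    my @fields = split /\t/, $line, -1;
    print join("\t", map { convert_field_big($_) } @fields), "\n";
}
```

Because the value is never converted to a native double, integers beyond 64 bits keep all of their digits.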
example input:
output: