Postgresql – How to control how Pentaho DI formats integers coming from a text file

etl, pentaho, postgresql

I have a simple pipe-delimited text file with integers recorded as a sequence of numbers:

012345|00678 |various|text|fields

There are some leading zeros, and some trailing white space, but no decimals, commas, internal spaces, or other formatting. Should be simple, right?

Pentaho's field analysis concludes that the fields are of type "Number", format "#.#", and precision "0". That format string makes me wonder if Pentaho thinks there might be decimals in the columns, but when I run Preview Rows, I get nice clean integers, without decimals, scientific notation, or leading zeros, exactly as I want it.
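As far as I can tell, Kettle's Number format masks are java.text.DecimalFormat patterns, and "#.#" only prints a decimal point when there is a fractional part, which would explain why the preview still shows clean integers. A quick illustration (plain Java, not Pentaho code):

import java.text.DecimalFormat;

public class MaskDemo {
    public static void main(String[] args) {
        DecimalFormat mask = new DecimalFormat("#.#");
        // Integer-valued doubles come out without a decimal point...
        System.out.println(mask.format(12345.0));   // 12345
        System.out.println(mask.format(678.0));     // 678
        // ...and only fractional values show a decimal place.
        System.out.println(mask.format(12345.678)); // 12345.7
    }
}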

When I try to consume the data with a PostgreSQL Bulk Loader step, I get error messages like these:

Import Table.0 - ERROR {0} ERROR:  invalid input syntax for integer: "5.017645082E9"
Import Table.0 - ERROR {0} ERROR:  invalid input syntax for integer: "1.0"

It seems that Pentaho has taken it upon itself to format the integers with scientific notation and/or fixed decimals, and naturally Postgres balks at importing this.
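My guess is that the bulk loader step is converting the Number (double) fields to text without applying the format mask, because Java's default Double.toString produces exactly the kinds of strings in the error log: scientific notation for magnitudes of 10^7 or more, and a trailing ".0" otherwise. For example (plain Java, just to show where such strings could come from):

public class NotationDemo {
    public static void main(String[] args) {
        // Default double-to-string conversion, with no format mask applied.
        System.out.println(Double.toString(5017645082d)); // 5.017645082E9
        System.out.println(Double.toString(1d));          // 1.0
    }
}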

How can I stop Pentaho from formatting these integers?

I'm using Pentaho Data Integration v6.1 and PostgreSQL 9.4.9. The file uses Windows line breaks and ANSI text. Changing the field format to "#" or "0" had no effect. Using a Table Output step instead of the PostgreSQL Bulk Loader works, but it is much slower.

Best Answer

Try converting the values to the format you need before the output step. This can be done in a "Select values" step, on the "Meta-data" tab.
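In other words, change the field's metadata before the PostgreSQL Bulk Loader step: in the "Select values" step's "Meta-data" tab, set the field type to Integer (or keep it a Number but give it an integer mask such as "0"), so the value is rendered as plain digits. Roughly speaking, the conversion amounts to this (illustrative Java only, not Pentaho API code):

import java.text.DecimalFormat;

public class ConvertDemo {
    public static void main(String[] args) {
        double value = 5017645082d;
        // An integer mask renders the double as plain digits...
        System.out.println(new DecimalFormat("0").format(value)); // 5017645082
        // ...as does converting it to an integer type first.
        System.out.println(Long.toString((long) value));          // 5017645082
    }
}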