PostgreSQL Batch Inserts – Random Silent Failures

amazon-rds, pentaho, postgresql, sql-server

I have been using Kettle to import data from a SQL Server database to PostgreSQL on AWS RDS. When I enabled all of my foreign key constraints, I realized that data was missing, often in even hundreds of rows.

But checking the log of the Kettle job, it seems to be doing all reads and writes correctly for a table of 1923 entries:

2017/02/26 21:36:42 - Table output.5 - Finished processing (I=0, O=240, R=240, W=240, U=0, E=0)
2017/02/26 21:36:42 - Table output.3 - Finished processing (I=0, O=240, R=240, W=240, U=0, E=0)
2017/02/26 21:36:42 - Table output.7 - Finished processing (I=0, O=240, R=240, W=240, U=0, E=0)
2017/02/26 21:36:42 - Table output.6 - Finished processing (I=0, O=240, R=240, W=240, U=0, E=0)
2017/02/26 21:36:42 - Table output.1 - Finished processing (I=0, O=241, R=241, W=241, U=0, E=0)
2017/02/26 21:36:42 - Table output.2 - Finished processing (I=0, O=241, R=241, W=241, U=0, E=0)
2017/02/26 21:36:42 - Table output.4 - Finished processing (I=0, O=240, R=240, W=240, U=0, E=0)
2017/02/26 21:36:42 - Table output.0 - Finished processing (I=0, O=241, R=241, W=241, U=0, E=0)

But when I run a count query on Postgres I get only 1423 entries, and I see similar behavior on maybe 10% of all the tables I have transferred.
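For example, a simple count check like the following shows the discrepancy (the table name here is only a placeholder):

-- Count rows on the PostgreSQL side after the transfer
SELECT COUNT(*) FROM my_table;

-- The same count on the SQL Server side, for comparison
SELECT COUNT(*) FROM dbo.my_table;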

As I don't have any prior experience with either Kettle or Postgres, what can cause this issue?

Best Answer

With the help of the author of the script I used (sqlserver2pgsql), we realized it was a (new) issue with how Kettle handled table truncation when the output step runs in parallel.

https://github.com/dalibo/sqlserver2pgsql/issues/57#issuecomment-282675275

What happened, and the reason I was only missing even hundreds of rows, was that in some cases the output step ran its truncate after one or more parallel workers had already inserted rows into the table, so those already-inserted batches were wiped out.
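As a sketch of one way to avoid this kind of race (not necessarily the exact fix applied in the linked issue): truncate the table once, in a step that runs before the parallel Table output copies start, and leave the "Truncate table" option unchecked on the Table output step itself, so no worker can truncate behind another worker's inserts. The table name below is a placeholder.

-- Run once, before the parallel Table output copies start
-- (for example from a separate SQL step earlier in the job)
TRUNCATE TABLE my_table;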