PostgreSQL – How to Get Row Count of Last Select Query

postgresql

I am using a scheduled script which exports the (new) rows of a PostgreSQL database into a text file. There are also several instances of this script running for several DBs:

psql -c "COPY (SELECT ... FROM ... GROUP BY zzz) TO STDOUT" > file.csv

In order to have a consistency check on file.csv, I then run a second query to get the number of rows the COPY query produced:

psql -Atc "SELECT count(0) FROM ... GROUP BY zzz" > linecount

In the bash script there is a check whether both counts are equal. If they are, the script proceeds; if not, it stops and reports an error (which happens rarely, but it does, and I haven't figured out why yet).
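The comparison step could be sketched like this (a minimal sketch; the function and variable names are mine, not from the original script):

```shell
#!/bin/bash
# Hypothetical sketch of the consistency check described above:
# compare the row count reported by the DB with the line count of the file.

check_counts() {
    local rowcount="$1"    # from SELECT count(0) ...
    local linecount="$2"   # from wc -l on the exported file
    if [ "$rowcount" -eq "$linecount" ]; then
        echo "OK: $rowcount rows exported"
        return 0
    else
        echo "ERROR: DB reported $rowcount rows but the file has $linecount lines" >&2
        return 1
    fi
}
```

The script would call `check_counts "$rowcount" "$linecount"` and abort on a non-zero return.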

The problem: the query is rather expensive (it takes around 5-10 min) and it basically runs twice (the count(0) is quicker, but still takes 3-7 min).

Is there a PostgreSQL function to get the number of rows the last query returned? If there were, I could cut down DB load quite a lot and shorten the export time.

I have looked into ROW_COUNT, but it only seems to be valid for UPDATE/INSERT, not SELECT. I'd also like to avoid using functions for simplicity's sake.

I am open to alternative suggestions, e.g. first querying into a temp table, then exporting from there (maybe count(0) is faster then)?
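That temp-table idea can be sketched as follows; the expensive query runs only once, and both the export and the count read from the temp table (the table name is a placeholder, and the ellipses stand for the original query):

```sql
-- Hypothetical sketch: materialize the expensive query once.
BEGIN;
CREATE TEMP TABLE export_rows AS
    SELECT ... FROM ... GROUP BY zzz;

-- Export from the temp table (no re-aggregation needed);
-- redirect STDOUT to file.csv in the shell as before.
COPY export_rows TO STDOUT;

-- Counting the temp table is a plain sequential scan,
-- far cheaper than re-running the grouped query.
SELECT count(*) FROM export_rows;
COMMIT;
```

Note that all three statements must run in the same session, since temp tables are session-local.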

Debian 7 Wheezy, SSD, 32GB Ram, PostgreSQL 9.1


UPDATE (in reply to @Craig Ringer)

I agree that the consistency check is somewhat flawed by design. I noticed once that the import check had a problem I couldn't explain; I then ran the export again, and the file had different lines than the one before:

Rowcount DB / Linecount file: 532395 / 532014

rowcount: SELECT count(0) ... as described originally
linecount: wc -l on the exported file

They are running as cron jobs and I haven't figured out why those lines were missing. Running the same script a second time, everything was fine. It happens very rarely, but the logfile didn't say anything (nor did the mail report from cron).

I didn't check the exit status of psql though (as I built in my own checks). I will do that.

Best Answer

There is an extension that tracks executed queries, pg_stat_statements: http://www.postgresql.org/docs/9.2/static/pgstatstatements.html

You have to change your postgresql.conf to preload the module in the server, and then enable the extension. After that, there is next to zero overhead to get how many rows a query affected.
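The setup order matters: the module must be preloaded first (which requires a restart), and the extension created afterwards. A sketch of the steps, using the standard settings:

```sql
-- In postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, once per database where you want to query the view:
CREATE EXTENSION pg_stat_statements;
```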

After that you could do something like:

select rows from pg_stat_statements where query = 'the query'

where rows is the total number of rows affected (returned, inserted, updated, or deleted).

The only caveat is that rows is the cumulative total across every execution of the query, so you have to check the total before and after the query executes and take the difference. Also, you would not want concurrent runs of the same script against the same database and user, as that is exactly how the view groups queries.
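The before/after bookkeeping described above could look like this (a sketch, assuming matching on the literal query text as in the 9.1-era view; the ellipses stand for the original query):

```sql
-- Before running the export: capture the cumulative count.
SELECT rows FROM pg_stat_statements
WHERE query = 'COPY (SELECT ... FROM ... GROUP BY zzz) TO STDOUT';

-- ... run the export ...

-- After: the difference between the two values is this run's row count.
SELECT rows FROM pg_stat_statements
WHERE query = 'COPY (SELECT ... FROM ... GROUP BY zzz) TO STDOUT';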