Postgresql – delete hung soon after vacuum verbose analyse

postgresqlpostgresql-8.4vacuum

I have a fairly large java webapp system that connects to a postgres8.4 db on rhel.
This app only does inserts and reads.

This has been running without issue for 2.5 years and for the last year or so I implemented a data deletion script which deletes up to 5000 parent records and the corresponding child data in 3 tables.

This deletion script is called from a cron job every 5 minutes. It has been working flawlessly for over a year. This script always takes about 1-3 seconds to complete.

Also called from cron is vacuum verbose analyse script which is just called once a day (a few minutes before a deletion). This also has been working flawlessly for over a year. This takes about 15 minutes to complete.

Now last weekend (Saturday), the vacuum kicked off at 14:03, the deletion kicked off at 14:07 and did not complete. The next deletion kicked off at 14:12 and encountered a deadlock. The webapp at 14:14 hung. At 14:16 everything resumed as per normal and has been running fine since.

Now

To add more confusion, this server has an almost identical setup (standby sever) however the vacumm cron is due to run at 02:03 in the morning. When the vacuum kicked off on Sunday at 02:03, the same situation as above was encountered and at 02:14 the java app hung resuming at 02:15.

More

To further confuse me:
every time I go to the site I takes reading of df – this is always around 40% but after this happened, the df now reports 5% less

Any ideas? Please let me know if I have left out any relevant information.

edit

This is the Postgresql-Sat.log

WARNING:  skipping "pg_authid" --- only superuser can vacuum it
WARNING:  skipping "pg_database" --- only superuser can vacuum it
WARNING:  skipping "pg_tablespace" --- only superuser can vacuum it
WARNING:  skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING:  skipping "pg_shdepend" --- only superuser can vacuum it
WARNING:  skipping "pg_shdescription" --- only superuser can vacuum it
WARNING:  skipping "pg_auth_members" --- only superuser can vacuum it
ERROR:  deadlock detected
DETAIL:  Process 12729 waits for ShareLock on transaction 91311423; blocked by process 12800.
    Process 12800 waits for ShareLock on transaction 91311422; blocked by process 12729.
    Process 12729: delete from child1 where id in
    (
    select id from parent where date_collected < now() - interval '13 months' order by id limit 5000
    );
    Process 12800: delete from child1 where id in
    (
    select id from parent where date_collected < now() - interval '13 months' order by id limit 5000
    );

Best Answer

I'm not able to test this since I don't use postgresql 8.4 but i think you have to:


  1. check which transaction(check Sql statement) is locked and which is locking.

    to check it

SELECT 
    kl.pid as blocking_pid,
    ka.usename as blocking_user,
    ka.current_query as blocking_query,
    bl.pid as blocked_pid,
    a.usename as blocked_user, 
    a.current_query as blocked_query, 
    to_char(age(now(), a.query_start),'HH24h:MIm:SSs') as age
FROM pg_catalog.pg_locks bl
    JOIN pg_catalog.pg_stat_activity a 
        ON bl.pid = a.procpid
    JOIN pg_catalog.pg_locks kl 
        ON bl.locktype = kl.locktype
        and bl.database is not distinct from kl.database
        and bl.relation is not distinct from kl.relation
        and bl.page is not distinct from kl.page
        and bl.tuple is not distinct from kl.tuple
        and bl.virtualxid is not distinct from kl.virtualxid
        and bl.transactionid is not distinct from kl.transactionid
        and bl.classid is not distinct from kl.classid
        and bl.objid is not distinct from kl.objid
        and bl.objsubid is not distinct from kl.objsubid
        and bl.pid  kl.pid 
    JOIN pg_catalog.pg_stat_activity ka 
        ON kl.pid = ka.procpid
WHERE kl.granted and not bl.granted
ORDER BY a.query_start;

Note:this is for postgresql v9.1 i am not sure this work for 8.4 or not.

  1. You mentioned about dead lock so it mean your transaction are waiting each other until occur deadlock(no transaction completed(commit or rollback) so make sure the locking transaction was completed(process 12729) and then transaction (Process 12800) will not wait anymore.

  2. Vacuum is occur table lock level too. when u run vacuum command it will locked the target tables to reclaims storage occupied by dead tuples so that why when you check the df - your free disk space is higher than before.

  3. about Privileges

WARNING:  skipping "pg_authid" --- only superuser can vacuum it
WARNING:  skipping "pg_database" --- only superuser can vacuum it
WARNING:  skipping "pg_tablespace" --- only superuser can vacuum it
WARNING:  skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING:  skipping "pg_shdepend" --- only superuser can vacuum it
WARNING:  skipping "pg_shdescription" --- only superuser can vacuum it
WARNING:  skipping "pg_auth_members" --- only superuser can vacuum it

you need to use supper user(postgres) to make vacuum verbose analyse for above tables (pg_*) or you just specified the target table only, it no need to use supper user privileges.

ex:

vacuum verbose analyse [table_name];