I have a fairly large java webapp system that connects to a postgres8.4 db on rhel.
This app only does inserts and reads.
This has been running without issue for 2.5 years and for the last year or so I implemented a data deletion script which deletes up to 5000 parent records and the corresponding child data in 3 tables.
This deletion script is called from a cron job every 5 minutes. It has been working flawlessly for over a year. This script always takes about 1-3 seconds to complete.
Also called from cron is vacuum verbose analyse
script which is just called once a day (a few minutes before a deletion). This also has been working flawlessly for over a year. This takes about 15 minutes to complete.
Now last weekend (Saturday), the vacuum
kicked off at 14:03, the deletion
kicked off at 14:07 and did not complete. The next deletion
kicked off at 14:12 and encountered a deadlock. The webapp at 14:14 hung. At 14:16 everything resumed as per normal and has been running fine since.
Now
To add more confusion, this server has an almost identical setup (standby sever) however the vacumm
cron is due to run at 02:03 in the morning. When the vacuum
kicked off on Sunday at 02:03, the same situation as above was encountered and at 02:14 the java app hung resuming at 02:15.
More
To further confuse me:
every time I go to the site I takes reading of df
– this is always around 40% but after this happened, the df
now reports 5% less
Any ideas? Please let me know if I have left out any relevant information.
edit
This is the Postgresql-Sat.log
WARNING: skipping "pg_authid" --- only superuser can vacuum it
WARNING: skipping "pg_database" --- only superuser can vacuum it
WARNING: skipping "pg_tablespace" --- only superuser can vacuum it
WARNING: skipping "pg_pltemplate" --- only superuser can vacuum it
WARNING: skipping "pg_shdepend" --- only superuser can vacuum it
WARNING: skipping "pg_shdescription" --- only superuser can vacuum it
WARNING: skipping "pg_auth_members" --- only superuser can vacuum it
ERROR: deadlock detected
DETAIL: Process 12729 waits for ShareLock on transaction 91311423; blocked by process 12800.
Process 12800 waits for ShareLock on transaction 91311422; blocked by process 12729.
Process 12729: delete from child1 where id in
(
select id from parent where date_collected < now() - interval '13 months' order by id limit 5000
);
Process 12800: delete from child1 where id in
(
select id from parent where date_collected < now() - interval '13 months' order by id limit 5000
);
Best Answer
I'm not able to test this since I don't use postgresql 8.4 but i think you have to:
to check it
Note:this is for postgresql v9.1 i am not sure this work for 8.4 or not.
You mentioned about dead lock so it mean your transaction are waiting each other until occur deadlock(no transaction completed(commit or rollback) so make sure the locking transaction was completed(process 12729) and then transaction (Process 12800) will not wait anymore.
Vacuum is occur table lock level too. when u run vacuum command it will locked the target tables to reclaims storage occupied by dead tuples so that why when you check the df - your free disk space is higher than before.
about Privileges
you need to use supper user(postgres) to make vacuum verbose analyse for above tables (pg_*) or you just specified the target table only, it no need to use supper user privileges.
ex: