The largest table in our DB (Postgres 9.5) has almost 2 billion rows and is about to reach the 32-bit integer limit. I have begun a process of converting the PK (id) from int to bigint as follows:
ALTER TABLE bigtable ADD COLUMN id_bigint BIGINT NULL;
ALTER TABLE bigtable ALTER COLUMN id_bigint DROP DEFAULT;
Then, the following script runs to copy all values from id to id_bigint in chunks:
from django.db import connection  # Django's default database connection

def copy_bigtable_id(model, table_name, batch_count, start_index=0):
    latest_id = model.objects.latest('id').id
    counter = start_index
    if latest_id < counter:
        print("Latest id {} in {} is less than start index {}. Nothing to copy. Exiting.".format(
            latest_id, table_name, start_index))
        return
    print("Starting migration of 'id' to 'id_bigint' on {} from index {} to {}".format(
        table_name, start_index, latest_id))
    print("Copy in batches of {} rows".format(batch_count))
    while counter <= latest_id:
        with connection.cursor() as cursor:
            cursor.execute('BEGIN;')
            # Lock the batch first, then copy id into id_bigint for those rows.
            select = "SELECT * FROM {} WHERE id >= %s AND id < %s FOR UPDATE".format(
                table_name)
            update = "UPDATE {} SET id_bigint=id WHERE id >= %s AND id < %s".format(
                table_name)
            cursor.execute(select, [counter, counter + batch_count])
            cursor.execute(update, [counter, counter + batch_count])
            cursor.execute('COMMIT;')
        counter = counter + batch_count
        # Re-read the max id so rows inserted while the script runs are covered too.
        latest_id = model.objects.latest('id').id
        print("copied {0:.2f}% of ids in {1}. counter={2}".format(
            min(100, 100.0 * counter / latest_id), table_name, counter))
    print('ALL DONE! Last id in {} = {}'.format(table_name, latest_id))
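A quick way to sanity-check the copy is to spot-check an id range that should already have been processed (a sketch, not part of the original script; the range bounds here are made up — on a ~2-billion-row table you would not want to scan the whole thing):

```sql
-- Sketch (assumption): count rows in one already-processed id range that
-- were somehow missed. Expect 0 if the copy covered this range.
SELECT count(*) AS not_yet_copied
FROM bigtable
WHERE id >= 900000000 AND id < 900100000
  AND id_bigint IS NULL;
```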
Once the script completes, id is replaced by id_bigint as follows:
ALTER TABLE bigtable DROP CONSTRAINT bigtable_pkey;
DROP SEQUENCE bigtable_id_seq CASCADE;
ALTER TABLE bigtable RENAME COLUMN id TO id_old;
ALTER TABLE bigtable RENAME COLUMN id_bigint TO id;
ALTER TABLE bigtable ADD PRIMARY KEY (id);
CREATE SEQUENCE bigtable_id_seq START WITH <latest_id>;
ALTER TABLE bigtable ALTER COLUMN id SET DEFAULT NEXTVAL('bigtable_id_seq');
ALTER TABLE bigtable ALTER COLUMN id_old DROP NOT NULL;
ALTER TABLE bigtable DROP COLUMN id_old;
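Since the swap is a series of DDL statements, it may be worth wrapping them in a single transaction so the table never exposes a half-renamed state to concurrent sessions (a sketch; DDL is transactional in Postgres, though each statement still needs an ACCESS EXCLUSIVE lock, and ADD PRIMARY KEY must build an index over the whole table while holding it):

```sql
-- Sketch: perform the column swap atomically; any failure rolls
-- the whole swap back instead of leaving the table mid-rename.
BEGIN;
ALTER TABLE bigtable DROP CONSTRAINT bigtable_pkey;
ALTER TABLE bigtable RENAME COLUMN id TO id_old;
ALTER TABLE bigtable RENAME COLUMN id_bigint TO id;
ALTER TABLE bigtable ADD PRIMARY KEY (id);
COMMIT;
```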
This approach was chosen after failing to perform ALTER TABLE on the existing id field: the table is so large that ALTER TABLE, which takes an ACCESS EXCLUSIVE lock, would block all access to the table for the duration. We wanted to minimize downtime.
Our script has been running for over 10 hours and has covered half of the table. The script is not blocking other SELECT calls on the table, as it takes only a ROW EXCLUSIVE table lock plus row-level locks on the rows it is changing.
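To confirm the batch job really is holding only these weaker locks, one can watch pg_locks while it runs (a sketch, not from the original post):

```sql
-- Sketch: show which lock modes are currently held on bigtable.
-- Expect RowExclusiveLock from the UPDATE batches and AccessShareLock
-- from concurrent SELECTs; these modes do not conflict with each other.
SELECT mode, granted, pid
FROM pg_locks
WHERE relation = 'bigtable'::regclass;
```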
However, after covering 50%+ of the rows, normal SELECT queries on the table started to take an extremely long time. It turns out the indexes on the table were not being used. This is the SELECT query which broke:
SELECT "bigtable"."id", "bigtable"."latitude", "bigtable"."longitude",
"bigtable"."timestamp", "bigtable"."speed", "bigtable"."accuracy",
"bigtable"."trip_id", "bigtable"."step_id", "bigtable"."revised"
FROM "bigtable"
WHERE "bigtable"."step_id" = 5907395
ORDER BY "bigtable"."timestamp" ASC LIMIT 1;
This is the EXPLAIN result after the massive table updates performed by the script above:
Limit (cost=0.58..44711.92 rows=1 width=49)
-> Index Scan using bigtable_timestamp on bigtable (cost=0.58..2826516601.39 rows=63217 width=49)
Filter: (step_id = 5907395)
Notice that there is no Sort node: the planner walks the bigtable_timestamp index, which already yields rows in ORDER BY order, and filters every row on step_id, instead of using the step_id index and sorting the matches.
This is the EXPLAIN of the same query on my localhost replica of the database, with a much smaller dataset but the exact same schema. This is the plan I'd expect, and it returns results in milliseconds:
Limit (cost=157.96..157.97 rows=1 width=53)
-> Sort (cost=157.96..158.30 rows=136 width=53)
Sort Key: "timestamp"
-> Index Scan using bigtable_step_id on bigtable (cost=0.43..157.28 rows=136 width=53)
Index Cond: (step_id = 5907395)
bigtable has the following indexes:
CREATE UNIQUE INDEX bigtable_pkey ON bigtable(id int8_ops);
CREATE INDEX bigtable_step_id ON bigtable(step_id int4_ops);
CREATE INDEX bigtable_timestamp ON bigtable(timestamp int4_ops);
CREATE INDEX bigtable_trip_id ON bigtable(trip_id int4_ops);
The "broken" query plan in the first EXPLAIN takes hours to complete. I dug a bit more and found this post, which mentions that the Postgres query planner sometimes gets confused by ORDER BY ... LIMIT 1. Is this the case I'm running into?
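If the planner's preference for the bare timestamp index is the problem, one common workaround (my assumption, not something the linked post prescribes) is a composite index that serves both the filter and the sort, so LIMIT 1 can stop after the first matching entry:

```sql
-- Sketch: one index covering WHERE step_id = ? ORDER BY "timestamp".
-- CONCURRENTLY builds it without taking an exclusive lock on the table.
CREATE INDEX CONCURRENTLY bigtable_step_id_timestamp
    ON bigtable (step_id, "timestamp");
```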
Rows are inserted regularly into bigtable with ever-increasing step_id and timestamp. This query used to take a dozen milliseconds at worst before it quickly deteriorated to over 30 minutes. This happened, again, after we began populating id_bigint. Could this be the result of index bloat?
Are the updates we are performing on that table screwing up the indexes, or Postgresql planner capability to use those indexes?
What happened is precisely what we wanted to avoid: downtime. The app is running but effectively not serving clients, because each query on that large table, which used to take milliseconds, now takes over an hour. I don't have EXPLAIN plans from before, but the queries took milliseconds and I'm pretty sure index scans were used. I tried to run EXPLAIN on the same query now but gave up after 15 minutes. Looking at the running queries on the live instance, similar queries had been "active" for longer than 30 minutes. So there has been a fundamental change in the way the query planner runs those queries: it is no longer using the indexes.
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup FROM pg_stat_user_tables
shows a huge number of dead tuples on this table (1331381938) and that no vacuum or autovacuum has completed on it. We have autovacuum enabled, and an autovacuum worker is constantly running on bigtable; at this point it has been running for over 20 hours and has not completed.
What is the fastest way to resolve this situation?
- Should I run VACUUM on this table? If so, should I combine it with ANALYZE to rebuild the query planner statistics?
- Should I run REINDEX on this table?
Given the sheer size of this table, is there a way to speed up VACUUM or ANALYZE?
Best Answer
Could you show us exactly how that was done? Details matter here. How fast are these chunks going by?
It shouldn't "screw them up", other than by possibly bloating them, or clearing the all-visible flag on so many pages that an index-only-scan doesn't seem worthwhile anymore. Do you have before and after query plans you can show us? Did the before involve index-only-scans, or just regular index scans?
Autovacuum has a default level of IO throttling which makes it pretty slow. (This default was reduced by a factor of 10 between v11 and v12, so it is much faster now; you didn't say what version you were using.) A manual VACUUM, on the other hand, is unthrottled by default. If you start a manual VACUUM, it should run much faster than autovacuum does.
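For example (a sketch; VERBOSE just makes it report progress after each table and index pass):

```sql
-- Manual VACUUM runs unthrottled by default (vacuum_cost_delay = 0),
-- unlike autovacuum, which sleeps between IO bursts.
VACUUM VERBOSE bigtable;
```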
The autovacuum should automatically cancel itself once it detects that the manual VACUUM is trying to run. There are situations where it won't, but they probably don't apply to you.
This might consume all of the IO capacity on your system, but if you are currently down anyway that should hardly matter.
I would not expect ANALYZE to make much difference, as the only column whose distribution is changing is not yet being used. I suppose the correlation values might be changing in important ways, but I doubt that can explain what you are seeing.
No. Regular manual vacuum should be enough.
I see no reason to think that would help more than trivially. And it will certainly take time and resources away from things that possibly could.