I have a database originally built with an instance of PostgreSQL with a custom page size much larger than the default. I'm trying to migrate its contents to a standard, 8k-page database.
Because data indexed by a standard index in PostgreSQL must be no more than 1/3rd of a page size, some data contained in the old database cannot be represented in the new one without modifying index definitions. Making schema changes is not an immediately available option; dropping rows (from the tables I expect to be involved) is.
Finding overall size of an index in PostgreSQL is easy and well-documented. Finding individual rows for which associated index sizes are over a threshold is something I'm having some trouble.
How can I find rows (across all tables in a schema) containing indexed values for which the indexes of said values are too large to represent in a standard PostgreSQL build (which is to say, longer than 2712 bytes)?
Best Answer
Let me begin this reply with a caveat: I've never encountered this exact problem, so I don't fully understand the nature of it. However, I'm going to give my advice so that perhaps it gives you the insight you need to complete your task.
pageinspect
extensionRecent versions of PostgreSQL come with an extension which you can install that allows you to inspect the individual pages of elements stored in your database.
First things first, all you need to do to use it is to run the command
and it's ready to go.
Now, once again, I don't fully understand the issue, but it appears you are concerned with the size of the indexed entry, and not necessarily the actual row entry in the heap.
For that case, you should use a function call from the
pageinspect
extension to query the pages of your BTree index, asand you'll get back some results like
Presumably, though you should check against your own data, you can see the number of bytes which the index key is using by checking the
itemlen
field. I am betting that you can use this to check against your pre-determined limit which you said was 2712 bytes, and in that case use this info to collect all thectid
s which violate your condition.Broadly speaking then, you need to create a PL/pgSQL function to iterate over the pages of your index(s), checking the
itemlen
, and collecting thectid
s of the heap entries, and once completed, since you state it was OK to drop violating rows, then wrap it all up with aI hope this helps!