The data you're getting from Toad appears to be incorrect or, at least, misleading. If you are using a locally managed tablespace with automatic extent allocation, Oracle determines your initial and next extent sizes automatically. In 11.2, the first 16 extents will be 64 KB in size (for a total of 1 MB), and the next 63 extents will be 1 MB in size. So if you have 26 extents, that implies the table occupies 11 MB of space on disk. An initial extent of 1.44 GB makes no sense, and a size of 4.18 MB seems rather low if you're saying it hasn't shrunk.
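The arithmetic for a 26-extent segment under automatic allocation can be sanity-checked directly (this is plain arithmetic, not tied to any real table):

```sql
-- 16 extents of 64 KB plus 10 extents of 1 MB
SELECT 16 * 64 + 10 * 1024          AS expected_kb,  -- 11264 KB
       (16 * 64 + 10 * 1024) / 1024 AS expected_mb   -- 11 MB
  FROM dual;
```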
What does

```sql
SELECT tablespace_name,
       extent_id,
       bytes/1024 kb
  FROM user_extents
 WHERE segment_name = <<name of table>>
```

show you? If you have 26 extents, you should see that extents 0-15 are 64 KB in size and extents 16-25 are 1 MB in size.
```sql
SELECT SUM(bytes)/1024/1024 size_in_mb
  FROM user_extents
 WHERE segment_name = <<name of table>>
```

will show you the total size of the table's extents on disk.
Pre-9.0 VACUUM FULL
You're on PostgreSQL 8.4 or older, where `VACUUM FULL` tends to bloat indexes. See this wiki page for details.
Don't run `VACUUM FULL` as a periodic maintenance task. It's unnecessary and inefficient. This remains true on current versions; it's just not as bad on 9.0 and above. If you feel the need to run `VACUUM FULL` regularly, you probably don't have autovacuum turned up far enough and are having table bloat issues. In fact, unless you've changed the `FILLFACTOR` on the table from its default of 100, a `VACUUM FULL` is quite counter-productive: it compacts away all the free space in the table, so subsequent `UPDATE`s will have to extend the table again.
Table extension is currently one of the poorer-performing operations in PostgreSQL, as it's controlled by a single global lock. So if you have tables that fluctuate in size, you really want to avoid constantly compacting and truncating them only to extend them again.
On some unusual workloads it can be worth running a periodic `CLUSTER`, which orders the table based on an index and effectively `REINDEX`es it. If you do many `UPDATE`s on the table, you should also set a lower `FILLFACTOR` for efficiency.
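For instance (8.2+ syntax; `my_table` is a placeholder name), leaving 10% of each page free for updates might look like:

```sql
-- Leave 10% free space in each heap page for row updates
ALTER TABLE my_table SET (FILLFACTOR = 90);
-- Only newly written pages honour the new fillfactor;
-- a one-off CLUSTER or VACUUM FULL rewrites the existing pages.
```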
If this table is being emptied and re-populated regularly, you should generally use `TRUNCATE` followed by `COPY` to fill it back up. If it's big, drop the indexes before the `COPY` and re-create them afterwards; that produces indexes that are more compact and faster, and it speeds up the data load.
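A sketch of that reload cycle (the table, index, column, and file path are all hypothetical):

```sql
BEGIN;
TRUNCATE my_table;
DROP INDEX my_table_idx;                          -- drop indexes before the bulk load
COPY my_table FROM '/path/to/data.csv' WITH CSV;  -- server-side bulk load
CREATE INDEX my_table_idx ON my_table (my_column);
COMMIT;
```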
For one-off mitigation, `CLUSTER` the table or `REINDEX` it.
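As a sketch (table and index names are placeholders, and note the `CLUSTER` syntax changed between versions):

```sql
-- 8.4+ syntax; on 8.1 the equivalent is: CLUSTER my_table_idx ON my_table
CLUSTER my_table USING my_table_idx;  -- rewrites the table in index order, rebuilding its indexes
-- or, to rebuild the indexes without reordering the table:
REINDEX TABLE my_table;
```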
8.1?!?!
After edit added version: Holy bleepazoids, batman. 8.1.18? Forget what I said about autovacuum; autovacuum in 8.1 was way too ineffective. Upgrade to a sane version ASAP. You're not even on the current point release of 8.1, which is 8.1.23, from December 2010; 8.1.18 was released in September 2009! You need to begin your upgrade planning ... well, about two years ago, preferably. Read the release notes for every .0 version between 8.1 and the current release, focusing on the upgrade and compatibility notes. Then plan and execute your upgrade. If you don't feel up to managing that on your own, there are people who'll help you with it (I work for one of them), but honestly, the release notes and docs are quite sufficient for most people to do an upgrade themselves without undue pain.
Moving from 8.1 to 8.3 or newer will be your biggest pain point, as PostgreSQL 8.3 dropped a whole bunch of implicit casts that lots of potentially buggy SQL relied on. You'll need to test your application carefully on the newer version. Other changes to be aware of are:
- The removal of implicit `FROM`, and in later versions the removal of the backwards-compatibility parameter for it;
- UTF-8 validation improvements in newer versions that can cause older dumps to fail to load until the data is corrected;
- The change to `standard_conforming_strings` by default;
- The change of `bytea_output` to `hex`.
Size can differ for several reasons:
- Indexes take up disk space in order to facilitate faster lookups. The more indexes you have, the more disk space your DB will take up. GIN indexes are usually smaller, but aren't useful if you use range queries.
- PostgreSQL supports concurrent access, which is implemented so that updates and deletes only change the visibility of records rather than actually deleting or overwriting the data, since a record may still be in use by another transaction; an update adds a new (updated) copy of the row. Both mean that old data is still written on disk. To free it up, PostgreSQL periodically performs vacuuming, which actually removes the deleted records (once no transaction is using them).
- PostgreSQL has a default block size of 8 KB. If your records are big, say 5 KB, you can only fit one record per block, leaving fairly large (~3 KB) slack.
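You can see the split between table data and everything else with the built-in size functions (`'my_table'` is a placeholder):

```sql
SELECT pg_size_pretty(pg_relation_size('my_table'))       AS heap_only,
       pg_size_pretty(pg_total_relation_size('my_table')) AS with_indexes_and_toast;
```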
Some possible solutions are discussed below.
Edit:
pg_dump
pg_dump is okay if you specify to dump data, as you've said you did. If you use the custom format (the -Fc flag), pg_restore will be able to do some extra stuff with it, such as loading only specified tables; also see pg_dump's man page. The custom format gzips the dump by default. This may slow your dumping down, so you may want to disable that; if you still want the data gzipped, there's a parallel gzip (pigz).
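For example (the database and table names here are placeholders), disabling pg_dump's internal compression and piping through pigz might look like:

```shell
# Custom-format dump, internal compression off, compressed by parallel gzip instead
pg_dump -Fc -Z0 mydb | pigz > mydb.dump.gz

# Later: restore a single table from it
unpigz mydb.dump.gz
pg_restore -d mydb -t my_table mydb.dump
```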
Backups
When you back up, you back up the data in tables. Indexes get recreated from the data in the tables. If you can afford heavy IO on the database while you are doing backups and restores, then pg_dump and pg_restore may suit your needs. pg_dump only dumps tables, and when restoring, indexes get rebuilt automatically as part of the restore. So to make it clear: backups don't care about indexes.
If heavy IO and degraded performance are not acceptable, then you might want to have a secondary replication server that holds a copy of the data but doesn't serve your ordinary queries, so you can use it to make dumps.
If you need point-in-time restore capability, you could set up WAL (write-ahead log) archiving, which enables you to restore to a specific transaction, but this is quite advanced. There are tools to help, such as Barman.
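A minimal sketch of what that looks like in `postgresql.conf` (the archive path is a placeholder; note that releases before 8.3 have no `archive_mode`, only `archive_command`):

```
archive_mode = on                               # 8.3+
archive_command = 'cp %p /mnt/wal_archive/%f'   # copy each completed WAL segment
```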
Big indexes
As you've already discovered, indexes can take up a lot of space. If you add up your data and indexes, you get your DB size: 28GB + 42GB = 70GB.
Having big indexes means there's a lot of extra disk space used. Index data is also cached in memory, so having big indexes means you may have two copies of indexed data in RAM, which means there's less RAM for caching data on disk and you get more cache misses. There are some options to evaluate to make indexes smaller.
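To see where the space goes, something like this (using the standard statistics views) lists each of your indexes with its on-disk size:

```sql
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 ORDER BY pg_relation_size(indexrelid) DESC;
```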
But ... a lot of this depends on how the data is accessed, which is often only known by the developer of the application.