This is my table structure:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+--------------------------+-----------+----------+--------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('tbl_disk_id_seq'::regclass) | plain | |
inserttime | timestamp with time zone | | not null | now() | plain | |
buildid | text | | not null | | extended | |
app_version | text | | not null | | extended | |
device | text | | not null | | extended | |
device_api | text | | not null | | extended | |
userid | text | | not null | | extended | |
path | text | | not null | | extended | |
size | numeric | | not null | | main | |
type | integer | | not null | | plain | |
root | integer | | not null | | plain | |
status | integer | | not null | 0 | plain | |
I used this query to check the size of the table:
SELECT relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
AND relname like 'tbl_disk%'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
The result:
relation | total_size
-----------------------+------------
tbl_disk_20180830 | 90 MB
tbl_disk_20180831 | 32 MB
Now I add a new column document with type tsvector:
ALTER TABLE tbl_disk ADD "document" tsvector;
and update the data:
UPDATE tbl_disk
SET document = array_to_tsvector(ARRAY[split_part(path,'/',2),split_part(path,'/',6)]);
The path data looks like: /data/user/0/com.abc.abc/database/olala.db
I only need the two tokens 'data' and 'database', which is why I split the path.
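For reference, the two split_part calls pull the second and sixth slash-separated segments out of a path like the one above. This is a quick standalone check (no table needed), using the sample path from the question:

```sql
-- split_part(string, delimiter, n) returns the n-th field.
-- A leading '/' makes field 1 the empty string, so field 2
-- is 'data' and field 6 is 'database' for this sample path.
SELECT split_part(p, '/', 2) AS token1,
       split_part(p, '/', 6) AS token2
FROM (VALUES ('/data/user/0/com.abc.abc/database/olala.db')) AS t(p);
--  token1 | token2
-- --------+----------
--  data   | database
```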
After successfully updating, the size approximately doubles:
relation | total_size
-----------------------+------------
tbl_disk_20180830 | 175 MB
tbl_disk_20180831 | 61 MB
The data:
path | document
--------------------------------------------------------+--------------------
/data/user/0/com.abc.abc/lib/libabcqlite.so | 'data' 'lib'
/data/user/0/com.abc.abc/files/ddinfo2 | 'data' 'files'
/data/user/0/com.abc.abc/lib/libabcsqlite.so | 'data' 'lib'
/data/user/0/com.abc.abc/lib/libabcsqlite.so | 'data' 'lib'
/data/user/0/com.abc.abc/shaders/b32 | 'data' 'shaders'
You can see that the size of tbl_disk_20180830
increased from 90 MB to 175 MB.
Why does it need so much extra space just to store two short tokens per row?
Best Answer
Databases cannot simply overwrite data "in place". If they did, they could not provide atomicity or isolation, and probably not much concurrency or durability either.
When you bulk-update every row in a table, in PostgreSQL the resulting table contains a copy of both the old data and the new data: each UPDATE writes a new row version and leaves the old version behind as a dead tuple. You can get rid of the old copies by running
VACUUM FULL <table name>
. A plain VACUUM, by contrast, will not shrink the file on disk; it only marks the dead space as reusable for future inserts and updates on that table.
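To confirm that the growth is old row versions rather than the tsvector data itself, you can inspect the dead-tuple counters and then reclaim the space. This is a sketch using the table names from the question; note that VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the whole table, so run it in a maintenance window:

```sql
-- Dead tuples left behind by the bulk UPDATE (roughly one per updated row)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname LIKE 'tbl_disk%';

-- Rewrite the table, discarding the old row versions
VACUUM FULL tbl_disk_20180830;

-- Size should now be close to the pre-update 90 MB plus the new tsvector column
SELECT pg_size_pretty(pg_total_relation_size('tbl_disk_20180830'));
```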