Can someone explain what bloating means in database terms? For example what does it mean to say an index is bloated. I tried searching for it but there is no explanation of what bloating is, only what it causes or what it is caused by.
Postgresql – table bloating in databases
postgresqlterminology
Related Solutions
In any case, when one thing hasMany of another how/where does it show up in application and database logic?
The only way I can think to answer this question is with another example of hasMany that may or may not answer your question.
err, except I should have called the table "comment" but I think the point stands.
I don't know that this really helps you understand anymore than what you already had.
So I'll also share a query that might demonstrate how this works:
declare @var char(5)
set @var = 'query'
select
c.id
,c.text
,c.author
from comment c
inner join post p
on p.id = c.parentid
which we expect to return many rows, because one post generally has many comments (in theory, of course not all posts will have comments)
Update:
So here's the thing about relations: They are EASY to model via diagram (see the very first tool I pulled out to answer this question) but they are HARD to diagram via text. So instead of trying to define whether we mean 1:M or 0:N, we just write it in "near English" and say what we intend. Here's some alternatives:
one to one hasOne post hasOne author (not true for CW ;])
one to many hasMany The one in question
many to one belongsTo parent child, comments belongTo post
many to many hasAndBelongsToMany This describes a complex relationship
many to many to many hasManyAndBelongsToMany
As you see they can get quite complicated. This would ... once again! ... indicate a need to refactor your business logic into simpler and smaller chunks.
These can also be called "association types" and may be indicated by verbage like "WrittenBy" (a many to one) or "IsOwnedBy" (one to one? - context is key)
Before I walk away from the "hasMany" concept for the last time (well this is an update, more will likely be written) I want to mention that while it's useful for "plain English descriptions", it's also used in ActiveRecord styled installations. Take, for instance, Ruby. http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html The rest of this post (so far) is taken from that page:
class Project < ActiveRecord::Base
belongs_to :portfolio
has_one :project_manager
has_many :milestones
has_and_belongs_to_many :categories
end
This allows the ActiveRecord ORM to create the data model and to maintain the relationships as specified by the business logic within the program. But I don't write Ruby or Rails so that's as far as I can tell you, I just know that it exists there.
A cursor is essentially a materialized result set from a query that maintains its own state (essentially remembers what row is "current") and in some cases allows you to manipulate the current row.
Cursor lifecycle looks more or less like this:
- Open the cursor. At this point the query that defines the cursor is executed.
- Position the cursor. After the cursor is opened, it is positioned before the first row of the result set, so one needs to move it to the first row. In the simplest case you can only move the cursor to the "next" row, but in some cases you could go to the "previous" or even to an arbitrary row of the result set, depending on the parameters used to open the cursor and whether the particular DBMS supports these kinds of cursors.
- Fetch from the cursor. This operation assigns values from the columns of the "current" row to the application variables.
- Update columns in the current row or delete the current row, if the particular cursor allows such manipulation.
- Close the cursor. All resources used by the result set are freed at this point.
Typically steps 2-4 are performed in an application program (or stored procedure) loop, allowing you to read each record of the result set and do something with it if necessary.
Best Answer
Due to how PostgreSQL handles transactions and concurrency, MVCC - Multi-Version Concurrency Control, you can get bloat. In PostgreSQL, when you do an
UPDATE
orDELETE
, the row isn't actually physically deleted. For aDELETE
, it simply marks the row as unavailable for future transactions, and forUPDATE
, under the hood it's a combinedINSERT
thenDELETE
, where the previous version of the row is marked unavailable.While the data is marked unavailable, it is still there, and the space cannot be used. To then mark the space as available for use by the database, a vacuum process needs to come along behind the operations, and mark that space available for the database to use. It isn't returned to the operating system, however. That only happens when there are no active rows in an entire page, which can be uncommon in some workloads. This can be a good thing for some workloads, because you can just simply update the space on the individual pages inside of the data files, without needing to add additional data files.
Problems come about with bloat when there are are excessively large numbers of dead tuples versus live tuples. Walking along and checking all of the visibility flags takes time, and having more data files for a relation results in additional unnecessary IO load. Bloat is especially noticeable on indexes, which can also have many dead tuples, sometimes many more than the table. Bloat can slow index lookups and scans, which will show up in slowly increasing query times and changing query plans.
You can restore space by using pg_reorg, pg_repack,
CLUSTER
, orVACUUM FULL
. This will go through and reorganize the files, moving tuples and reorganizing to make sure that there are no dead tuples, which will eliminate the bloat.Bloat can also be efficiently managed by adjusting
VACUUM
settings per table, which marks dead tuple space available for reuse by subsequent queries.You can use queries on the PostgreSQL Wiki related to Show Database Bloat and Index Bloat to determine how much bloat you have, and from there, do a bit of performance analysis to see if you have problems with the amount of bloat you have on your tables.