PostgreSQL – Does System Process Slow Down Database Access?

performance postgresql

I have a fairly large database containing mostly HTML documents. I wrote a Python script that always fetches 1000 HTML documents at once from my database with cur.execute("SELECT id,url,html_file FROM html ORDER BY id OFFSET %s LIMIT %s;" % (offset, limit)). Afterwards I run a few regular expressions over the HTML documents. Because I have a lot of HTML files to go through, I track how long each step takes. For the first 4000 batches, retrieving the HTML documents from the database took around 3 seconds; now it is up to 4 minutes.

I'm running a Windows 7 machine and took a look at the Resource Monitor. The database is on its own HDD with nothing else on it, but in the Resource Monitor I could see that the System process constantly reads something out of my Postgres folder:

Image   PID File    Read (B/sec)    Write (B/sec)   Total (B/sec)   I/O Priority    Response Time (ms)
System  4   E:\PostgreSQL\data\base\10596207\10598404.1 13,855,193  0   13,855,193  Normal  9
System  4   E:\PostgreSQL\data\base\10596207\10598404.1 11,182,442  0   11,182,442  Normal  9

So the question is: is this normal, or is the System process the culprit here, and how do I stop it? (Malware and virus scanning are disabled, as is search indexing.)

I followed the advice from Reaces and found that Superfetch was already disabled. But I also downloaded Process Explorer from Sysinternals, and there I observed something strange: after about the same number of parsed HTML documents (about 4,513,000, fetched in batches of 1,000 from the database), the postgres process begins to write a lot to the HDD. Up to this point it had read ~3 TB from disk and written only ~100 MB.

But now, about 3000 HTML documents later, Process Explorer shows me that the postgres process has written 10 GB to the HDD, with heavy write access for every cur.execute("SELECT id,url,html_file FROM html ORDER BY id OFFSET %s LIMIT %s;" % (offset, limit)) command.

What is the database doing, and how do I stop it? My best guess is that the database is reorganizing itself to be more efficient next time, but that does not help me now.

Best Answer

OFFSET is not what you want to use.

The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET might be inefficient.

http://www.postgresql.org/docs/9.3/static/queries-limit.html

Instead, add a WHERE clause along the lines of cur.execute("... WHERE id > %s ORDER BY id LIMIT %s", (last_id_from_previous_batch, 1000)). This reads and returns rows starting right after that id, instead of scanning OFFSET+LIMIT rows every time and discarding all but the last chunk. Once the offset is large enough, you are likely either swapping, or the query plan starts to require an on-disk sort, which causes the I/O you are seeing.
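A minimal, runnable sketch of this keyset-pagination pattern. It uses Python's built-in sqlite3 so it works standalone; with psycopg2 the placeholder is %s instead of ?, but the pattern is identical. The schema and row counts are made up for illustration:

```python
import sqlite3

# In-memory stand-in for the question's "html" table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE html (id INTEGER PRIMARY KEY, url TEXT, html_file TEXT)")
conn.executemany(
    "INSERT INTO html VALUES (?, ?, ?)",
    [(i, "http://example.com/%d" % i, "<html>%d</html>" % i)
     for i in range(1, 10001)])

def fetch_batches(conn, batch_size=1000):
    """Keyset pagination: remember the last id seen instead of using OFFSET."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, url, html_file FROM html WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size)).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # seed the next batch from the last row's id

batches = list(fetch_batches(conn))  # 10 batches of 1000 rows each
```

Because id is the primary key, each batch is an index range scan starting at last_id; the cost per batch stays constant no matter how deep into the table you are, which is exactly what OFFSET does not give you.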

Also, your code as written is vulnerable to SQL injection. Pass parameterized values as the second argument to execute(); the DB-API module will handle quoting and escaping for you, while the % operator will not.
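To make the difference concrete, here is a small sketch, again using sqlite3 so it runs standalone (sqlite3 uses ? placeholders where psycopg2 uses %s); the hostile input string is invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE html (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO html VALUES (1, 'http://example.com')")

hostile = "0 OR 1=1; DROP TABLE html"  # hypothetical hostile input

# Unsafe: % pastes the raw text into the SQL string before the driver
# ever sees it, so the input can rewrite the statement. (Left commented out.)
# cur.execute("SELECT id, url FROM html WHERE id > %s" % hostile)

# Safe: the value travels separately from the SQL text, so it is always
# treated as a single value and can never alter the statement.
rows = conn.execute(
    "SELECT id, url FROM html WHERE id > ?", (hostile,)).fetchall()

# The table survives untouched.
table_alive = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'html'").fetchall()
```

With the parameterized form, the hostile string is just compared against id as data (matching nothing here) instead of being executed as SQL.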