Aside from the upgrade recommendation (which I second, btw, but which will not solve your current problem), your basic issue is that PostgreSQL will not run as an administrative user. This is a security feature.
Now, if this is a desktop machine not on a domain, you need to create a limited (non-administrator) user and run initdb as that user (or, if the second cluster already exists, which it may, assign ownership of all the files under its data directory to that user). Then read the runas documentation.
Then run something like:
runas /user:<computername>\postgres "postgres -D C:\my\new\cluster\data\directory"
So if your computer name is mycomputer, it would look like:
runas /user:mycomputer\postgres "postgres -D C:\my\new\cluster\data\directory"
You can also register the new cluster as a Windows service:
pg_ctl register -N postgresqlc2 -U postgres -P [password] -D C:\my\new\cluster\data\directory ....
How do I avoid the sort in the EXPLAIN result? Actually, I didn't ask for any sort in the SQL statement.
You've asked for rows to be aggregated. One way to do this is to sort the data set and then scan it to collapse out duplicates. This can be faster than hash aggregation, which is the other way PostgreSQL knows how to do grouping.
So while you didn't explicitly say "sort the rows", it's still sorting them because of something you asked for.
The immediate problem is that PostgreSQL is being very conservative with how much memory it's using for sorts:
Sort Method: external merge Disk: 317656kB
and is doing a 300MB on-disk sort. You can see that pretty clearly if you take a look at the plan on explain.depesz.com.
If you:
SET work_mem = '400MB';
before running the query, it should be a whole lot different.
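If you don't want the higher limit to linger for the whole session, you can scope it to a single transaction with SET LOCAL. A minimal sketch, where the GROUP BY query and the metric_id column are just stand-ins for whatever you're actually running:

BEGIN;
-- applies only until COMMIT/ROLLBACK, then reverts
SET LOCAL work_mem = '400MB';
-- hypothetical stand-in for your aggregating query
EXPLAIN (ANALYZE, BUFFERS)
SELECT metric_id, count(*)
FROM counters
GROUP BY metric_id;
COMMIT;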
Unfortunately it's not as simple as changing your configuration, because PostgreSQL isn't too clever about resource management. Per the documentation on work_mem, it may use up to work_mem bytes per sort or join operation, and a complex query can run several such operations at once in a single session. So if you have max_connections = 50 and you're running complicated queries, you might find yourself using many gigabytes of working memory (50 connections × 400MB is already 20GB before counting multiple sorts per query), exceeding free memory, and hitting swap. Which you really don't want.
It also seems to be doing a seqscan on counters, but since it's finding that about 1/4 of the rows match the condition, that's probably the right thing to do - an index scan would probably be slower.
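If you want to double-check the planner's choice there, one way is to discourage sequential scans in a throwaway session and compare the timings. A rough sketch, with the filter column invented for the example:

-- discourages (does not forbid) seq scans for this session only
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT metric_id, count(*)
FROM counters
WHERE captured_at >= now() - interval '1 day'   -- stand-in for your actual filter
GROUP BY metric_id;
RESET enable_seqscan;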
I find the default work_mem way too conservative and tend to set it to at least 100MB on any reasonably large system. I also prefer to run PostgreSQL with PgBouncer in front and a low max_connections, allowing me to throw more resources at each individual connection.
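If you take that approach, the bigger work_mem doesn't have to be global either; it can be attached to a role or a database so that only the sessions doing heavy work get it. A sketch, with the role and database names made up:

-- new sessions for this role start with the larger limit
ALTER ROLE reporting SET work_mem = '256MB';
-- or tie it to one database instead
ALTER DATABASE metrics SET work_mem = '256MB';
-- verify from a fresh session
SHOW work_mem;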
Frankly, I'm curious to see how MS SQL Server executes this, because the numbers you report are astonishing for a query like this.
Best Answer
Each column of a table has an attstattarget property (stored in pg_attribute) that tells how much data should be stored for it from the statistical sample gathered by ANALYZE. It defaults to default_statistics_target, which itself defaults to 100.
The documentation covers this in Statistics Used by the Planner.
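For illustration, this is how you can inspect those values yourself; my_table is a placeholder, and a stored attstattarget of -1 (NULL on recent versions) just means the column falls back to default_statistics_target:

-- the server-wide fallback, 100 unless you changed it
SHOW default_statistics_target;

-- per-column targets for one table
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attnum > 0
  AND NOT attisdropped;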
The reason behind the 30,000 pages and rows is that the sample size in rows considered by ANALYZE is 300 times the maximum value of attstattarget for the sampled table, which here would be the default 100.
The 300 comes from a statistical formula mentioned in the source code (src/backend/commands/analyze.c).
As for the number of pages: since rows don't span across pages, at most N pages are going to be read to get N rows. I believe that ANALYZE intentionally aims at fetching this maximum number of pages to get the best sample. It makes sense, since rows stored in the same page are more likely to be correlated.
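So if you want ANALYZE to base its estimates on a larger sample for a given table, the knob to turn is the per-column statistics target; a hedged example with made-up names, where the 300× rule above then applies to the new maximum target:

-- raise the target; ANALYZE will then sample up to 300 * 500 = 150,000 rows
ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 500;
ANALYZE my_table;

-- the gathered statistics are visible in pg_stats
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'my_table';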