Give this a try:
SELECT
    t1.UserName AS "Account/UserName",
    t1.Hash     AS "Account/Hash",
    t2.Phone    AS "Personal/Phone",
    t2.City     AS "Personal/City"
FROM table1 t1
INNER JOIN table2 t2
    ON t1.rid = t2.rid
FOR XML PATH ('UserDetails');
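For reference, with one row per user, the slash-separated column aliases should produce XML shaped roughly like this (all values are placeholders):

```xml
<UserDetails>
  <Account>
    <UserName>alice</UserName>
    <Hash>9f86d08...</Hash>
  </Account>
  <Personal>
    <Phone>555-0100</Phone>
    <City>Springfield</City>
  </Personal>
</UserDetails>
```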
I have added a second code sample in response to your comment. I have no idea if this will work, or whether it is even the best way to accomplish what you want. Please let me know.
SELECT
    t1.UserName AS "Account/UserName",
    t1.Hash     AS "Account/Hash",
    (
        SELECT
            t2.Phone AS "Personal/Phone",
            t2.City  AS "Personal/City"
        FROM table2 t2
        WHERE t2.rid = t1.rid
        FOR XML PATH ('Personals'), TYPE -- TYPE keeps the nested XML from being entitized
    )
FROM table1 t1
FOR XML PATH ('UserDetails');
First off, can that be right? You write:

I want to fetch all data with the updated_at field with a date of a few days ago.

But your WHERE condition is:

(date(updated_at)) < (date(now()) - 7)

Shouldn't that be > instead?
Indexes
For optimal performance, you could ...
- partition your indexes
- exclude irrelevant rows from the indexes
- automatically recreate indexes at off-hours with an updated predicate.
Your indexes could look like:
CREATE INDEX objects_id_up_1_idx ON objects ((updated_at::date) DESC NULLS LAST)
WHERE id BETWEEN 0 AND 999999
AND updated_at > '2012-10-01 0:0'::timestamp; -- some minimum date

CREATE INDEX objects_id_up_2_idx ON objects ((updated_at::date) DESC NULLS LAST)
WHERE id BETWEEN 1000000 AND 1999999
AND updated_at > '2012-10-01 0:0'::timestamp; -- some minimum date
-- etc.
(Assuming updated_at is type timestamp. With timestamptz, the cast to date is not IMMUTABLE and you need to define the time zone first ...)
The second condition excludes irrelevant rows from the index right away, which should make it smaller and faster - depending on your actual data distribution. In accordance with my preliminary comment, I am assuming you want newer rows.
The condition also automatically excludes NULL values in updated_at - which you seem to allow in the table and obviously want to exclude in the query.

The usefulness of the index deteriorates over time. The query always retrieves the latest entries. Recreate the index with an updated WHERE clause periodically. This requires an exclusive lock on the table, so do it at off hours. There is also CREATE INDEX CONCURRENTLY to minimize the duration of locks:
CREATE INDEX CONCURRENTLY objects_id_up_201211_idx ...; -- create new idx
DROP INDEX CONCURRENTLY objects_id_up_201210_idx; -- then drop old
DROP INDEX allows CONCURRENTLY since Postgres 9.2.
To further optimize, you could use CLUSTER like we mentioned in the comments. But you need a full index for that; it doesn't work with a partial index. You would create one temporarily:
CREATE INDEX objects_full_idx ON objects (id/1000000, (updated_at::date) DESC NULLS LAST);
This form of the full index matches the sort order of above partial indexes.
CLUSTER objects USING objects_full_idx;
ANALYZE objects;
This will take a while, since the table is rewritten physically. It is also effectively a VACUUM FULL. It needs an exclusive write lock on the table, so do it at off-hours - provided you can afford that at all. Again, there are less invasive alternatives: pg_repack or pg_squeeze.
You can then drop the index again (if it's unused). It's a one-time effect. I would at least try this once to see how much your queries benefit from it. The effect deteriorates with subsequent write operations. You could repeat this procedure at off hours if you see a substantial effect.
If your table receives a lot of write operations, you have to weigh cost and benefit for this step. For many UPDATEs, consider setting a FILLFACTOR lower than 100. Do that before you CLUSTER.
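A minimal sketch of that step, assuming the objects table and full index from above (the value 90 is illustrative; the right fillfactor depends on your update pattern):

```sql
-- Leave 10% of each heap page free so updated row versions
-- can stay on the same page (enables HOT updates)
ALTER TABLE objects SET (fillfactor = 90);

-- Rewrite the table so existing pages honor the new setting
CLUSTER objects USING objects_full_idx;
ANALYZE objects;
```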
Query
SELECT count(*)
FROM objects
WHERE id BETWEEN 0 AND 999999 -- match conditions of partial index!
AND updated_at > '2012-10-01 0:0'::timestamp
AND updated_at::date > (now()::date - 7);
More
There are more advanced techniques for index partitioning which, among other things, provide example code for automatic index (re-)creation.
Make sure that autovacuum is running properly. The huge gain from CLUSTER you have reported may be due in part to the implicit vacuuming that you get from CLUSTER. Maybe this is set up by Heroku automatically, not sure.

The settings in your question look good. So that's probably not an issue here and CLUSTER really was that effective.
Declarative partitioning has finally matured in Postgres 12. I would consider using that now instead of manual index partitioning (or at least additionally): range partitioning with updated_at as the partition key. There are also multiple improvements to general performance, big data, and B-tree index performance in particular.
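A minimal sketch of what that could look like, assuming the objects table from above (the column list and partition bounds are illustrative only):

```sql
CREATE TABLE objects (
    id         bigint,
    updated_at timestamp
) PARTITION BY RANGE (updated_at);

-- One partition per month; pruning skips partitions
-- that cannot match the WHERE clause on updated_at
CREATE TABLE objects_2012_10 PARTITION OF objects
    FOR VALUES FROM ('2012-10-01') TO ('2012-11-01');

CREATE TABLE objects_2012_11 PARTITION OF objects
    FOR VALUES FROM ('2012-11-01') TO ('2012-12-01');
```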
Best Answer
The naive approach would be to wrap your query in another SELECT and use replace() on <, and do the same for >. An easier and more elegant method would be to do it in the language that should deliver/process/... the XML.
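A minimal sketch of the naive approach, assuming a hypothetical column xml_col in a table t (note that in full XML escaping, & would have to be replaced first, before < and >):

```sql
SELECT REPLACE(REPLACE(xml_col, '<', '&lt;'), '>', '&gt;') AS escaped_xml
FROM t;
```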