If I guessed the missing pieces correctly, this query using a crosstab()
function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):
It requires that you install the additional module tablefunc
first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query
SELECT name, project_id, positions, created_day
,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
, array_agg(DISTINCT pr.position) AS positions
FROM project_reports pr
JOIN keywords k ON k.id = pr.keyword_id
GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
) sub
LEFT JOIN crosstab(
$$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
FROM project_reports
WHERE created_at::date IN (
current_date - 1
, current_date - 7
, (now() - interval '1 mon')::date
, (now() - interval '3 mon')::date
, (now() - interval '6 mon')::date
, (now() - interval '12 mon')::date
)
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
) ON t.kp[1] = keyword_id
AND t.kp[2] = project_id;
To break it down, a simple demo of what I am doing in the crosstab:
SELECT * FROM crosstab(
$$SELECT *
FROM (
VALUES
('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,2}', current_date - 7, '{5,6,7}'::int[])
, ('{1,3}', current_date - 7, '{5,6,7}'::int[])
, ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
-- 3 mon missing
, ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
-- 12 mon missing
) sub
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
);
Result:
kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
{1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
{1,3} | {1,2} | {5,6,7} | | | |
The rest should be obvious.
Comments
I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab()
is the tricky part.
I am not joining to project_keywords
, it seems to be just noise. I am suspecting problems in your relational model.
I seems you should include the project_id
in the result to make sense. Else you should not GROUP BY it ...
You can just subtract integer
from date
.
The index data order on disk for text columns depends on the locales provided by the underlying operating system.
The same locales (that is, with the same name) may differ between operating systems on the order rules, even on simple things. As an example this question:
PostgreSQL 9.1 streaming replication problem: replica fails to use an index properly
shows how "0102"
and 0102
sort differently on Ubuntu and FreeBSD.
The result in the physical index structure is that one value should come before the other in one OS and vice-versa in the other.
The problem with a recovered base backup is the same as with replication: the on-disk structure of the index is used as-is, resulting in a corrupted index in the target system if the column's locale doesn't sort exactly the same between systems.
It's mentioned in the comments that the origin system is CentOS and destination Mac OS X. Given that CentOS uses GNU libc and Mac OS X certainly a BSD-like libc, I'd think that there is no chance at all for on-disk postgres portability between these systems.
Best Answer
If any of the columns is
null
the result of the concatenation using||
is also null.You can use
concat()
or evenconcat_ws()
("concat With Separator") to make Postgres treat null values like empty strings:Note that
concat_ws()
will not append the separator if a value is "missing". If you do need it, you have to useconcat()
Details in the manual:
9.4. String Functions and Operators
Regarding the last part of your question, you can use a
select
statement for the\copy
command. For details please see the manual.