There are a few ways that you can perform this data transformation. You have access to the PIVOT
function then that will be the easiest, but if not then you can use an aggregate function and a CASE
.
Aggregate /Case version:
select personid,
max(case when optionid = 'A' then 1 else 0 end) OptionA,
max(case when optionid = 'B' then 1 else 0 end) OptionB,
max(case when optionid = 'C' then 1 else 0 end) OptionC
from PersonOptions
group by personid
order by personid;
See SQL Fiddle with Demo
Static Pivot:
select *
from
(
select personid, optionid
from PersonOptions
) src
pivot
(
count(optionid)
for optionid in ('A' as OptionA, 'B' OptionB, 'C' OptionC)
) piv
order by personid
See SQL Fiddle with Demo
Dynamic Version:
The two versions above work great if you have a known number of values, but if your values are unknown, then you will want to implement dynamic sql and in Oracle you can use a procedure:
CREATE OR REPLACE procedure dynamic_pivot_po(p_cursor in out sys_refcursor)
as
sql_query varchar2(1000) := 'select personid ';
begin
for x in (select distinct OptionID from PersonOptions order by 1)
loop
sql_query := sql_query ||
' , min(case when OptionID = '''||x.OptionID||''' then 1 else null end) as Option_'||x.OptionID;
dbms_output.put_line(sql_query);
end loop;
sql_query := sql_query || ' from PersonOptions group by personid order by personid';
dbms_output.put_line(sql_query);
open p_cursor for sql_query;
end;
/
Then you return the results, you will use:
variable x refcursor
exec dynamic_pivot_po(:x)
print x
The results are the same with all versions:
| PERSONID | OPTIONA | OPTIONB | OPTIONC |
------------------------------------------
| 1 | 1 | 1 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 |
Exclude users without emails
Assuming we only want users that actually have emails. Users without emails are ignored. The reason I went with this assumption at first is that all your queries do that already:
LEFT JOIN emails on users.id = emails.user_id
WHERE emails.email LIKE 'a' || '%%'
By adding a WHERE
condition on emails.email
you effectively convert your LEFT JOIN
to a plain [INNER] JOIN
and exclude users without emails. Detailed explanation:
2nd query rewritten
Your 2nd query does not work as advertised, results are not "descending by number of emails". You have to nest the result of count()
in another CTE or subquery and run dense_rank()
on it. You cannot nest window functions in the same query level.
SELECT u.name, e2.*
FROM (
SELECT *, dense_rank() OVER (ORDER BY n_emails, users.id) AS rnk
FROM (
SELECT user_id, id AS e_id, e_mail
, count(*) OVER (PARTITION BY user_id) AS n_emails
FROM emails
WHERE email LIKE 'a' || '%' -- one % is enough
) e1
) e2
JOIN users u ON u.id = e2.user_id
WHERE rnk < 3
ORDER BY rnk;
This should be fastest if the predicate is selective enough (selects only a small fraction of all emails). Two window functions with rows sorted differently have their price, too.
- A major point is to run the subquery on
emails
only - which is possible if the preliminary assumption holds.
3rd query improved
If, on the other hand, the predicate WHERE e.email LIKE 'a' || '%'
is not very selective, your 3rd query is probably faster, even if it reads from the table twice - but the second time only desired rows. Also improved:
SELECT e.user_id, u.name,
e.id AS e_id, e.e_mail, sq.n_emails
FROM (
SELECT user_id, count(*) AS n_emails
FROM emails
WHERE email LIKE 'a' || '%'
GROUP BY user_id
ORDER BY count(*) DESC, user_id -- break ties
LIMIT 2 OFFSET 0
) sq
JOIN emails e USING (user_id)
JOIN users u ON u.id = e.user_id
WHERE e.email LIKE 'a' || '%'
ORDER BY sq.n_emails DESC;
Include users without emails
You can either include the users table in the inner query again, similar to what you had before. But you have to pull the filter on email into the join condition!
SELECT u.name, e2.*
FROM (
SELECT *, dense_rank() OVER (ORDER BY n_emails, users.id) AS rnk
FROM (
SELECT u.id AS user_id, u.name, e.id AS e_id
, count(e.user_id) OVER (PARTITION BY u.id) AS n_emails
FROM users u
LEFT JOIN emails e ON e.user_id = u.id
AND e.email LIKE 'a' || '%' -- !!!
) e1
) e2
WHERE rnk < 3
ORDER BY rnk;
Which will be a bit more expensive.
Since you retrieve users with the most emails first, users without emails will rarely be in the result. To optimize performance, you could use a UNION ALL
with LIMIT
:
( -- parentheses required
SELECT u.name, e2.user_id, e2.e_id, e2.e_mail, e2.n_emails
FROM (
SELECT *, dense_rank() OVER (ORDER BY n_emails, users.id) AS rnk
FROM (
SELECT user_id, id AS e_id, e_mail
, count(*) OVER (PARTITION BY user_id) AS n_emails
FROM emails
WHERE email LIKE 'a' || '%' -- one % is enough
) e1
) e2
JOIN users u ON u.id = e2.user_id
WHERE rnk < 3 -- adapt to paging!
ORDER BY rnk
)
UNION ALL
(
SELECT u.name, u.user_id, NULL AS e_id, NULL AS e_mail, 0 AS n_emails
FROM users u
LEFT JOIN emails e ON e.user_id = u.id
AND e.email LIKE 'a' || '%'
WHERE e.e.user_id IS NULL
)
OFFSET 0 -- adapt to paging!
LIMIT 2 -- adapt to paging!
Detailed explanation:
I would consider materializing the result for two reasons:
- Subsequent queries are much faster.
- You don't have to operate on a moving target. You speak of paging, and if users get new emails between pages, your whole sort order may be moot.
Build a MV from the 2nd query without LIMIT
(REFRESH MATERIALIZED VIEW
), then return the first page etc. It's a matter of policy, when you refresh the MV again.
Best Answer
You can aggregate the emails into an array:
This can be used to join against the users table. To put the emails into columns, extract them from the array.
If you need more emails, add more expressions to the outer query. If an index position does not exist,
null
is returned instead (no error).It is not possible to write query that dynamically returns a different number of columns each time you call it. The number, type and name of all columns of a query must be known before the query starts running.