How to get a list of values for column x GROUPed BY column y

group byoracleoracle-12c

I have the following table structure and data (using Oracle DB 12c):

CREATE TABLE authors (
    aid NUMBER(38) GENERATED ALWAYS AS IDENTITY CONSTRAINT authors_pk PRIMARY KEY NOT NULL,
    fname VARCHAR2(100 CHAR) NOT NULL,
    sname VARCHAR2(150 CHAR) NOT NULL,
    dob NUMBER(4),
    gender CHAR(1 CHAR)
);

CREATE UNIQUE INDEX authors_uind ON authors (fname, sname, dob, gender); --- Combination of columns must be unique

INSERT INTO authors(sname, fname, dob, gender) VALUES('Codd', 'Edgar F', 1923, 'M');
INSERT INTO authors(sname, fname, dob, gender) VALUES('Date', 'Chris J', 1941, 'M');
INSERT INTO authors(sname, fname, dob, gender) VALUES('Darwin', 'Hugh', 1943, 'M');
INSERT INTO authors(sname, fname, dob, gender) VALUES('Lions', 'John', 1937, 'M');

CREATE TABLE publications (
    pid NUMBER(38) GENERATED ALWAYS AS IDENTITY CONSTRAINT publications_pk PRIMARY KEY NOT NULL,
    title VARCHAR2(150 CHAR) NOT NULL,
    written NUMBER(4)
);
CREATE INDEX publications_ind ON publications (title, written);

INSERT INTO publications(title, written) VALUES('A Relational Model of Data for Large Shared Data Banks', 1970);
INSERT INTO publications(title, written) VALUES('The Relational Model for Database Management', 1990);
INSERT INTO publications(title, written) VALUES('An Introduction to Database Systems', 2003);
INSERT INTO publications(title, written) VALUES('The Third Manifesto', 2000);
INSERT INTO publications(title, written) VALUES('Temporal Data and the Relational Model', 2002);
INSERT INTO publications(title, written) VALUES('Database in Depth: Relational Theory for Practitioners', 2005);
INSERT INTO publications(title, written) VALUES('Commentary on UNIX', 1976);

CREATE TABLE author_publications (
    aid NUMBER(38) REFERENCES authors (aid),
    pid NUMBER(38) REFERENCES publications (pid),
    CONSTRAINT author_publications_pk PRIMARY KEY (aid, pid)
);

INSERT INTO author_publications(aid, pid) VALUES(1, 1);
INSERT INTO author_publications(aid, pid) VALUES(1, 2);
INSERT INTO author_publications(aid, pid) VALUES(2, 3);
INSERT INTO author_publications(aid, pid) VALUES(2, 4);
INSERT INTO author_publications(aid, pid) VALUES(2, 5);
INSERT INTO author_publications(aid, pid) VALUES(2, 6);
INSERT INTO author_publications(aid, pid) VALUES(3, 4);
INSERT INTO author_publications(aid, pid) VALUES(3, 5);
INSERT INTO author_publications(aid, pid) VALUES(4, 7);

It's possible for one author to have written multiple books and one book to have been written by multiple authors.

What should the query look like so that I get a single row per book (for all books), with the names of the authors concatenated together. EG:

| Written | Title | PID | Authors |
|---------|-------|-----|---------|
| 1990 | THE THIRD MANIFESTO | 4 | DATE, DARWIN |
| 2002 | TEMPORAL DATA AND THE RELATIONAL MODEL | 8 | DARWIN, DATE |

Note 1: The order of the authors' names in the fourth column is not important, nor is a comma after the name of a single author in a row.

Note 2: The use of a GROUP BY clause is not required if the desired result can be achieved by a different means.

The query I have is as follows:

--- Get a list of author names per title
SELECT p.written, upper(p.title), p.pid, concat(upper(a.sname), ', ') 
FROM publications p INNER JOIN author_publications apub on p.pid = apub.pid
INNER JOIN authors a on apub.aid = a.aid
GROUP BY p.pid
order by p.written, upper(p.title)
;

The error message I get is "not a GROUP BY expression". (I don't understand what particular part of that statement fails or why.) The line in question is the start of the SELECT statement.

Best Answer

Instead of CONCAT, use LISTAGG:

select
  p.written, upper(p.title) as title, p.pid,
  upper(listagg(a.sname, ', ') within group (order by a.sname)) as authors
from publications p
join author_publications ap on p.pid = ap.pid
join authors a on ap.aid = a.aid
group by p.written, p.title, p.pid
order by p.written;

Result:

WRITTEN TITLE                                                         PID AUTHORS
------- ------------------------------------------------------------ ---- ---------------
   1970 A RELATIONAL MODEL OF DATA FOR LARGE SHARED DATA BANKS          1 CODD
   1976 COMMENTARY ON UNIX                                              7 LIONS
   1990 THE RELATIONAL MODEL FOR DATABASE MANAGEMENT                    2 CODD
   2000 THE THIRD MANIFESTO                                             4 DARWIN, DATE
   2002 TEMPORAL DATA AND THE RELATIONAL MODEL                          5 DARWIN, DATE
   2003 AN INTRODUCTION TO DATABASE SYSTEMS                             3 DATE
   2005 DATABASE IN DEPTH: RELATIONAL THEORY FOR PRACTITIONERS          6 DATE