Sql-server – How to avoid duplicating information in a series of nested SELECT statements

selectsql serversql-server-2005

I'm working on an SQL query that puts together customer history based on past purchases (the industry is venue ticketing). The report will be LARGE, something like 80-90 columns, with one column per event or category of event per record in the output CSV.

The database structure requires that I use nested select statements to get the necessary data into columns for each record–each column containing a quantity of tickets for that type of event. I know how to do this in theory, but each embedded select statement is so large that the full report would blow through the 8,000 character limit on my DB interface. One of these statements looks like this:

(select count (*)
FROM
guest ig, event2 e, eventseat es, "order" o
WHERE ig.guestid = g.guestid
and ig.guestid = o.guestid
and o.orderid = es.orderid
and e.eventid = es.eventid
and e.incometype = 'T'
and e.eventtype in ('SS', 'BMF')
and es.status in ('2','4')
and es.price <> '0.00'
and es.price <> '5.00'
and e.year = '2010'       <-- THESE LINES ARE THE ONLY ONES THAT 
and e.run in ('SS-DANCE') <-- CHANGE PER SELECT STATEMENT 
) as 'SS-Dance',

That's about 400 characters, which only leaves enough room for <20 iterations in a single query without going over the character limit.

Is there a way to reduce this or generalize it so that I don't need to waste all those characters on duplicated information in each SELECT?

I've put together an example at the following link which contains a simplified version of my DB structure and some sample data. Objective is to get the same result data without using so many characters per nested SELECT.

http://sqlfiddle.com/#!3/fc316/2

Best Answer

;WITH x AS 
(
  SELECT g.guestid, e.[year], e.run, c = COUNT(*)
  FROM dbo.guest AS g
  INNER JOIN dbo.[order] AS o
  ON g.guestid = o.guestid
  INNER JOIN dbo.eventseat AS es
  ON o.orderid = es.orderid
  INNER JOIN dbo.event2 AS e
  ON e.eventid = es.eventid
  WHERE e.incometype = 'T'
  AND e.eventtype in ('SS', 'BMF')
  AND es.status in ('2','4')
  AND es.price NOT IN ( '0.00', '5.00')
  AND e.[year] = '2010'
  AND e.run IN ('SS-DANCE', 'SS-FILM', 'SS-OPERA')
  GROUP BY g.guestid, e.[Year], e.run
),
y AS 
(
  SELECT * FROM x PIVOT (MAX(c) FOR run IN 
  ([SS-DANCE],[SS-FILM],[SS-OPERA])) AS z
)
SELECT y.guestid, y.[year], g.[first], g.[last], 
  [SS-DANCE] = COALESCE(y.[SS-DANCE], 0), 
  [SS-FILM]  = COALESCE(y.[SS-FILM], 0), 
  [SS-OPERA] = COALESCE(y.[SS-OPERA], 0)
FROM y 
INNER JOIN dbo.guest AS g
ON y.guestid = g.guestid;

I'm not sure if my where clause is going to work for you as written. This currently assumes you only care about these three types (dance, film and opera) and you only care about the year 2010. If you need to cross years you can remove that WHERE clause. If you only care about dance in 2010 and film in any year you'll need to write a more complex set of where clauses.

Some comments:

  • try to avoid reserved words like order, year, first and last. These only complicate matters because they need to be enclosed in square brackets (which are preferable over "double quotes" mostly for readability and unambiguity).
  • always use dbo. (or whatever schema prefix is appropriate).
  • as I suggested above, stick to explicit INNER JOIN syntax. FROM x, y, z is for the birds.