If it were me, I would create a SQL Agent job that cycles the error log based on size or date. Run it a few times a day, and you can even set up alerts to notify you when the error log is cycled.
-- Capture the output of xp_enumerrorlogs so we can inspect the current log
CREATE TABLE ##Temptable
(
    [Archive #] TINYINT,
    [Date] DATETIME,
    [Log File Size (Byte)] INT
);

INSERT INTO ##Temptable EXEC xp_enumerrorlogs;

-- Cycle the error log if the current log (Archive # 0) is larger than 5 MB
IF (
    SELECT [Log File Size (Byte)] FROM ##Temptable WHERE [Archive #] = 0
) > 5242880 -- 5 MB
BEGIN
    EXEC sp_cycle_errorlog;
END

DROP TABLE ##Temptable;
You could also change the IF statement, or add an OR condition, to look at the date and cycle the log if it is older than 2 weeks (or however long you want):
-- Cycle if the most recently cycled log (Archive # 1) is older than 14 days
IF (
    SELECT [Date] FROM ##Temptable WHERE [Archive #] = 1
) < DATEADD(DAY, -14, GETDATE())
BEGIN
    EXEC sp_cycle_errorlog;
END
EDIT: The archive number for the date-based cycle needs to be the previous error log (Archive # 1). Log 0 will always return the current date, since it is the active log, duh!
As for moving them, you can keep up to 99 log files. Do you think your extended information needs will exceed 99 logs? If not, I would move the error log location to a disk that can hold 99 full logs and change the maximum number of log files.
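For reference, one way to change the maximum number of retained logs is the same registry value that SSMS writes when you change "Maximum number of error log files" in the SQL Server Logs UI. A minimal sketch, assuming xp_instance_regwrite resolves the instance-specific registry path for you:

-- Raise the retained error log count to 99
USE master;
EXEC xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    REG_DWORD,
    99;

The error log location itself is a startup parameter (-e), which you can change in SQL Server Configuration Manager.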
If I need to check if a user has voted in a poll already, I have to look up the current poll ID and user ID
SELECT COUNT(*) FROM votes WHERE user_id = 16 AND poll_id = 7 ;
This would be efficient with an index on (user_id, poll_id). But if you don't need the count, just whether the user has taken the poll, you only need an EXISTS subquery, either SELECT EXISTS (...) or WHERE EXISTS (...), depending on what you want to do with this check:

... EXISTS (SELECT * FROM votes WHERE user_id = 16 AND poll_id = 7)

It will be a bit more efficient than the COUNT() query - assuming that the above index has been added.
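A minimal sketch of both pieces, assuming MySQL and that votes already has user_id and poll_id columns (the index name idx_votes_user_poll is mine):

-- Index that lets the EXISTS check resolve with a single index seek
CREATE INDEX idx_votes_user_poll ON votes (user_id, poll_id);

-- Returns 1 if the user has voted in the poll, 0 otherwise
SELECT EXISTS (
    SELECT * FROM votes WHERE user_id = 16 AND poll_id = 7
) AS has_voted;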
Is it bad design that it has to go through thousands of rows to find that?
No, if you have the index, it won't go through thousands or millions of rows. It will do a single index seek.
Would it be better to have a smaller table between the polls table and the votes, for checking if a user has voted in a poll, all the polls the user has voted in, etc.?
It might be better, yes. The smaller (in number of rows) table means that the indexes will be smaller, too. So you would use an EXISTS subquery on a smaller index. For the specific query, the difference in efficiency would be very small, though. For different queries, say "How many users have taken this poll?" or "How many users have taken each poll?", you'd get a larger benefit, as they require a scan of the whole index.
Or would this have no performance benefit over my current design?
So, it depends on what kind of queries you have.
Would look something like this I guess
+------+---------+---------+
| id   | poll_id | user_id |
+------+---------+---------+
|    1 |       7 |      16 |
|    2 |       7 |      20 |
|    3 |       8 |      16 |
|    4 |       8 |       2 |
+------+---------+---------+
Then I'd add the id from this new vote_item table to the votes table as a FK (?) so I can grab the vote_item id via the user_id and poll_id, and return all the rows in the votes table with that vote_item_id.
Not exactly. The table only needs user_id and poll_id and a UNIQUE constraint on (poll_id, user_id). The id is useless for this many-to-many table. In most many-to-many tables, it's common to have two unique indexes, on (a,b) and (b,a). So, I suggest you have (poll_id, user_id) as the primary key and a unique index on (user_id, poll_id) in the new poll_users table, if you decide to add this table.
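A minimal sketch of that table, assuming MySQL/InnoDB and that polls and users expose the key columns referenced below:

CREATE TABLE poll_users (
    poll_id INT NOT NULL,
    user_id INT NOT NULL,
    -- the primary key serves lookups by poll, the unique index lookups by user
    PRIMARY KEY (poll_id, user_id),
    UNIQUE KEY uq_poll_users (user_id, poll_id),
    FOREIGN KEY (poll_id) REFERENCES polls (poll_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);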
You are right about the FOREIGN KEY though. You would add a foreign key from votes (poll_id, user_id) that REFERENCES poll_users (poll_id, user_id) and remove the individual foreign keys from votes to polls (poll_id) and users (user_id).
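In MySQL that might look like the following; the constraint names here are hypothetical, so check SHOW CREATE TABLE votes for the real ones:

-- Drop the single-column foreign keys (names are hypothetical)
ALTER TABLE votes
    DROP FOREIGN KEY fk_votes_poll,
    DROP FOREIGN KEY fk_votes_user;

-- Replace them with one composite foreign key to the new table
ALTER TABLE votes
    ADD CONSTRAINT fk_votes_poll_users
    FOREIGN KEY (poll_id, user_id)
    REFERENCES poll_users (poll_id, user_id);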
By the way, the id in the votes table looks useless, too. I'd have a UNIQUE constraint on (poll_id, user_id, poll_option_id) (meaning: no user can answer the same poll option in a poll twice) and throw away that id.
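A sketch of that constraint, again assuming MySQL (the constraint name is mine):

-- No user can answer the same poll option in a poll twice
ALTER TABLE votes
    ADD CONSTRAINT uq_votes UNIQUE (poll_id, user_id, poll_option_id);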
Adding an example to Michael's answer. The problem with PIVOT is twofold. First, it wants to aggregate. You can get around this by defining your dataset to be distinct and using MAX or MIN functions. But your example above makes that impossible, because a user can have multiple answer sets for a given question: you would end up with only one row, with the two datasets aggregated for the duplicate user. To get around this I added a question_set field, adding a date (distinct enough for this example) to each question/answer set.
Second, you still have to define the pivoted fields. If you have 50 questions, that's 50 definitions, and if you add a question, you will have to add its definition to the query. Using a loop, I created the question list dynamically and inserted it into the pivot query. Hope this helps.
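A minimal sketch of the dynamic approach, assuming SQL Server and a hypothetical answers table with user_id, question_set, question, and answer columns (here the column list is built with FOR XML PATH rather than a loop, but the idea is the same):

DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the pivoted column list from the questions that actually exist
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(question)
    FROM (SELECT DISTINCT question FROM answers) AS q
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

-- question_set keeps each answer set distinct, so MAX() only collapses
-- the placeholder NULLs, not real duplicate answers
SET @sql = N'
SELECT user_id, question_set, ' + @cols + N'
FROM (SELECT user_id, question_set, question, answer FROM answers) AS src
PIVOT (MAX(answer) FOR question IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;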