If it were me, I would create a SQL Agent job that cycles the error log based on size or date. Run it a few times a day, and you can even set up alerts to notify you when the error log is cycled.
-- Capture the output of xp_enumerrorlogs so we can inspect the current log
CREATE TABLE ##Temptable
(
    [Archive #] TINYINT,
    [Date] DATETIME,
    [Log File Size (Byte)] INT
);

INSERT INTO ##Temptable EXEC xp_enumerrorlogs;

-- Cycle the error log if the current log (Archive # 0) is larger than 5 MB
IF (
    SELECT [Log File Size (Byte)] FROM ##Temptable WHERE [Archive #] = 0
) > 5242880 -- 5 MB
BEGIN
    EXEC sp_cycle_errorlog;
END

DROP TABLE ##Temptable;
You could also change the IF statement, or add an OR condition, to look at the date and cycle the log if it is older than 2 weeks (or however long you want):
-- Cycle if the most recently cycled log (Archive # 1) is older than 14 days
IF (
    SELECT [Date] FROM ##Temptable WHERE [Archive #] = 1
) < DATEADD(DAY, -14, GETDATE())
BEGIN
    EXEC sp_cycle_errorlog;
END
EDIT: The archive number for the date-based cycle needs to be the previous error log (Archive # 1). Log 0 will always return the current date, since it is the active log, duh!
As for moving them, you can keep up to 99 log files. Do you think your extended information needs will exceed 99 logs? If not, I would move the error log location to a disk that can hold 99 full logs and change the maximum number of log files.
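For reference, one way to change the maximum number of retained logs is the same registry value that SSMS writes when you change "Maximum number of error log files" in the SQL Server Logs UI. A minimal sketch, assuming xp_instance_regwrite resolves the instance-specific registry path for you:

-- Raise the retained error log count to 99
USE master;
EXEC xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    REG_DWORD,
    99;

The error log location itself is a startup parameter (-e), which you can change in SQL Server Configuration Manager.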
If I need to check if a user has voted in a poll already, I have to look up the current poll ID and user ID
SELECT COUNT(*) FROM votes WHERE user_id = 16 AND poll_id = 7 ;
This would be efficient with an index on (user_id, poll_id). But if you don't need the count, just whether the user has taken the poll, you only need an EXISTS subquery, either SELECT EXISTS (...) or WHERE EXISTS (...), depending on what you want to do with this check:

... EXISTS (SELECT * FROM votes WHERE user_id = 16 AND poll_id = 7)

It will be a bit more efficient than the COUNT() query - assuming that the above index has been added.
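A minimal sketch of both pieces, assuming MySQL and that votes already has user_id and poll_id columns (the index name idx_votes_user_poll is mine):

-- Index that lets the EXISTS check resolve with a single index seek
CREATE INDEX idx_votes_user_poll ON votes (user_id, poll_id);

-- Returns 1 if the user has voted in the poll, 0 otherwise
SELECT EXISTS (
    SELECT * FROM votes WHERE user_id = 16 AND poll_id = 7
) AS has_voted;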
Is it bad design that it has to go through thousands of rows to find that?
No, if you have the index, it won't go through thousands or millions of rows. It will do a single index seek.
Would it be better to have a smaller table between the polls table and the votes, for checking if a user has voted in a poll, all the polls the user has voted in, etc.?
It might be better, yes. The smaller (in number of rows) table means that the indexes will be smaller, too. So you would use an EXISTS subquery on a smaller index. For the specific query, the difference in efficiency would be very small, though. For different queries, say "How many users have taken this poll?" or "How many users have taken each poll?", you'd get a larger benefit, as they require a scan of the whole index.
Or would this have no performance benefit over my current design?
So, it depends on what kind of queries you have.
Would look something like this I guess
+------+---------+---------+
| id   | poll_id | user_id |
+------+---------+---------+
|    1 |       7 |      16 |
|    2 |       7 |      20 |
|    3 |       8 |      16 |
|    4 |       8 |       2 |
+------+---------+---------+
Then I'd add the id from this new vote_item table to the votes table as a FK (?) so I can grab the vote_item id via the user_id and poll_id, and return all the rows in the votes table with that vote_item_id.
Not exactly. The table only needs user_id and poll_id and a UNIQUE constraint on (poll_id, user_id). The id is useless for this many-to-many table. In most many-to-many tables, it's common to have two unique indexes, on (a,b) and (b,a). So, I suggest you have (poll_id, user_id) as the primary key and a unique index on (user_id, poll_id) in the new poll_users table, if you decide to add this table.
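A minimal sketch of that table, assuming MySQL/InnoDB and that polls and users expose the key columns referenced below:

CREATE TABLE poll_users (
    poll_id INT NOT NULL,
    user_id INT NOT NULL,
    -- the primary key serves lookups by poll, the unique index lookups by user
    PRIMARY KEY (poll_id, user_id),
    UNIQUE KEY uq_poll_users (user_id, poll_id),
    FOREIGN KEY (poll_id) REFERENCES polls (poll_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);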
You are right about the FOREIGN KEY though. You would add a foreign key from votes (poll_id, user_id) that REFERENCES poll_users (poll_id, user_id) and remove the individual foreign keys from votes to polls (poll_id) and users (user_id).
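In MySQL that might look like the following; the constraint names here are hypothetical, so check SHOW CREATE TABLE votes for the real ones:

-- Drop the single-column foreign keys (names are hypothetical)
ALTER TABLE votes
    DROP FOREIGN KEY fk_votes_poll,
    DROP FOREIGN KEY fk_votes_user;

-- Replace them with one composite foreign key to the new table
ALTER TABLE votes
    ADD CONSTRAINT fk_votes_poll_users
    FOREIGN KEY (poll_id, user_id)
    REFERENCES poll_users (poll_id, user_id);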
By the way, the id in the votes table looks useless, too. I'd have a UNIQUE constraint on (poll_id, user_id, poll_option_id) (meaning: no user can answer the same poll option in a poll twice) and throw away that id.
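A sketch of that constraint, again assuming MySQL (the constraint name is mine):

-- No user can answer the same poll option in a poll twice
ALTER TABLE votes
    ADD CONSTRAINT uq_votes UNIQUE (poll_id, user_id, poll_option_id);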
Adding an example to Michael's answer. The problem with PIVOT is twofold. First, it wants to aggregate. You can get around this by defining your dataset to be distinct and using MAX or MIN functions. But your example above makes that impossible, because a user can have multiple answer sets for a given question: you would end up with only one row, with the two datasets aggregated for the duplicate user. To get around this I added a question_set field, adding a date (distinct enough for this example) to each question/answer set.
Second, you still have to define the pivoted fields. If you have 50 questions, that's 50 definitions, and if you add a question, you will have to add its definition to the query. Using a loop, I created the question list dynamically and inserted it into the pivot query. Hope this helps.
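A minimal sketch of the dynamic approach, assuming SQL Server and a hypothetical answers table with user_id, question_set, question, and answer columns (here the column list is built with FOR XML PATH rather than a loop, but the idea is the same):

DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the pivoted column list from the questions that actually exist
SELECT @cols = STUFF((
    SELECT ',' + QUOTENAME(question)
    FROM (SELECT DISTINCT question FROM answers) AS q
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

-- question_set keeps each answer set distinct, so MAX() only collapses
-- the placeholder NULLs, not real duplicate answers
SET @sql = N'
SELECT user_id, question_set, ' + @cols + N'
FROM (SELECT user_id, question_set, question, answer FROM answers) AS src
PIVOT (MAX(answer) FOR question IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;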