Considering that the question is labeled database-design
I will offer a solution without that CSV field. Even if you decide to keep it, this may offer some ideas.
Question
contains all possible questions.
Answer
has all possible answers (to any question)
QuestionAnswer
contains all QA combinations offered, note that there must exist a default answer for "I do not know".
UserQA
contains questions user answered.
- Note that rule "There can be one and only one correct answer for each question" (IsCorrect = 'Y'), has to be implemented on the application level or as a stored procedure/trigger.
To get questions that a user with specific_UserId
has not yet answered
select qq.QuestionID
from Question as qq
where not exists (select 1
from UserQA as xx
where xx.QuestionID = qq.QuestionID
and xx.UserID = specific_UserId );
Or, better if there may be some new questions which do not have defined answers yet
select distinct qq.QuestionID
from QuestionAnswer as qq
where not exists (select 1
from UserQA as xx
where xx.QuestionID = qq.QuestionID
and xx.UserID = specific_UserId );
I concur with your speculation that it's bad design practice. I'd recommend against it. Any time you do this sort of thing, it almost inevitably comes back to bite and the performance difference compared to a properly designed schema should be small.
You say that as it stands now, your process "would involve searching through the first table for all question_ids answered by the user, then searching the topics_and_questions table" but that's not really a good analysis of what should happen behind the scenes.
Properly designed, this process would involve checking an index on the first table for all questions answered by the user, joining to the second table to find the topics of the questions (and possibly a third table to find the names of the topics), but all in one query and with no full table scans.
If your first table (no table name provided) has (user_id, question_id) as its primary key (as it probably should), then finding the questions answered by each user will be a fast operation, no matter how many records are in this table. Rows in InnoDB are stored in primary key order, so there's no "searching though" the entire table that needs to happen... the storage engine can go directly to where that user's records are without a full table scan. If (user_id,question_id) isn't the primary key, then adding an index on (user_id,question_id) will optimize this part of the query.
In topics_and_questions
either (question_id) or (question_id,topic_id) should probably be the primary key, depending on whether a question can be in multiple topics. If it can, of course, then counting questions answered becomes more complicated.
Since you appear to be allowing a user to answer the same question more than once, there may be yet another issue to consider: you actually have to duplicate part of what you're trying to avoid, every time you update this new proposed table: You have to check whether the question just answered was one that's been answered before, or not, and only increment this counter, if it is, I would assume.
Best Answer
Your usernames are to be unique. Good, they should be. But now you seem to think that
Username
would make a good primary key. Not necessarily. One important characteristic of a PK is stability. Unless you wish to implement a "one you've chosen a username you're stuck with it forever" policy (and that is a decidedly user-unfriendly policy), someone may want to change their username. Doing so when that username is scattered all over the database is going to be difficult.I am not a fan of design rules that demand a surrogate key for every table. But typically 50% or more of tables should have surrogate keys. This is one of them.
And I certainly hope that when you say the login table will contain the password, you do mean the hash of the password, don't you?
@ypercube speaks for all of us about the "for each question, there will be separate table" idea. Not good.
Now as for the user's code. For that you have choices. You don't mention the target DBMS (if you even have one at this point) but some handle
text
orclob
fields better than others. The question of handling actual disk files is one you will have to research yourself. Will those files reside in a file server, within a version control system, just a dedicated folder on some disk somewhere? In any case, what are the OS-based and/or network access privileges that will be needed? You get the idea. There are a lot of questions unrelated to databases that must be answered.Not to say that one is superior to the other, but unless you or someone on your team has done this before, you may want to start with storing the code in the database. That will probably get you started faster and you can move to system files (if you determine it to be the better option) more or less at your leisure.