Mysql – Elegant Database Design

database-designMySQL

I'm trying to make "online Judge Environment"(like SPOJ). In which users have facility to see their past submissions for each question, their best code performance(like running time, if they succeed to answer correctly). User have their USENAME(which will be unique) and PASSWORD. Around 10,000 users are expected.

My Effort : there will be one 'login' table in which each column contains USERNAME and PASSWORD. For each question, there will be separate table, in which each column contains USERNAME, their respective code file, status(correct answer, compile error, runtime error,etc.), running time(if any). every question table may contain more than one entry for same USERNAME.

Is there any better design? Should I store code files(text files) in database itself, or somewhere in file system and just give location path of file in table?

Best Answer

Your usernames are to be unique. Good, they should be. But now you seem to think that Username would make a good primary key. Not necessarily. One important characteristic of a PK is stability. Unless you wish to implement a "one you've chosen a username you're stuck with it forever" policy (and that is a decidedly user-unfriendly policy), someone may want to change their username. Doing so when that username is scattered all over the database is going to be difficult.

I am not a fan of design rules that demand a surrogate key for every table. But typically 50% or more of tables should have surrogate keys. This is one of them.

And I certainly hope that when you say the login table will contain the password, you do mean the hash of the password, don't you?

@ypercube speaks for all of us about the "for each question, there will be separate table" idea. Not good.

Now as for the user's code. For that you have choices. You don't mention the target DBMS (if you even have one at this point) but some handle text or clob fields better than others. The question of handling actual disk files is one you will have to research yourself. Will those files reside in a file server, within a version control system, just a dedicated folder on some disk somewhere? In any case, what are the OS-based and/or network access privileges that will be needed? You get the idea. There are a lot of questions unrelated to databases that must be answered.

Not to say that one is superior to the other, but unless you or someone on your team has done this before, you may want to start with storing the code in the database. That will probably get you started faster and you can move to system files (if you determine it to be the better option) more or less at your leisure.

Related Solutions

Mysql – Database design – many to many

Considering that the question is labeled database-design I will offer a solution without that CSV field. Even if you decide to keep it, this may offer some ideas.

Question contains all possible questions.
Answer has all possible answers (to any question)
QuestionAnswer contains all QA combinations offered, note that there must exist a default answer for "I do not know".
UserQA contains questions user answered.
Note that rule "There can be one and only one correct answer for each question" (IsCorrect = 'Y'), has to be implemented on the application level or as a stored procedure/trigger.

enter image description here

To get questions that a user with specific_UserId has not yet answered

select  qq.QuestionID
from Question as qq
where not exists (select 1 
                    from UserQA as xx 
                   where xx.QuestionID = qq.QuestionID
                     and xx.UserID     = specific_UserId );

Or, better if there may be some new questions which do not have defined answers yet

select distinct qq.QuestionID
from QuestionAnswer as qq
where not exists (select 1 
                    from UserQA as xx 
                   where xx.QuestionID = qq.QuestionID
                     and xx.UserID     = specific_UserId );

Mysql – Database schema – performance v correctness information

I concur with your speculation that it's bad design practice. I'd recommend against it. Any time you do this sort of thing, it almost inevitably comes back to bite and the performance difference compared to a properly designed schema should be small.

You say that as it stands now, your process "would involve searching through the first table for all question_ids answered by the user, then searching the topics_and_questions table" but that's not really a good analysis of what should happen behind the scenes.

Properly designed, this process would involve checking an index on the first table for all questions answered by the user, joining to the second table to find the topics of the questions (and possibly a third table to find the names of the topics), but all in one query and with no full table scans.

If your first table (no table name provided) has (user_id, question_id) as its primary key (as it probably should), then finding the questions answered by each user will be a fast operation, no matter how many records are in this table. Rows in InnoDB are stored in primary key order, so there's no "searching though" the entire table that needs to happen... the storage engine can go directly to where that user's records are without a full table scan. If (user_id,question_id) isn't the primary key, then adding an index on (user_id,question_id) will optimize this part of the query.

In topics_and_questions either (question_id) or (question_id,topic_id) should probably be the primary key, depending on whether a question can be in multiple topics. If it can, of course, then counting questions answered becomes more complicated.

Since you appear to be allowing a user to answer the same question more than once, there may be yet another issue to consider: you actually have to duplicate part of what you're trying to avoid, every time you update this new proposed table: You have to check whether the question just answered was one that's been answered before, or not, and only increment this counter, if it is, I would assume.

Best Answer

Related Solutions

Mysql – Database design – many to many

Mysql – Database schema – performance v correctness information

Related Question