Mysql – Which is better: more rows or more columns in a student attendance table

MySQLPHP

I want to create an attendance table for student attendance. Which structure should I go for?

I have two options.

  1. attendance(student_id,date,is_absent,reason)
  2. attendance(student_id,d1,d2,d3,d4,...,d30,d31,is_absent,reason)

Scenario: assuming 100 students in which 50 are absent all 31 days for about 5 months. In case 1 with the assumed scenario the number of total rows is 50 absent * 31days * 5month =7750
At the same time case 2 will create 100 rows irrespective of number of absentees and the respective data is filled if the student is absent. So in total in case 2 100 * 5 months = 500 rows and 500 * 31 cols = 15500 cells (scenario doesn't matter in rows and cols but the number of cells filled is 50 absent * 31 days * 5 month = 7750)

Which one should I go for and would it be fast in querying when I query for attendance report or anything similar?
PS: MySQL as database and queries are sent from PHP

Best Answer

I would go for the first option, mostly because I believe that less columns are better.

But also:

First of all, the first option would only require an insert when a student is absent, while the second would, first, have an insert for each student at each month, and then for each student that is absent, it would require to update the row each time the student is absent.

The second option would however give a good matrix directly in the database which days a student is absent, but that would be easily achieved through PHP.

Also, having one row per day a student is absent, allows for better statistics when it comes to reporting.

For instance, check how many students that were absent during one month:

select month(date), count(*) from attendance where is_absent = 1 group by month(date);

compared to (here I assume that d1..d31 is an integer with either 1 or 0)

select month(date), sum(d1+d2+d3+d4+d5..+d31) from attendance group by month(date);

Or, if you have a student table as well (which I assume you will have) and you want to see which students were not absent:

select student_name
    from students
    where student_id not in (select student_id from attendance)

Compared to:

select student_name
    from students s 
        join attendance a on s.student_id = a.student_id
    where (d1+d2+d3+d4+d5..+d31) = 0

Finally, since you wanted to store the reason for the student to be absent, it means for option 2 you would either, have one more row for each reason or have one more column for each day which stores the reason, since otherwise you could only store one reason per month, unless you store the reason as a concatenated string in the column which would require a separator and thus limit the characters you put in as reason.

However, in the end it is more of a personal preference, since with correctly created indexes, I doubt that you would have to worry about the performance, but depending on the amount of absent students, option 2 would be using more storage than option 1 (number of columns * rows * size of the data types)