There is nothing wrong with nullable columns, even a lot of them, if that is what your data domain calls for.
On the other hand, the fact that you say these columns can be grouped logically implies to me that something else might be going on.
If they can be grouped logically because different sets of columns apply to different sets of rows, then you might have an entity-subtype situation.
Conversely, if columns can be grouped because they apply at different times, then you may have a normalization issue. For example, if your columns are something like "January Sales", "February Sales", etc. these should be rows in a child table.
While there is nothing innately wrong with nullable columns, neither is there anything wrong with joining. It's what RDBMS does for a living.
UPDATE:
Given additional information about the logical groups of columns:
There are two kinds of sub-typing that can be represented in a database using 1:1 relationships. If the logical groups were mutually exclusive, then the parent entity could have what is known as a partitioning attribute that tells you which one of the subtypes is applicable. However, without a partitioning attribute, it is possible to have zero, one or even multiple subtypes being applicable at the same time.
The same fundamental question applies then to what you do with this situation.
A good way to resolve it would be to look at the logical groups of columns. Are the columns in logical group A the same as in logical group B - or are the totally different? If they are different they might be best modeled in the single table with nullable fields. If they are the same, then this might be a clue that they should be multiple child rows instead.
Another clue to look at is whether it makes sense that a logical group of columns could take on a life of its own and start attracting relationships from other tables. If logical group B might sometime soon find itself with multiple child records from another table, then it might be a sign that it makes sense to promote that group to its own subtype style table.
One last thing to consider is physical implementation. If a logical subgroup is very sparsely populated, you might be able to make a case for segregating these columns into another table to optimize physical storage. This step shouldn't be done proactively. This kind of optimization should be done when performance testing proves it is necessary.
If none of these things are true, then you are probably best off leaving the nullable columns in the original table.
Please try:
(edited 2014-12-18: replaced Left
with INNER
join):
SELECT
`pages_users`.`page_id` ,
COUNT(`pu2`.`page_id`) AS `count`
FROM
`pages_users`
INNER JOIN `pages_users` AS `pu2`
ON `pu2`.`user_id` = `pages_users`.`user_id`
AND `pu2`.`page_id` = *PAGE_ID*
WHERE
`pages_users`.`page_id` != *PAGE_ID*
GROUP BY
`pages_users`.`page_id`
ORDER BY
`count` DESC
LIMIT 10
or as an alternative
SELECT
`pages_users`.`page_id` ,
COUNT(`pu2`.`pu2_page_id`) AS `count`
FROM
`pages_users`
INNER JOIN
(
SELECT
`pu2_intern`.`page_id` AS `pu2_page_id`
FROM
`pages_users` AS `pu2_intern`
WHERE
`pu2_intern`.`page_id` = *PAGE_ID*
) AS `pu2`
ON `pu2`.`user_id` = `pages_users`.`user_id`
WHERE
`pages_users`.`page_id` != *PAGE_ID*
GROUP BY
`pages_users`.`page_id`
ORDER BY
`count` DESC
LIMIT 10
You could also add another index just on the field page_id
:
KEY `page` (`page_id`)
I hope it could help you.
Best Answer
Answering strictly in terms of MySQL, you do not want to overnormalize.
Here is why ;
REASON #1 : JOIN BUFFER
There is an in-memory buffer called the join buffer. Its size is regulated in RAM by the join_buffer_size option per DB Connection. The more normalized the data are, the more
JOIN
clauses. In turn, the moreJOIN
clauses, the more join buffers have to be allocated. If a join buffer is too small, it gets migrated to disk. This slows down the query's execution byREASON #2 : QUERY EVALUATION
Because MySQL executes SQL through external storage engines, query evaluation for WHERE clauses and JOIN clauses are virtually identical. I wrote about this back on March 11, 2013 (Is there an execution difference between a JOIN condition and a WHERE condition?). The algorithm for JOINs are also in the MySQL Documentation. The JOIN algorithm can get rather wonky with multiple JOIN clauses.
REASON #3 : STORAGE ENGINE PERFORMANCE
Any storage engine in MySQL is basically comprised of 14 operations. Those operations, including JOINs, are layered in such a way that performance issues can occur in two layers
EPILOGUE
It is strictly up to you how deep you want to normalize your data. If you must normalize and do multiple JOINs, please join only integer values only. Joining on character values will have negative effects on the three reasons I just mentioned.