How big is this database? How many rows are in each table? Etc?
I would say that normalized data is the default state to aim for. Rows are shorter, indexes can be used more effectively, and those shorter rows add up to a smaller, leaner database.
One of the major accelerators of performance is memory. If you can get your 5 tables to stay cached in memory, your queries will avoid much of the disk I/O overhead.
You identify that you are joining with IDs (which are usually integers), so your indexes may be narrow and offer relatively inexpensive joins.
If you decide to denormalize, your tables will be bigger because they carry redundant data on every row. That means more memory is needed to keep the data in cache, and even more I/O when the cache cannot hold it all. (And your backups are bigger.)
In addition, you have taken on the task of denormalizing and of maintaining the denormalized data. That is extra programming work and an extra load on the server as well: memory, I/O, and CPU.
But sometimes denormalization is the best choice. Data Warehouses, for example, are largely denormalized data. Also, you may find that in your system the benefits of denormalization may exceed the cost.
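To make the trade-off concrete, here is a rough sketch with hypothetical tables (call_center, phone_call - not your actual schema):

-- Normalized: the call center's attributes live in one place
CREATE TABLE call_center (
    id      INT PRIMARY KEY,
    name    VARCHAR(100),
    city_id INT
);

CREATE TABLE phone_call (
    id             BIGINT PRIMARY KEY,
    call_center_id INT,          -- narrow integer FK: cheap to index and join
    placed_at      DATETIME
);

-- Denormalized alternative: every call row repeats the call center's attributes
CREATE TABLE phone_call_denorm (
    id               BIGINT PRIMARY KEY,
    call_center_id   INT,
    call_center_name VARCHAR(100),   -- redundant on every row
    call_center_city VARCHAR(100),   -- redundant on every row
    placed_at        DATETIME
);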
Still, you are asking a forum for an answer.
Your best answer would come from building a normalized test case and seeing how it performs. Even though you may not have a lot of 'real data', you should generate a fairly large data set, in the millions of rows, to test with.
You can try to find a tool that does it for you (RedGate has one for example, but it is not free) or generate the data yourself so that you control the complexity. There are online sources of states, cities, et cetera, and you can make up call centers, generate landline numbers and so forth.
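For instance, here is a minimal sketch of generating the data yourself, using the hypothetical phone_call table above and a small helper table of digits cross-joined against itself:

-- Helper table of digits 0-9
CREATE TABLE digits (d INT PRIMARY KEY);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-join six copies to produce 1,000,000 synthetic rows
INSERT INTO phone_call (id, call_center_id, placed_at)
SELECT a.d + b.d*10 + c.d*100 + e.d*1000 + f.d*10000 + g.d*100000 AS id,
       1 + (a.d + b.d*10) % 50                                    AS call_center_id,
       NOW() - INTERVAL (a.d + c.d*100 + f.d*10000) MINUTE        AS placed_at
FROM digits a, digits b, digits c, digits e, digits f, digits g;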
Then try it.
If you do not like the performance, then create a denormalized table to test. And put some effort into writing the code to maintain the denormalization, since that will become integral to your process.
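For example, one way to maintain it (sketched with the hypothetical tables above; a trigger is only one option, application code is another):

-- When a call center is renamed, push the new name into the redundant copies
DELIMITER //
CREATE TRIGGER call_center_after_update
AFTER UPDATE ON call_center
FOR EACH ROW
BEGIN
    UPDATE phone_call_denorm
    SET    call_center_name = NEW.name
    WHERE  call_center_id = NEW.id;
END//
DELIMITER ;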
The problem is the subquery, where you combine in the select list both an aggregated expression (max(cr.posted_dt)) and non-aggregated expressions/columns from the tables.
MySQL - with default settings - allows you to run this kind of inconsistent query, which basically means it is up to the developer to write consistent queries and not shoot themselves in the foot. You can change the sql mode to ONLY_FULL_GROUP_BY and see what happens when you try to run your query.
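For example, to append it to the modes already active in your session:

-- Add ONLY_FULL_GROUP_BY to the current session's sql_mode
-- (assumes the existing mode string is non-empty, which is the usual default)
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
SELECT @@SESSION.sql_mode;   -- verify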
Now, to solve the issue, it seems you want a greatest-n-per-group type of query. There are several ways to do this, all quite complicated in MySQL (because it lacks window functions). Check the relevant tags on this site and on the main SO site.
Here's one way:
SELECT c.id,
       c.title,
       c.member_id,
       cr.mem_id AS last_replier,
       CONCAT(cr.posted_date, 'T', cr.posted_time) AS last_posted_dt
FROM complaints AS c
     LEFT JOIN complaints_reply AS cr
         ON cr.id =
            ( SELECT crl.id
              FROM complaints_reply AS crl
              WHERE crl.complaint_id = c.id
                AND crl.mem_id <> c.member_id
              ORDER BY crl.posted_date DESC, crl.posted_time DESC
              LIMIT 1
            ) ;
It will give you consistent results and will also be quite efficient, if you add an index on (complaint_id, posted_date, posted_time, mem_id).
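For example, one way to create it (the index name is up to you):

ALTER TABLE complaints_reply
    ADD INDEX idx_complaint_posted (complaint_id, posted_date, posted_time, mem_id);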
Tested at SQLfiddle.
No difference. Only the order of the columns in the SELECT list changes.
Changes like this will almost 100% produce the same execution plan.
Community Wiki answer based on an original comment on the question by @ypercube