If all the tables use the MyISAM Storage Engine and have the same table structure, I have some good news for you.
You can create a single table that consumes no additional space except a .frm
file and some mapping info. The key is to take advantage of the MERGE (MRG_MyISAM) Storage Engine.
Here is how you can do this:
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4) INSERT_METHOD=LAST;
Using this method, you can query the 4 tables at the same time like this:
SELECT ReqF FROM XMerge WHERE EmpName='John';
Was that simple, or what ???
In your case, you have 75 tables. You would do this:
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4,X5,X6,X7,X8,X9,
X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,
X20,X21,X22,X23,X24,X25,X26,X27,X28,X29,
X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,
X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,
X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,
X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,
X70,X71,X72,X73,X74,X75) INSERT_METHOD=LAST;
SELECT ReqF FROM XMerge WHERE EmpName='John';
The beauty of this is that creating a MERGE table takes milliseconds. Just make sure every table has an index on EmpName: it is far better to do 75 indexed lookups than 75 full table scans. If there is no index on EmpName yet, you need to do this (use a plain KEY instead of UNIQUE if EmpName values can repeat):
ALTER TABLE X1 ADD UNIQUE KEY (EmpName);
ALTER TABLE X2 ADD UNIQUE KEY (EmpName);
.
.
.
ALTER TABLE X75 ADD UNIQUE KEY (EmpName);
CREATE TABLE XMerge LIKE X1;
ALTER TABLE XMerge ENGINE=MRG_MYISAM
UNION=(X1,X2,X3,X4,X5,X6,X7,X8,X9,
X10,X11,X12,X13,X14,X15,X16,X17,X18,X19,
X20,X21,X22,X23,X24,X25,X26,X27,X28,X29,
X30,X31,X32,X33,X34,X35,X36,X37,X38,X39,
X40,X41,X42,X43,X44,X45,X46,X47,X48,X49,
X50,X51,X52,X53,X54,X55,X56,X57,X58,X59,
X60,X61,X62,X63,X64,X65,X66,X67,X68,X69,
X70,X71,X72,X73,X74,X75) INSERT_METHOD=LAST;
SELECT ReqF FROM XMerge WHERE EmpName='John';
Give it a Try !!!
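If typing out all 75 table names by hand is tedious, you can let MySQL generate the statements for you from information_schema. A sketch, assuming the tables are really named X1 through X75 and live in the current schema:
SELECT CONCAT('ALTER TABLE ', table_name, ' ADD UNIQUE KEY (EmpName);')
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_name REGEXP '^X[0-9]+$';
Copy the resulting rows and run them as a script. The same trick with GROUP_CONCAT(table_name) can produce the UNION=(...) list for the ALTER TABLE on XMerge.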
First consider a query that computes which rows are actually relevant from tablethree. Assuming that by "most recently entered result" you mean "most recent enddate", the following query gathers the appropriate rows:
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
Now you can build a join to retrieve not only sid, but all of the columns of tablethree:
SELECT a.*
FROM tablethree a
INNER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate
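On MySQL 8.0 and later, the same "latest row per sid" can also be written with a window function. A sketch; note that ROW_NUMBER picks exactly one row per sid even when two rows share the same enddate, which the MAX() approach does not:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY sid ORDER BY enddate DESC) AS rn
FROM tablethree t
) ranked
WHERE rn = 1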
This is the result set you actually want to "left join in". You have to insert this into your original query:
SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
ON t1.cid = t2.id
LEFT OUTER JOIN (
SELECT a.*
FROM tablethree a
INNER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate
) AS t3
ON t3.sid = t2.sid
WHERE t1.fieldone = 1
AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
What should also work is the following:
SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
ON t1.cid = t2.id
LEFT OUTER JOIN tablethree AS t3
ON t3.sid = t2.sid
LEFT OUTER JOIN (
SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) mostrecent
ON t3.sid = mostrecent.sid AND t3.enddate = mostrecent.enddate
WHERE t1.fieldone = 1
AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
AND mostrecent.enddate IS NOT NULL
This includes both tablethree and the new SELECT as left joins, and filters out the rows where mostrecent.enddate IS NULL (that is, the rows which are not the most recent ones). This should lead to the same result, but MySQL may be able to compute it a little faster. Running EXPLAIN on both queries should reveal any differences in how they are executed.
Best Answer
I highly recommend that you don't go down that path. You are mixing business logic with relational querying, and that's a sure recipe for many headaches later on, especially in terms of performance.
Keep in mind the rule of 'one query for one task': the simpler and more granular, the better. You are trying to write one query to handle several different tasks.
Use a business control layer either on the client side, a dedicated business tier, or even in a stored procedure on the server, and execute different dedicated, simple SQL queries based on your logic.
For example, in your case you could write a procedure that will accept both parameters and execute the right query based on the values provided. The procedure has negligible overhead, and you get the huge advantage of having separate, optimized query plans created for every combination.
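As a minimal sketch of such a dispatcher (the procedure, table, column, and parameter names here are made up for illustration):
DELIMITER //
CREATE PROCEDURE SearchOrders(IN p_status INT, IN p_customer INT)
BEGIN
IF p_status IS NULL THEN
SELECT * FROM orders WHERE customer_id = p_customer;
ELSEIF p_customer IS NULL THEN
SELECT * FROM orders WHERE status = p_status;
ELSE
SELECT * FROM orders WHERE status = p_status AND customer_id = p_customer;
END IF;
END //
DELIMITER ;
Each branch gets its own cached, optimized plan, instead of one generic plan that has to cover every parameter combination.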
UPDATE: If you need this to work for a large number of parameters, I highly recommend that you read this article by Erland Sommarskog. It is the best article that I know of for this type of challenge, and will offer you several solutions, with thorough analysis of the pros and cons of each. It's long, it's not an easy read, but it will be worth every minute you spend on it.
What worked well for me in previous similar cases is a hybrid approach. I start with a generic solution; you will find a few options in the article. Then I set up a trace, or a log table, to record actual usage stats by the users, and let it run for some time. Although product and marketing will say that all combinations are equally important, you will find that in real life most users tend to use only a few and repeat the same pattern. Then take the top used combinations, write specific queries just for those, and fall back to the generic query for the few instances where a unique combination of parameters is used. This way you get the best of both worlds: for most executions you get excellent performance, while still supporting every possible combination.
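A minimal sketch of such a usage log (the table and column names are made up; p1 and p2 stand for whatever parameters your search procedure receives):
CREATE TABLE search_usage_log (
param_combo VARCHAR(255) NOT NULL,
used_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- inside the search procedure, before running the query:
INSERT INTO search_usage_log (param_combo)
VALUES (CONCAT_WS(',', p1, p2));
After a few weeks, a simple GROUP BY param_combo on this table tells you which combinations deserve their own dedicated query.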
BTW - an interesting side effect of this approach is that over time users start to notice that the non-standard combinations take much longer. Then they either complain, in which case you can write a specific query for that case and make them happy, or, more often, they simply gravitate toward the common combinations, which are way faster :-)
HTH