The leading wildcard in LIKE is the main problem. Another problem is that the WHERE clause is spread across 3 tables.
Consider using a FULLTEXT index; this will make looking for "words" much faster. Alas, you may be stuck with LIKE for the 20-char part number.
Still, the structure of the query will need to be changed...
SELECT p.id, p.manufacturerId, m.code as manufacturerCode, p.materialNumber,
p.partNumber, p.materialDescription, c.title as manufacturerTitle
FROM (
( SELECT id
FROM craft__parts
WHERE partNumber like '%polymer%'
)
UNION DISTINCT
( SELECT id
FROM craft__parts
WHERE materialDescription like '%polymer%'
)
UNION DISTINCT
( SELECT p3.id
FROM `craft__parts` AS p3
JOIN `craft__manufacturers` AS m3 ON p3.manufacturerId = m3.id
JOIN `craft_content` AS c3 ON m3.code = c3.field_manufacturerCode
WHERE title like '%polymer%'
)
) AS p
JOIN `craft__manufacturers` AS m ON p.manufacturerId = m.id
JOIN `craft_content` AS c ON m.code = c.field_manufacturerCode
ORDER BY c.title asc
The goal there is to:
- Do each of the awful searches separately.
- Combine the results (UNION DISTINCT ... p.id).
- Then look up the rest of the fields.
Step 1 still involves a full table scan (if using LIKE) or a quick FULLTEXT lookup (much better). Steps 2 and 3 should be relatively fast, assuming there were not too many matching rows.
The problem with the original query is all the bulky stuff hauled around while doing the JOINs.
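As a sketch of the FULLTEXT alternative (the index name ft_part_search is made up; InnoDB FULLTEXT requires MySQL 5.6+), the first two UNION branches could collapse into a single MATCH lookup:

```sql
-- Hypothetical index; column names taken from the query above.
ALTER TABLE craft__parts
    ADD FULLTEXT INDEX ft_part_search (partNumber, materialDescription);

-- Replaces the first two UNION branches:
SELECT id
  FROM craft__parts
 WHERE MATCH(partNumber, materialDescription)
       AGAINST('polymer' IN NATURAL LANGUAGE MODE);
```

Note that FULLTEXT matches whole words, so it will not find 'polymer' buried in the middle of a longer part number - hence the caveat above about being stuck with LIKE there.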
Other notes...
Other notes...
A PRIMARY KEY is a KEY; do not redundantly say KEY(id).
Does m.code really need to be bigger than 255 characters? "Prefix" indexes are virtually useless, and may make the JOIN in the third UNION not run very fast.
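If the column can be kept short enough to index in full, the prefix index can be avoided altogether. A minimal illustration (hypothetical DDL - the real column definition isn't shown here):

```sql
-- Hypothetical: shrink the column so the whole value can be indexed,
-- instead of a prefix index such as INDEX(code(50)).
ALTER TABLE craft__manufacturers
    MODIFY code VARCHAR(255) NOT NULL,
    ADD INDEX idx_code (code);
```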
To address this problem, I did the following (I'm using PostgreSQL for this, but SQL Server has CTEs and RECURSIVE CTEs and window/analytic functions. I doubt if this can be done in MySQL - maybe using variables - not sure). PostgreSQL is as powerful as the big boys - MySQL is really a toy!
Created a table:
CREATE TABLE stock (sku int, datein date, dateout date, status int);
Populated it thus:
INSERT INTO stock VALUES (123, '2015-02-01', '2015-02-05', 1);
INSERT INTO stock VALUES (123, '2015-02-10', '2015-02-10', 1);
INSERT INTO stock VALUES (123, '2015-02-20', NULL, 0);
Then, I constructed the first part of my query. The actual from and to dates will obviously vary depending on the requirements - I assumed (from the sample data) that the OP wanted to look at the month of February. I constructed the following RECURSIVE CTE to build a list of the days in February - PostgreSQL has the nifty generate_series function, but that's proprietary and I wanted to make my query as portable as possible. This is the best explanation of RCTEs I came across.
WITH RECURSIVE dates (test_date) AS
(
SELECT '2015-02-01'::DATE
UNION ALL
SELECT test_date + 1 FROM dates
WHERE test_date < '2015-02-28'
)
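For comparison, the PostgreSQL-specific version mentioned above (non-portable, shown only for contrast) is a one-liner:

```sql
-- generate_series is PostgreSQL-specific.
SELECT generate_series('2015-02-01'::date,
                       '2015-02-28'::date,
                       interval '1 day')::date AS test_date;
```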
Now, I thought that the organisation of the raw data was poor, so I used CTEs to rearrange the data.
Get the outdates from the raw data:
out_dates AS
(
SELECT dateout AS dout FROM stock
)
Get the indates from the raw data:
in_dates AS
(
SELECT datein AS din FROM stock
)
Then I "transpose" the data - I have to use two steps because one cannot use a window function in a WHERE clause:
transpose AS
(
SELECT lag(dateout) OVER (ORDER BY datein) AS datein,  -- previous row's dateout starts the next range
       datein AS dateout
FROM stock
-- WHERE lag(dateout) OVER (ORDER BY datein) IS NOT NULL <<-- can't use a window function here.
ORDER BY dateout, datein
)
And then
ranges AS
(
SELECT * FROM transpose
WHERE datein IS NOT NULL AND dateout IS NOT NULL
),
The result of this query is:
datein|dateout
2015-02-05|2015-02-10
2015-02-10|2015-02-20
Note that this corresponds to the in and out dates, but now the data is much simpler to handle because its structure is more logical - it's chronologically coherent - thanks to the use of the LAG window function.
Now, we bring it all together with the next query:
x AS
(
SELECT test_date, CASE WHEN has_match THEN 1 ELSE 0 END::int AS flag
FROM
(
SELECT test_date,
EXISTS
(
SELECT 1
FROM ranges
WHERE dates.test_date BETWEEN ranges.datein AND ranges.dateout
) AS has_match
FROM dates
) range_checks
)
SELECT * FROM x;
This gives a series of records for the month that are of this form:
test_date|flag
--------------
2015-02-01|0
2015-02-02|0
2015-02-03|0
2015-02-04|0
2015-02-05|1
2015-02-06|1
<other records snipped for brevity>
Or use this as the final query
z AS
(
SELECT * FROM dates d
LEFT JOIN ranges r
ON (d.test_date BETWEEN r.datein AND r.dateout)
)
SELECT * FROM z;
for records of this form:
test_date|datein|dateout
2015-02-01||
2015-02-02||
2015-02-03||
2015-02-04||
2015-02-05|2015-02-05|2015-02-10
2015-02-06|2015-02-05|2015-02-10
<rest of the records snipped for brevity>
Best Answer
I wouldn't use a JOIN. If you would like to look for rows that have a set of columns in common (in your case, all but one), I would just GROUP the rows by the common columns. Then, restrict which groups you want to return (using a HAVING clause).
The min(E) part is just a trick to be "SQL compliant". max(E), or first(E) (for the databases that have it), would do the job as well. As you're choosing just groups with a single row, the min value of a column within the group is the single value there is. In MySQL 5.6 and earlier, leaving E as a bare column would work as well; although this is not standard SQL, and it doesn't work in 5.7 (thanks to ypercubeᵀᴹ for pointing out that this doesn't work on 5.7).
Results are:
You can get everything at this SQL Fiddle
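The fiddle code itself was not preserved here, but based on the description above (the min(E) trick, groups restricted to a single row), the GROUP BY query would look something like this - table name t and columns A..E are assumed:

```sql
SELECT A, B, C, D, min(E) AS E
  FROM t
 GROUP BY A, B, C, D
HAVING count(*) = 1;
```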
You can also take a different approach. Look for rows that literally comply with what you're looking for: that is, there does NOT exist another row with the same A, B, C and D and a different E. In this case, use an EXISTS condition in your WHERE clause.
Older versions of MySQL would probably not handle this query too well, because EXISTS conditions are not well optimized. MySQL 5.7 does a decent job, and so do most other databases (Oracle, MS SQL Server, PostgreSQL, ...). You do need an index on (A, B, C, D, E) (or, at least, (A, B, C, D)) to be really performant. SQL Fiddle
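Likewise, a sketch of the EXISTS variant described above (same assumed table t and columns A..E):

```sql
SELECT *
  FROM t AS t0
 WHERE NOT EXISTS
       (SELECT 1
          FROM t AS t1
         WHERE t1.A = t0.A
           AND t1.B = t0.B
           AND t1.C = t0.C
           AND t1.D = t0.D
           AND t1.E <> t0.E);
```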
The two queries will not work the same in the case where you have two exact duplicate rows (which I interpret as a case not allowed by your spec). The first one will return no rows, whereas the second one will return both rows.
For your extended case, the second query is your starting point.
Change your equalities with whichever conditions meet your needs.
For instance, if your D column is a timestamp, and you want to consider that "two times are equivalent if the difference is less than 2 minutes"... then replace the strict equality on D with the corresponding tolerance condition.
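One way to express the two-minute tolerance in MySQL (assuming D is a DATETIME or TIMESTAMP; the 120-second threshold matches "less than 2 minutes"):

```sql
-- In the inner query, the equality on D, i.e.
--   t1.D = t0.D
-- would become, for example:
ABS(TIMESTAMPDIFF(SECOND, t1.D, t0.D)) < 120
```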
The possibility of nulls (that work as "jokers") should be handled as well. If E is nullable, then the comparison is done with NOT (t1.E <=> t0.E), which is the MySQL equivalent of (t1.E IS DISTINCT FROM t0.E).
This will give you:
dbfiddle here
Don't expect this kind of query to be efficient in realistic scenarios.