MySQL SELECT and SORT performance – v. large table

MySQLselectsorting

I have a table that contains in the region of 4 million rows. This is searched using a form, and then the results are returned. However, the query seems to take an inexplicably large amount of time, and I'm hoping that there's a more efficient query to run.

SELECT * 
FROM Example 
WHERE STR_TO_DATE(IncorporationDate,'%d/%m/%Y') > '" . $from . "' 
  AND STR_TO_DATE(IncorporationDate,'%d/%m/%Y') < '" . $to . "' 
  AND PostCode REGEXP  '" . $postcode . "' 
  AND `Returns.LastMadeUpDate`='' 
ORDER BY PostCode ASC

Best Answer

As mentioned in comments (it is really answer for original question) - using function in WHERE - prevent any indexes using.

First of all, You can check - how many of Your queries use form

SELECT * FROM

this is really bad practice with applications, exactly because - if You add new column, some of Your code could be broken. For example:

INSERT into table2 SELECT * FROM table1

If You can fix this (or You not have problems code), You can:

  • convert IncorporationDate to Date format and use this column
  • or easy add 1 more column with proper Date type and use it in queries

But first - You must be sure, all other code will not be broken by new column.

Then You can use one of the tools from answer for Your other question:

now much more similar tools present on market, many of them free in parts cover 99% of life cases. And ETL it more often solution rather than work with build functions only.

  • Clean
  • Transform
  • Load

For answer about next potential question - what with speed? we load tables from 4M to 40M between instances - MySQL, MS SQL, Redshift For "small" we use direct insert from ETL tools, for Big - prepare clean file for load, than direct load (Redshift from S3)

Just add live example - "how it could work"

One of Our scripts - run series of queries with search target records by substring from 2 columns + other parameters. Changing the structure also not possible by the same reasons - a lot of place where need to make changes and no free developers resources. 2 table 3M rows per month.

  • Original time - 7.5 Hours
  • Modified process - prepare with Talend tables for past month with all optimised structure.
  • modification only this script - for exclude substrings from SELECT query, do not touch any other in system. SELECT from prepared tables, write result to same as in original.

Prepare - 5m, script execution - 12m, total time of whole process - 17minutes.