Mysql – Need help improving sql query performance

MySQLperformancequery-performance

I have this table:

 CREATE TABLE `property_ads_history` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `property_ad_id` int(10) unsigned NOT NULL,
 `advertiser_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `agency_reference_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `average_sale_price` double(8,2) NOT NULL DEFAULT '-1.00',
 `bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `bedrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `carports` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_country` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_reason` enum('Geocoded','Sanitized Parking','Sanitized Representation','Sanitized Address','Scraped','URL Inserted','QA Sanitized Address','QA Sanitized Representation','QA Sanitized Parkings') COLLATE utf8_unicode_ci DEFAULT NULL,
 `description` longtext COLLATE utf8_unicode_ci NOT NULL,
 `ensuite_bathrooms` double(8,2) NOT NULL DEFAULT '-1.00',
 `DELETE_ad_expired_at` datetime NOT NULL,
 `floor_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `formatted_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `garages` double(8,2) NOT NULL DEFAULT '-1.00',
 `geocode_status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `is_represented` tinyint(1) DEFAULT NULL,
 `land_area` double(8,2) NOT NULL DEFAULT '-1.00',
 `latitude` double(10,6) NOT NULL,
 `location_id` int(10) unsigned DEFAULT NULL,
 `longitude` double(10,6) NOT NULL,
 `off_street_parkings` double(8,2) NOT NULL DEFAULT '-1.00',
 `official_property_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `page_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `parking` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `posted_at` datetime NOT NULL,
 `posted_at_string` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `postal_code` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `price` double(10,2) NOT NULL DEFAULT '-1.00',
 `primary_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `DELETE_property_ad_created_at` datetime NOT NULL,
 `property_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `rateable_value` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_1` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_2` double(10,2) NOT NULL DEFAULT '-1.00',
 `recent_sale_3` double(10,2) NOT NULL DEFAULT '-1.00',
 `reviewer_comments` longtext COLLATE utf8_unicode_ci NOT NULL,
 `route` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `source_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
 `street_number` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `user_id` int(10) unsigned DEFAULT NULL,
 `user_provided_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `deleted_at` timestamp NULL DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `unique_index` (`created_at`,`created_reason`,`source_id`),
 KEY `property_ads_history_property_ad_id_foreign` (`property_ad_id`),
 KEY `property_ads_history_location_id_foreign` (`location_id`),
 KEY `created_reason` (`created_reason`)
);

This is my SQL query:

SELECT * FROM `property_ads_history` `t1` 
WHERE `t1`.`created_at` >= '2016-04-13 00:00:00' 
AND `t1`.`created_reason` = 'Scraped' 
AND (`t1`.`price` > -1 OR `t1`.`price` <> 999999.99) 
AND (
  SELECT `t2`.`price` FROM `property_ads_history` `t2` 
  WHERE `t2`.`property_ad_id` = `t1`.`property_ad_id` 
  AND `t2`.`created_at` < `t1`.`created_at` 
  AND (`t2`.`price` > -1 OR `t2`.`price` <> 999999.99) ORDER BY  
  DATE(`t2`.`created_at`) DESC LIMIT 1
) <> `t1`.`price` GROUP BY `t1`.`property_ad_id`;

All I want in that query are:

  1. I want to get the records that meet certain criteria, particularly that the created_at field is within the last 24 hours

  2. I need to get the record that immediately precedes the records in #1

  3. further filter results in #1 to records whose price column has had a different value in a record's history, other than -1

The problem I'm facing is that the query is too slow. I have a million records. It would take more or less 2 mins for the query to execute. I'm guessing GROUP BY makes the query slow. I think I need to make a composite index but I'm not sure how to do it.

Explain result:


id | select_type        | table | type | possible_keys                                             | key                                         | key_len | ref                                | rows   | Extra
1  | PRIMARY            | t1    | ref  | unique_index,created_reason                               | created_reason                              | 2       | const                              | 590030 | Using where; Using temporary; Using filesort
2  | DEPENDENT SUBQUERY | t2    | ref  | unique_index,property_ads_history_property_ad_id_foreign  | property_ads_history_property_ad_id_foreign | 4       | reis_real_estate.t1.property_ad_id | 7      | Using where; Using filesort

Best Answer

ORDER BY in the sub query is what's making it slow.

ORDER BY makes query slow since you need to order (sort) all the objects in the query results. It is in order of N Square if you don't have an index, and N log N if you do.

And you don't even need to order by all of it.

What you need is only to filter out the minimum, so instead of:

ORDER BY  
  DATE(`t2`.`created_at`) DESC LIMIT 1

I would add another condition in the WHERE clause saying:

WHERE DATE(t2.'created at') <> (SELECT MAX (DATE ('created at') FROM 't2'))

Since it needs to look at all the dates in the query, but not necessarily to order them.

Google "Create Composite Index". MySQL is not my variant of SQL. It is always good practice to check subjects like indexes and Primary Keys outside of Q&A forums (like this one) since they are very broad and you need to read at least 5 articles in order to decide which composite index you need, on top of how to write it (but since you query that field quite often in that query, I'd suggest you should create one).