Mysql – Filesort while using primary key for orderby

MySQLoptimizationorder-by

I have been breaking my head for the past two days now for this.

CREATE TABLE `clients` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `guarantor_name` varchar(50) NOT NULL,
    `guarantor_dob` date DEFAULT NULL,
    `guarantor_relationship` int(11) DEFAULT '14',
    `telephone_number` varchar(50) DEFAULT NULL,
    `telephone_type` int(11) DEFAULT '7',
    `state` varchar(50) DEFAULT '',
    `pincode` mediumint(6) unsigned NOT NULL,
    `income` int(11) DEFAULT NULL,
    `family_income` int(11) DEFAULT NULL,
    `name` varchar(100) NOT NULL,
    `gender` int(11) DEFAULT '2',
    `marital_status` int(11) DEFAULT '1',
    `reference` varchar(100) DEFAULT NULL,
    `reference_type` int(11) DEFAULT '4',
    `reference2` varchar(50) DEFAULT NULL,
    `reference2_type` int(11) DEFAULT '2',
    `spouse_name` varchar(100) DEFAULT NULL,
    `date_of_birth` date DEFAULT NULL,
    `spouse_date_of_birth` date DEFAULT NULL,
    `address` text NOT NULL,
    `active` tinyint(1) NOT NULL DEFAULT '1',
    `inactive_reason` int(11) DEFAULT '1',
    `date_joined` date DEFAULT NULL,
    `grt_pass_date` date DEFAULT NULL,
    `center_id` int(11) DEFAULT NULL,
    `created_at` datetime DEFAULT '2013-01-06 14:29:16',
    `deleted_at` datetime DEFAULT NULL,
    `updated_at` datetime DEFAULT NULL,
    `deceased_on` date DEFAULT NULL,
    `created_by_user_id` int(11) NOT NULL,
    `other_income` int(11) DEFAULT NULL,
    `total_income` int(11) DEFAULT NULL,
    `poverty_status` varchar(10) DEFAULT NULL,
    `caste` int(11) DEFAULT '1',
    `religion` int(11) DEFAULT '1',
    `created_by_staff_member_id` int(11) NOT NULL,
    `claim_document_status` int(11) DEFAULT '1',
    `claim_document_recieved_by` int(11) DEFAULT NULL,
    `cliam_document_recieved_on` date DEFAULT NULL,
    `town_classification` int(11) DEFAULT '1',
    `upload_reference` int(11) DEFAULT NULL,
    `picture_file_name` varchar(50) DEFAULT NULL,
    `picture_content_type` varchar(50) DEFAULT NULL,
    `picture_file_size` int(11) DEFAULT NULL,
    `picture_updated_at` datetime DEFAULT NULL,
    `application_form_file_name` varchar(50) DEFAULT NULL,
    `application_form_content_type` varchar(50) DEFAULT NULL,
    `application_form_file_size` int(11) DEFAULT NULL,
    `application_form_updated_at` datetime DEFAULT NULL,
    `fingerprint_file_name` varchar(50) DEFAULT NULL,
    `fingerprint_content_type` varchar(50) DEFAULT NULL,
    `fingerprint_file_size` int(11) DEFAULT NULL,
    `fingerprint_updated_at` datetime DEFAULT NULL,
    `client_group_id` int(10) unsigned DEFAULT NULL,
    `priority_sector_list_id` int(10) unsigned DEFAULT NULL,
    `upload_id` int(10) unsigned DEFAULT NULL,
    `psl_sub_category_id` int(10) unsigned DEFAULT NULL,
    `occupation_id` int(10) unsigned DEFAULT NULL,
    `client_identifier` varchar(50) NOT NULL,
    `age` int(11) DEFAULT NULL,
    `age_as_on_date` date DEFAULT NULL,
    `is_loan_applicant` tinyint(1) DEFAULT '0',
    `occupation_field` varchar(50) DEFAULT NULL,
    `father_name` varchar(100) DEFAULT NULL,
    `father_dob` date DEFAULT NULL,
    `ews` int(11) DEFAULT '1',
    `residence_stability` int(11) DEFAULT '1',
    `owner_status` int(11) DEFAULT '1',
    `no_of_children` varchar(50) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `index_clients_client_group` (`client_group_id`),
    KEY `index_clients_psl_sub_category` (`psl_sub_category_id`),
    KEY `index_dedupe_variables`
    (`deleted_at`,`state`,`reference_type`,`reference`,`id`,`client_identifier`)
) ENGINE=InnoDB AUTO_INCREMENT=234631 DEFAULT CHARSET=utf8;

This is the table. Well its a lengthy table. Now i have the query

SELECT SQL_NO_CACHE `id`, `client_identifier`, `guarantor_name`, `guarantor_dob`,
`guarantor_relationship`, `telephone_number`, `telephone_type`, `state`, `pincode`,
`income`, `family_income`, `name`, `gender`, `marital_status`, `reference`,
`reference_type`, `reference2`, `reference2_type`, `father_name`, `father_dob`,
`ews`, `residence_stability`, `owner_status`, `no_of_children`, `spouse_name`, `age`,
`age_as_on_date`, `date_of_birth`, `spouse_date_of_birth`, `active`, `inactive_reason`,
`date_joined`, `grt_pass_date`, `center_id`, `created_at`, `updated_at`, `deceased_on`,
`created_by_user_id`, `other_income`, `total_income`, `poverty_status`, `caste`,
`religion`, `created_by_staff_member_id`, `claim_document_status`,
`claim_document_recieved_by`, `cliam_document_recieved_on`, `town_classification`,
`is_loan_applicant`, `occupation_field`, `upload_reference`, `picture_file_name`,
`picture_content_type`, `picture_file_size`, `picture_updated_at`,
`application_form_file_name`, `application_form_content_type`,
`application_form_file_size`, `application_form_updated_at`,
`fingerprint_file_name`, `fingerprint_content_type`, `fingerprint_file_size`,
`fingerprint_updated_at`, `priority_sector_list_id`, `occupation_id`,
`client_group_id`, `upload_id`, `psl_sub_category_id`
FROM `clients` use index(primary,index_dedupe_variables)
WHERE `deleted_at` IS NULL AND state = 'Maharashtra'
and reference_type = 4 and reference LIKE '%8858' ORDER BY id;

I have created index on

(deleted_at,state,reference_type,reference,id) 

When I explain this query I get

************* 1. row *************
d
id: 1
select_type: SIMPLE
table: clients
type: ref
possible_keys: index_dedupe_variables
key: index_dedupe_variables
key_len: 167
ref: const,const,const
rows: 119122
Extra: Using where; Using filesort
1 row in set (0.00 sec)

I tried increasing sort buffer size as well as max_heap_table_size but I am not able to get rid of filesort but of no use. I even tried to force index which too didn't work. I wanted to use covering index but given the number of columns it might sound awful .I am not sure where am i going wrong 🙁 .

Best Answer

KEY `index_dedupe_variables`
(`deleted_at`,`state`,`reference_type`,`reference`,`id`,`client_identifier`)

Try removing that index and replacing it with this one:

KEY `new_index`
(`deleted_at`,`state`,`reference_type`,`id`)

Even when an index is being used, only the useful portion of the index is used; columns to the right of the first instance where the index cannot be used are not considered in that index.

In your query, 'reference' uses a wildcard on the left side, making that column of the index not useful, and eliminating the use of `id` in the index.

Your explain plan tells you this is happening:

ref: const,const,const

From this, you can tell that only the first three columns of the index are actually being used by the query.

By redefining the index to eliminate the unusable "reference" column, we can use all 4 of the columns of the index, and since the index is already sorted in the order you want, we can then do this:

WHERE `deleted_at` IS NULL
  AND state = 'Maharashtra'
  AND reference_type = 4 
  AND reference LIKE '%8858' 
ORDER BY deleted_at, state, reference_type, id

We don't really want them ordered by those first three columns, but since all they are all equality comparisons, ordering by them won't change the results and it should be "obvious" to the optimizer that it can simply return the results in index order... no filesort needed.

Your EXPLAIN should change to "Using Where" and possibly "Using index" (since this can mean "using index for lookups" even when it's not a covering index.)

It's possible that the optimizer might catch on and not use a filesort even if your ORDER BY were only `id`\ so you might try it both ways after you change the index.

Also, remove the use index from your query. It shouldn't be needed.