Mysql – Anyway to optimize complex view

MySQLview

In the system,
Files can belong to either products, clearances, business_lines or titles.
Clearances are records that represent a combination of a title and a product.
Products are collections of titles (through clearances) and have a business_line_id

The file_associations view is created to retrieve the correct files for a given business_line, title, product, or clearance so it provides a line for each possible combination of file_id with clearance_id, title_id, product_id and business_line_id.

The problem is that when we add a large amount of data to the database, any query that involves the file_associations view takes 180 seconds to execute. I don't really have any idea how to make it more efficient

    CREATE VIEW `file_associations` AS 
   select 'clearance_file' AS `association`,
   `clearance_file`.`file_id` AS `file_id`,
   `clearances`.`id` AS `clearance_id`,`titles`.`id` AS `title_id`,
   `products`.`id` AS `product_id`,
   `products`.`business_line_id` AS `business_line_id`,
   `clearance_file`.`deleted_at` AS `deleted_at` 
   from (
       ((`clearance_file`
       join `clearances` on((`clearances`.`id` = `clearance_file`.`clearance_id`))) 
       join `titles` on((`titles`.`id` = `clearances`.`title_id`))) 
       join `products` on((`products`.`id` = `clearances`.`product_id`))
   ) 
   union select 'file_title' AS `association`,
   `file_title`.`file_id` AS `file_id`,
   `clearances`.`id` AS `clearance_id`,
   `titles`.`id` AS `title_id`,
   `products`.`id` AS `product_id`,
   `products`.`business_line_id` AS `business_line_id`,
   `file_title`.`deleted_at` AS `deleted_at` 
   from ((((`file_title` 
       join `titles` on((`titles`.`id` = `file_title`.`title_id`))) 
       left join `business_line_file` on((`business_line_file`.`file_id` = `file_title`.`id`))) 
       left join `clearances` on((`clearances`.`title_id` = `titles`.`id`))) 
       left join `products` on(((`products`.`id` = `clearances`.`product_id`) and (`products`.`business_line_id` = `business_line_file`.`business_line_id`)))) 
   union select 'file_product' AS `association`,
   `file_product`.`file_id` AS `file_id`,
   `clearances`.`id` AS `clearance_id`,
   `titles`.`id` AS `title_id`,
   `products`.`id` AS `product_id`,
   `products`.`business_line_id` AS `business_line_id`,
   `file_product`.`deleted_at` AS `deleted_at` 
   from (((`file_product` 
       join `products` on((`products`.`id` = `file_product`.`product_id`))) 
       join `clearances` on((`clearances`.`product_id` = `products`.`id`))) 
       join `titles` on((`titles`.`id` = `clearances`.`title_id`)))

I ran an 'explain' on the query rewritten as a select (but not with the huge amount of data):

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: clearance_file
         type: ALL
possible_keys: clearance_file_clearance_id_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 176
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: clearances
         type: eq_ref
possible_keys: PRIMARY,clearances_title_id_index,clearances_product_id_index
          key: PRIMARY
      key_len: 4
          ref: clearance_file.clearance_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: titles
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: clearances.title_id
         rows: 1
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: PRIMARY
        table: products
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: clearances.product_id
         rows: 1
        Extra:
*************************** 5. row ***************************
           id: 2
  select_type: UNION
        table: titles
         type: index
possible_keys: PRIMARY
          key: titles_title_type_id_index
      key_len: 4
          ref: NULL
         rows: 130136
        Extra: Using index
*************************** 6. row ***************************
           id: 2
  select_type: UNION
        table: file_title
         type: ref
possible_keys: file_title_title_id_index
          key: file_title_title_id_index
      key_len: 4
          ref: titles.id
         rows: 1
        Extra:
*************************** 7. row ***************************
           id: 2
  select_type: UNION
        table: business_line_file
         type: ref
possible_keys: business_line_file_file_id_index
          key: business_line_file_file_id_index
      key_len: 4
          ref: file_title.id
         rows: 2
        Extra:
*************************** 8. row ***************************
           id: 2
  select_type: UNION
        table: clearances
         type: ref
possible_keys: clearances_title_id_index
          key: clearances_title_id_index
      key_len: 4
          ref: titles.id
         rows: 18
        Extra:
*************************** 9. row ***************************
           id: 2
  select_type: UNION
        table: products
         type: eq_ref
possible_keys: PRIMARY,products_business_line_id_foreign
          key: PRIMARY
      key_len: 4
          ref: clearances.product_id
         rows: 1
        Extra:
*************************** 10. row ***************************
           id: 3
  select_type: UNION
        table: file_product
         type: ALL
possible_keys: file_product_product_id_index
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 146
        Extra:
*************************** 11. row ***************************
           id: 3
  select_type: UNION
        table: products
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: file_product.product_id
         rows: 1
        Extra:
*************************** 12. row ***************************
           id: 3
  select_type: UNION
        table: clearances
         type: ref
possible_keys: clearances_title_id_index,clearances_product_id_index
          key: clearances_product_id_index
      key_len: 4
          ref: products.id
         rows: 524
        Extra: Using where
*************************** 13. row ***************************
           id: 3
  select_type: UNION
        table: titles
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: clearances.title_id
         rows: 1
        Extra: Using index
*************************** 14. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,2,3>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:

SHOW CREATE TABLE file_product\G :

*************************** 1. row ***************************
       Table: file_product
Create Table: CREATE TABLE `file_product` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(10) unsigned NOT NULL,
  `file_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `file_product_product_id_index` (`product_id`),
  KEY `file_product_file_id_index` (`file_id`),
  CONSTRAINT `file_product_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES `                                          files` (`id`),
  CONSTRAINT `file_product_product_id_foreign` FOREIGN KEY (`product_id`) REFERE                                          NCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=149 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

SHOW CREATE TABLE file_title\G :

*************************** 1. row ***************************
       Table: file_title
Create Table: CREATE TABLE `file_title` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title_id` int(10) unsigned NOT NULL,
  `file_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `file_title_title_id_index` (`title_id`),
  KEY `file_title_file_id_index` (`file_id`),
  CONSTRAINT `file_title_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES `files` (`id`),
  CONSTRAINT `file_title_title_id_foreign` FOREIGN KEY (`title_id`) REFERENCES `titles` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=363120 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
TESTING#

SHOW CREATE TABLE clearance_file\G :

*************************** 1. row ***************************
       Table: clearance_file
Create Table: CREATE TABLE `clearance_file` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `clearance_id` int(10) unsigned NOT NULL,
  `file_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `clearance_file_clearance_id_index` (`clearance_id`),
  KEY `clearance_file_file_id_index` (`file_id`),
  CONSTRAINT `clearance_file_clearance_id_foreign` FOREIGN KEY (`clearance_id`) REFERENCES `clearances` (`id`),
  CONSTRAINT `clearance_file_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES `files` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=177 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Best Answer

While you might be able to improve the performance somewhat, you're never going to get respectable performance out of this view, because of the UNION. MySQL has 2 view processing algorithms, called MERGE and TEMPTABLE.

A view that's processed with MERGE takes the view definition and merges it with the statement that invokes the view, which allows indexes on the underlying table to be used to satisfy the ... FROM view_name WHERE ... part of the outer query.

The TEMPTABLE algorithm materializes the entire record-set that the view generates it, and then filters out the rows that don't match the outer WHERE... and that's what's happening to you, here, because UNION (and UNION ALL) are incompatible with the MERGE algorithm and this view will always use TEMPTABLE.

I see four alternative options:

  • Scratch this view and just put the raw queries in the application in your application, nothing that each individual query needs its own WHERE clause for the appropriate row matching

  • Create a view for each of the "file" tables, and then write the union in your query:

    SELECT ... FROM clearance_file JOIN ... WHERE ... UNION ALL ...
    SELECT ... FROM file_title JOIN ... WHERE ... UNION ALL ...
  • Create a stored procedure that encapsulates the separate queries (possibly multiple different queries, chosen depending on input parameters)... and accepts parameters that provide the procedure's code with enough information to properly query the data that you're looking for, and return that to the caller with an unbounded select (as I discussed in an answer to a different question).

  • Consider refactoring your schema, if possible, since -- arguably -- you have an architectural problem... every table should contain only "one kind of thing" (that's generally obvious) and there should only be one table for any given kind of thing (that's not always as obvious)... and the "thing" here is "a file", with attributes that tell you what kind of file it is. Since not all attributes would be valid for all files, you'd need additional logic (presumably enforced by triggers) to require that the datafill be consistent with the specific type of file... but it certainly seems to me that this design might be better if structured that way, and would lend itself to more straightforward joins. I would suggest that having separate tables for different subtypes of files is a design flaw that is analogous to having separate tables for "full-time employees" and "part-time employees." True, not all attributes are valid for both types, but a database that stored them in separate tables would have far more potential problems -- later, if not sooner.