In the system,
Files can belong to either products, clearances, business_lines or titles.
Clearances are records that represent a combination of a title and a product.
Products are collections of titles (through clearances) and have a business_line_id
The file_associations view is created to retrieve the correct files for a given business_line, title, product, or clearance so it provides a line for each possible combination of file_id with clearance_id, title_id, product_id and business_line_id.
The problem is that when we add a large amount of data to the database, any query that involves the file_associations
view takes 180 seconds to execute. I don't really have any idea how to make it more efficient
CREATE VIEW `file_associations` AS
select 'clearance_file' AS `association`,
`clearance_file`.`file_id` AS `file_id`,
`clearances`.`id` AS `clearance_id`,`titles`.`id` AS `title_id`,
`products`.`id` AS `product_id`,
`products`.`business_line_id` AS `business_line_id`,
`clearance_file`.`deleted_at` AS `deleted_at`
from (
((`clearance_file`
join `clearances` on((`clearances`.`id` = `clearance_file`.`clearance_id`)))
join `titles` on((`titles`.`id` = `clearances`.`title_id`)))
join `products` on((`products`.`id` = `clearances`.`product_id`))
)
union select 'file_title' AS `association`,
`file_title`.`file_id` AS `file_id`,
`clearances`.`id` AS `clearance_id`,
`titles`.`id` AS `title_id`,
`products`.`id` AS `product_id`,
`products`.`business_line_id` AS `business_line_id`,
`file_title`.`deleted_at` AS `deleted_at`
from ((((`file_title`
join `titles` on((`titles`.`id` = `file_title`.`title_id`)))
left join `business_line_file` on((`business_line_file`.`file_id` = `file_title`.`id`)))
left join `clearances` on((`clearances`.`title_id` = `titles`.`id`)))
left join `products` on(((`products`.`id` = `clearances`.`product_id`) and (`products`.`business_line_id` = `business_line_file`.`business_line_id`))))
union select 'file_product' AS `association`,
`file_product`.`file_id` AS `file_id`,
`clearances`.`id` AS `clearance_id`,
`titles`.`id` AS `title_id`,
`products`.`id` AS `product_id`,
`products`.`business_line_id` AS `business_line_id`,
`file_product`.`deleted_at` AS `deleted_at`
from (((`file_product`
join `products` on((`products`.`id` = `file_product`.`product_id`)))
join `clearances` on((`clearances`.`product_id` = `products`.`id`)))
join `titles` on((`titles`.`id` = `clearances`.`title_id`)))
I ran an 'explain' on the query rewritten as a select (but not with the huge amount of data):
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: clearance_file
type: ALL
possible_keys: clearance_file_clearance_id_index
key: NULL
key_len: NULL
ref: NULL
rows: 176
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: clearances
type: eq_ref
possible_keys: PRIMARY,clearances_title_id_index,clearances_product_id_index
key: PRIMARY
key_len: 4
ref: clearance_file.clearance_id
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: PRIMARY
table: titles
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: clearances.title_id
rows: 1
Extra: Using index
*************************** 4. row ***************************
id: 1
select_type: PRIMARY
table: products
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: clearances.product_id
rows: 1
Extra:
*************************** 5. row ***************************
id: 2
select_type: UNION
table: titles
type: index
possible_keys: PRIMARY
key: titles_title_type_id_index
key_len: 4
ref: NULL
rows: 130136
Extra: Using index
*************************** 6. row ***************************
id: 2
select_type: UNION
table: file_title
type: ref
possible_keys: file_title_title_id_index
key: file_title_title_id_index
key_len: 4
ref: titles.id
rows: 1
Extra:
*************************** 7. row ***************************
id: 2
select_type: UNION
table: business_line_file
type: ref
possible_keys: business_line_file_file_id_index
key: business_line_file_file_id_index
key_len: 4
ref: file_title.id
rows: 2
Extra:
*************************** 8. row ***************************
id: 2
select_type: UNION
table: clearances
type: ref
possible_keys: clearances_title_id_index
key: clearances_title_id_index
key_len: 4
ref: titles.id
rows: 18
Extra:
*************************** 9. row ***************************
id: 2
select_type: UNION
table: products
type: eq_ref
possible_keys: PRIMARY,products_business_line_id_foreign
key: PRIMARY
key_len: 4
ref: clearances.product_id
rows: 1
Extra:
*************************** 10. row ***************************
id: 3
select_type: UNION
table: file_product
type: ALL
possible_keys: file_product_product_id_index
key: NULL
key_len: NULL
ref: NULL
rows: 146
Extra:
*************************** 11. row ***************************
id: 3
select_type: UNION
table: products
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: file_product.product_id
rows: 1
Extra:
*************************** 12. row ***************************
id: 3
select_type: UNION
table: clearances
type: ref
possible_keys: clearances_title_id_index,clearances_product_id_index
key: clearances_product_id_index
key_len: 4
ref: products.id
rows: 524
Extra: Using where
*************************** 13. row ***************************
id: 3
select_type: UNION
table: titles
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: clearances.title_id
rows: 1
Extra: Using index
*************************** 14. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,2,3>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
SHOW CREATE TABLE file_product\G
:
*************************** 1. row ***************************
Table: file_product
Create Table: CREATE TABLE `file_product` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`product_id` int(10) unsigned NOT NULL,
`file_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `file_product_product_id_index` (`product_id`),
KEY `file_product_file_id_index` (`file_id`),
CONSTRAINT `file_product_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES ` files` (`id`),
CONSTRAINT `file_product_product_id_foreign` FOREIGN KEY (`product_id`) REFERE NCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=149 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
SHOW CREATE TABLE file_title\G
:
*************************** 1. row ***************************
Table: file_title
Create Table: CREATE TABLE `file_title` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title_id` int(10) unsigned NOT NULL,
`file_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `file_title_title_id_index` (`title_id`),
KEY `file_title_file_id_index` (`file_id`),
CONSTRAINT `file_title_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES `files` (`id`),
CONSTRAINT `file_title_title_id_foreign` FOREIGN KEY (`title_id`) REFERENCES `titles` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=363120 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
TESTING#
SHOW CREATE TABLE clearance_file\G
:
*************************** 1. row ***************************
Table: clearance_file
Create Table: CREATE TABLE `clearance_file` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`clearance_id` int(10) unsigned NOT NULL,
`file_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `clearance_file_clearance_id_index` (`clearance_id`),
KEY `clearance_file_file_id_index` (`file_id`),
CONSTRAINT `clearance_file_clearance_id_foreign` FOREIGN KEY (`clearance_id`) REFERENCES `clearances` (`id`),
CONSTRAINT `clearance_file_file_id_foreign` FOREIGN KEY (`file_id`) REFERENCES `files` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=177 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Best Answer
While you might be able to improve the performance somewhat, you're never going to get respectable performance out of this view, because of the
UNION
. MySQL has 2 view processing algorithms, calledMERGE
andTEMPTABLE
.A view that's processed with
MERGE
takes the view definition and merges it with the statement that invokes the view, which allows indexes on the underlying table to be used to satisfy the... FROM view_name WHERE ...
part of the outer query.The
TEMPTABLE
algorithm materializes the entire record-set that the view generates it, and then filters out the rows that don't match the outerWHERE
... and that's what's happening to you, here, becauseUNION
(andUNION ALL
) are incompatible with theMERGE
algorithm and this view will always useTEMPTABLE
.I see four alternative options:
Scratch this view and just put the raw queries in the application in your application, nothing that each individual query needs its own
WHERE
clause for the appropriate row matchingCreate a view for each of the "file" tables, and then write the union in your query:
Create a stored procedure that encapsulates the separate queries (possibly multiple different queries, chosen depending on input parameters)... and accepts parameters that provide the procedure's code with enough information to properly query the data that you're looking for, and return that to the caller with an unbounded select (as I discussed in an answer to a different question).
Consider refactoring your schema, if possible, since -- arguably -- you have an architectural problem... every table should contain only "one kind of thing" (that's generally obvious) and there should only be one table for any given kind of thing (that's not always as obvious)... and the "thing" here is "a file", with attributes that tell you what kind of file it is. Since not all attributes would be valid for all files, you'd need additional logic (presumably enforced by triggers) to require that the datafill be consistent with the specific type of file... but it certainly seems to me that this design might be better if structured that way, and would lend itself to more straightforward joins. I would suggest that having separate tables for different subtypes of files is a design flaw that is analogous to having separate tables for "full-time employees" and "part-time employees." True, not all attributes are valid for both types, but a database that stored them in separate tables would have far more potential problems -- later, if not sooner.