I have the following Query:
explain analyze
SELECT split_part(full_path, '/', 4)::INT AS account_id,
split_part(full_path, '/', 6)::INT AS note_id,
split_part(full_path, '/', 9)::TEXT AS variation,
st_size,
segment_index,
reverse(split_part(reverse(full_path), '/', 1)) as file_name,
i.st_ino,
full_path,
(i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
ON i.st_ino = s.st_ino_target
WHERE
i.checksum_md5 IS NOT NULL
AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
AND i.st_size > 0;
split_part(s.full_path, '/', 4)::INT IN (
SELECT account.id
FROM public.ja_clients AS account
WHERE
NOT (
((account.last_sub_pay > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND (account.price_model > 0)) OR
(account.regdate > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
(((account.price_model = 0) AND (account.jobcredits > 0)) AND (account.last_login > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
) LIMIT 100
);
- Explain analyze link: http://explain.depesz.com/s/Oc6
The query is taking ages, and I can't get the problem solved.
These are the index I've already created on the inode_segments table:
Indexes:
"ix_account_id_from_full_path" "btree" (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE "full_path"::"text" ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::"text"
"ix_inode_segments_ja_files_lookup" "btree" ((
CASE
WHEN "full_path"::"text" ~ '/[^/]*\.[^/]*$'::"text" THEN "upper"("regexp_replace"("full_path"::"text", '.*\.'::"text", ''::"text", 'g'::"text"))
ELSE NULL::"text"
END)) WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_clientids" "btree" (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_clientids2" "btree" ("full_path")
"ix_inode_segments_notes_fileids" "btree" (("split_part"("full_path"::"text", '/'::"text", 8)::integer)) WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_noteids" "btree" ((NULLIF("split_part"("full_path"::"text", '/'::"text", 6), 'unassigned'::"text")::integer)) WHERE "gorfs"."is_kaminski_note_path"("full_path"::"text")
These are the index I've already created on the inodes table:
Indexes:
"ix_inodes_checksum_st_size" "btree" ("checksum_md5", "st_size") WHERE "checksum_md5" IS NOT NULL
Question:
What else can I do to improve the Performance of the Query?
- This is related to my previous question; index creation – Slow Query – PostgreSQL 9.2
UPDATE 1:
Explain analyze: http://explain.depesz.com/s/UBr
The index and function have been created as mentioned on the answer below.
UPDATE 2:
Explain analyze: http://explain.depesz.com/s/LHS
Using the Query provided on the answer below
Best Answer
Perhaps this will help.
If you'll rely on the account_id from full_path often, then you'll benefit from a function and a functional index for it:
Ensure
gorfs.inodes
has an index (or key much better if applicable) onst_ino
!You run the function
split_part
several times for each row, this is likely taking a significant toll. I've replaced it with string_to_array, and then fetch the individual pieces as needed. I also didn't understand what you intended to obtain for the fieldfield_name
using reverse? The query below returns the last element for it.Your query returns many million rows. Even if PostgreSQL processes the query reasonably quickly, your client application (especially if you use PgAdminIII) will struggle allocating enough memory and receive and format the results, and probably be what takes the most time. So you may want to create a temporary table with the results, and then query against the temporary table: