Mysql – How to optimize UPDATE with a nested SELECT subquery

join;MySQLoptimizationsubqueryupdate

I wrote a complicated UPDATE query, and it works, but it looks menacing. Here's what I'm trying to do:

In each topic user 'Bob123' posted anonymously. When you post anonymously in a topic, you get a unique anonymous index for that topic.

Say I want to merge two topics together. Bob123 has a different anon index in both topics, so his unique anon index wouldn't be unique. I only have two pieces of data to work with: $topic_id, the topic id you are merging into, and $post_id_list, all the post ids that got merged over.

I want to update all anonymous_index entries per each distinct poster_id's post in that topic. This anonymous_index needs to be the original index they had in the topic before the other topic was merged into it.

The first SELECT query first selects the anon indices of the moved posts.
The outer SELECT query gets the first non-merged post's anon index (if it is > 0) of those merged posters in the topic and selects a merged anon index from the first query.

Then, I update it. Wherever the anon index of those posters in that topic doesn't equal the old index, I update it.

Is there something simple that I'm missing here? I don't like the fact that I have a subquery in a subquery.

At first I was using HAVING MIN(anonymous_index) <> MAX(anonymous_index) along with AND post_id NOT IN ($merged_post_list)to select the poster id list that needed to be updated and an unmerged anon index, but it returned 0 rows with this. If the merged post is BEFORE all original posts (and has a larger anon index), then the minimum anon index will match the maximum index for that poster. So making another subquery fixed this…

$merged_post_list = implode(',', $post_id_list);

...

UPDATE " . POSTS_TABLE . " AS p
INNER JOIN (    SELECT p.post_id, p.anonymous_index AS old_index,
                       merged.poster_id, merged.anonymous_index AS new_index
                FROM " . POSTS_TABLE . " AS p,
                (       SELECT poster_id, anonymous_index
                        FROM " . POSTS_TABLE . "
                        WHERE post_id IN ($merged_post_list)
                        AND topic_id = $topic_id
                        AND anonymous_index > 0
                ) AS merged
                WHERE p.post_id NOT IN ($merged_post_list)
                AND p.topic_id = $topic_id
                AND p.anonymous_index > 0
                AND p.poster_id = merged.poster_id
                GROUP BY merged.poster_id
) AS postdata
SET p.anonymous_index = postdata.old_index
WHERE p.topic_id = $topic_id
AND anonymous_index > 0
AND anonymous_index <> postdata.old_index
AND p.poster_id = postdata.poster_id

post_id is the primary index, poster_id and topic_id are also indices.

Here's some sample behavior:

Before merge:

|post_id_____poster_id_____anonymous_index|
| 11         | 3           | 2            |
| 12         | 22          | 1            |
| 14         | 22          | 1            |
| 15         | 3           | 2            |

After merge:

|post_id_____poster_id_____anonymous_index|
| 10         | 22          | 4            |
| 11         | 3           | 2            |
| 12         | 22          | 1            |
| 13         | 3           | 4            |
| 14         | 22          | 1            |
| 15         | 3           | 2            |
| 16         | 22          | 4            |

After UPDATE (the above query):

|post_id_____poster_id_____anonymous_index|
| 10         | 22          | 1            |
| 11         | 3           | 2            |
| 12         | 22          | 1            |
| 13         | 3           | 2            |
| 14         | 22          | 1            |
| 15         | 3           | 2            |
| 16         | 22          | 1            |

EDIT: I've made the following index and an alternative SELECT query to avoid having two subqueries, how would these fare?:
(topic_id, poster_id, anonymous_index, post_id)

SELECT merged.poster_id, merged.anonymous_index AS new_index,
       old.post_id, old.anonymous_index AS old_index
FROM " . POSTS_TABLE . " AS merged,
     " . POSTS_TABLE . " AS old
WHERE merged.post_id IN ($post_list)
AND merged.anonymous_index > 0
AND merged.anonymous_index <> old.anonymous_index
AND old.topic_id = $topic_id
AND old.post_id NOT IN ($post_list)
AND old.anonymous_index > 0
AND old.poster_id = merged.poster_id
GROUP BY merged.poster_id
ORDER BY NULL

EDIT AGIAN:

Here is my schema:

Table structure for table `phpbb_posts`
--

CREATE TABLE `phpbb_posts` (
  `post_id` int(10) UNSIGNED NOT NULL,
  `topic_id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `forum_id` mediumint(8) UNSIGNED NOT NULL DEFAULT '0',
  `poster_id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `icon_id` mediumint(8) UNSIGNED NOT NULL DEFAULT '0',
  `poster_ip` varchar(40) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_time` int(11) UNSIGNED NOT NULL DEFAULT '0',
  `post_reported` tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
  `enable_bbcode` tinyint(1) UNSIGNED NOT NULL DEFAULT '1',
  `enable_smilies` tinyint(1) UNSIGNED NOT NULL DEFAULT '1',
  `enable_magic_url` tinyint(1) UNSIGNED NOT NULL DEFAULT '1',
  `enable_sig` tinyint(1) UNSIGNED NOT NULL DEFAULT '1',
  `post_username` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_subject` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `post_text` mediumtext COLLATE utf8_bin NOT NULL,
  `post_checksum` varchar(32) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_attachment` tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
  `bbcode_bitfield` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `bbcode_uid` varchar(8) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_postcount` tinyint(1) UNSIGNED NOT NULL DEFAULT '1',
  `post_edit_time` int(11) UNSIGNED NOT NULL DEFAULT '0',
  `post_edit_reason` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_edit_user` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `post_edit_count` smallint(4) UNSIGNED NOT NULL DEFAULT '0',
  `post_edit_locked` tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
  `post_visibility` tinyint(3) NOT NULL DEFAULT '0',
  `post_delete_time` int(11) UNSIGNED NOT NULL DEFAULT '0',
  `post_delete_reason` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
  `post_delete_user` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `sfs_reported` tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
  `parent_id` int(10) UNSIGNED DEFAULT '0',
  `post_depth` int(3) UNSIGNED NOT NULL DEFAULT '0',
  `is_anonymous` tinyint(1) UNSIGNED NOT NULL DEFAULT '0',
  `anonymous_index` mediumint(8) UNSIGNED NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

--
-- Indexes for dumped tables
--

--
-- Indexes for table `phpbb_posts`
--
ALTER TABLE `phpbb_posts`
  ADD PRIMARY KEY (`post_id`),
  ADD KEY `forum_id` (`forum_id`),
  ADD KEY `topic_id` (`topic_id`),
  ADD KEY `poster_ip` (`poster_ip`),
  ADD KEY `poster_id` (`poster_id`),
  ADD KEY `post_username` (`post_username`),
  ADD KEY `tid_post_time` (`topic_id`,`post_time`),
  ADD KEY `post_visibility` (`post_visibility`),
  ADD KEY `parent_id` (`parent_id`);

--
-- AUTO_INCREMENT for dumped tables
--

--
-- AUTO_INCREMENT for table `phpbb_posts`
--
ALTER TABLE `phpbb_posts`
  MODIFY `post_id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;COMMIT;

Best Answer

                    WHERE post_id IN ($merged_post_list)
                    AND topic_id = $topic_id
                    AND anonymous_index > 0

would probably benefit from this composite index:

INDEX(topic_id, post_id, anonymous_index)  -- in this order

Then, assuming the Optimizer starts with the derived table, this

            WHERE p.post_id NOT IN ($merged_post_list)
            AND p.topic_id = $topic_id
            AND p.anonymous_index > 0
            AND p.poster_id = merged.poster_id

might benefit from this 4-column composite index:

INDEX(poster_id, topic_id,       -- in either order
      post_id, anonymous_index)  -- then these, in this order

(Please do not reuse an alias (p), it makes the query hard to read.)

Finally, this

WHERE p.topic_id = $topic_id
AND anonymous_index > 0
AND anonymous_index <> postdata.old_index
AND p.poster_id = postdata.poster_id

needs my 3-column index.

Latest version; schema

It is past time to switch to InnoDB! (Meanwhile, I will have to think about whether my advice needs to be adapted to MyISAM.)

In newer versions of MySQL, the GROUP BY will be invalid. This is because the other 3 columns in the SELECT are ambiguous -- for example which post_id should it deliver?

ALTER TABLE phpbb_posts
    DROP INDEX topic_id,
    DROP INDEX poster_id,
    ADD INDEX(topic_id, anonymous_index, poster_id, post_id),
    ADD INDEX(poster_id, topic_id, anonymous_index, post_id) ;

Those should work for either MyISAM or InnoDB, and be "covering".

For (only) MyISAM, this may be beneficial:

    INDEX(post_id, topic_id, anonymous_index, poster_id)