Mysql – Large MySQL table that need to index some fields

MySQL

I would like to know some hints about how to index some columns of a MySQL table that has about 28 million rows.

Maybe I could recreate the table indexing it properly, then import these rows?

Could someone help me with this please.

Best Answer

If you just wish to create a single index, CREATE INDEX would be the way to go.

However, if you wish to setup your table with multiple indices, setting up a new empty table and then copying from your old to your new table is IMO a more practical solution, since you can get everything ready and then simply run the insert when it's convenient.

Related Solutions

MySQL Index Questions

You must refactor the query to perform WHERE, JOIN and GROUP BY clauses IN THAT EXACT ORDER !!!

Here is the refactored query :

select
    sq.search_id,
    count(1) search_id_count
from
    (select search_id from search where search_id > 47000) sq
    inner join
    (select search_id from search_hit where search_id > 47000) ssh
    using (search_id)
group by
    sq.search_id;

Running OPTIMIZE TABLE (reduces table fragmentation and recreate indexes) may not be necessary unless you do heavy INSERTs, UPDATEs, DELETEs. I'll say it is optional.

Give it a Try !!!

Thesql: need help to optimize the query/table

Add an index on (tag, hash)

ALTER TABLE tags
  DROP INDEX tag,
  ADD UNIQUE INDEX tag_hash_UX
    (tag, `hash`) ;

Then try this query:

SELECT u.md5, u.url, u.title, u.numsaves
FROM urls AS u
  JOIN tags AS t1 ON  t1.hash = u.md5
                  AND t1.tag = 'php'
  JOIN tags AS t2 ON  t2.hash = u.md5
                  AND t2.tag = 'tutorials'
ORDER BY u.numsaves DESC
  LIMIT 20 ;

and after adding another index:

ALTER TABLE urls
  DROP INDEX md5,
  ADD INDEX md5_numsaves_IX
    (md5, numsaves) ;

try this variation (explanation below, at point 3):

SELECT u.md5, u.url, u.title, u.numsaves
FROM urls AS u
  JOIN 
    ( SELECT ui.md5, ui.numsaves
      FROM urls AS ui
        JOIN tags AS t1 ON  t1.hash = ui.md5
                  AND t1.tag = 'php'
        JOIN tags AS t2 ON  t2.hash = ui.md5
                  AND t2.tag = 'tutorials'
      ORDER BY ui.numsaves DESC
        LIMIT 20
    ) AS ulim ON ulim.md5 = u.md5
ORDER BY ulim.numsaves DESC ;

Other things you should consider:

You have 2 identical indexes on URLS (md5), one Primary and one Unique. You don't need both, you can safely delete the redundant (unique) index.
Why was md5 was chosen as the primary key in the first place? Why not a surrogate (auto-incrementing) integer column? That is usually a better choice for primary key in InnODB tables because the primary key is also by default the clustered index of the table.

If you make that change (add an url_id column and make it PK), you will also need to alter the TAGS table by adding a url_id as well and removing the hash column. This would help efficiency, too. Indexes on int columns take much less space than indexes on VARCHAR(255) columns. 4 bytes vs. 757 bytes wide is a lot of difference.
Read this blog post about text and blob columns in InnoDB: Handling long texts/blobs in InnoDB If the issue turns out to be the text columns, you could split the table into two tables, having an 1-1 relationship between them, so text columns are separately from the narrow columns (or try the index I suggested, which is a similar fix for this.)

Best Answer

Related Solutions

MySQL Index Questions

Thesql: need help to optimize the query/table

Related Question