MySQL – Choosing MyISAM over InnoDB for these project requirements, and long-term options

database-design, innodb, myisam, mysql

Sorry for the long post, but I had to give as much info as possible to make this very vague question more specific.

My project's aim is to let users search a (huge) database of a wide variety of products.

  • Each product is present under a category.
  • Each product will have 10 to 100 'specs' or 'features' by which the users will search.

The most common use case is:

  1. User clicks on a category; then clicks various sub-categories if required.
  2. User starts off with 1 or 2 criteria and searches for products.
  3. User then keeps adding more criteria to the search to narrow down on the product.

I have three main tables: 'products', 'features_enum' and 'features'. It is very important to let the data-entry users create new 'features' for the products on the fly, hence I am using the EAV (anti)pattern.

Here are the structures of the tables:

'products'
    ID(PK), TITLE, CATEGORY
    (Indexed by CATEGORY)
'features_enum'
    ID(PK), TITLE
'features'
    P_ID, F_ID, VAL
    (Indexed by P_ID and then F_ID)

A sample format of my main search query:

SELECT
  p.ID,
  p.TITLE PROD_TITLE,
  fe.TITLE FEATURE_TITLE,
  f.VAL
FROM
  products p, features f, features_enum fe
WHERE
  p.CATEGORY = 57 AND
  p.ID = f.P_ID AND
  f.F_ID = fe.ID AND
  (
    (f.F_ID  = 1 AND f.VAL = 'Val1') AND
    (f.F_ID  = 2 AND f.VAL = 'Val2') AND
    ...
    (f.F_ID  = N AND f.VAL = 'ValN')
  )
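A note on the sample query above: because every `(F_ID, VAL)` pair is ANDed against the same `features` row, and one row holds only one `F_ID`, the filter as written cannot match anything once two or more criteria are supplied. One common way to express "the product has all N features" is to OR the pairs and keep products whose match count equals N. The sketch below is only an illustration, not the query that produced the timings: it uses SQLite (Python's standard library) in place of MySQL so it is self-contained, the sample rows are made up, and it assumes each `(P_ID, F_ID)` pair appears at most once.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows are invented test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (ID INTEGER PRIMARY KEY, TITLE TEXT, CATEGORY INTEGER);
    CREATE TABLE features (P_ID INTEGER, F_ID INTEGER, VAL TEXT);
    INSERT INTO products VALUES (1, 'Widget A', 57), (2, 'Widget B', 57);
    -- Product 1 matches both criteria; product 2 matches only the first.
    INSERT INTO features VALUES
        (1, 1, 'Val1'), (1, 2, 'Val2'),
        (2, 1, 'Val1'), (2, 2, 'Other');
""")

# ANDing the pairs: a single features row cannot have F_ID = 1 and F_ID = 2.
and_rows = conn.execute("""
    SELECT p.ID
    FROM products p JOIN features f ON f.P_ID = p.ID
    WHERE p.CATEGORY = 57
      AND (f.F_ID = 1 AND f.VAL = 'Val1')
      AND (f.F_ID = 2 AND f.VAL = 'Val2')
""").fetchall()

# ORing the pairs, then requiring one matching row per criterion.
criteria = [(1, "Val1"), (2, "Val2")]
pair_filter = " OR ".join("(f.F_ID = ? AND f.VAL = ?)" for _ in criteria)
params = [57] + [value for pair in criteria for value in pair] + [len(criteria)]
matching_ids = [row[0] for row in conn.execute(f"""
    SELECT p.ID
    FROM products p JOIN features f ON f.P_ID = p.ID
    WHERE p.CATEGORY = ? AND ({pair_filter})
    GROUP BY p.ID
    HAVING COUNT(*) = ?
""", params)]

print(and_rows)       # no rows survive the ANDed filter
print(matching_ids)   # only the product that has every requested feature
```

Adding another criterion only means appending one more pair to `criteria`, which fits the "keep narrowing the search" use case described earlier.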

My Experimentation So Far:

Due to my limited knowledge and experience in DBs, I hit a wall with theoretical planning. So, I generated a large set of test data to simply see what will work. All three tables had 500,000 test rows. Here are the avg. run times of the main search query:

  1. InnoDB without indexing: 90s.
  2. InnoDB with indexing: 15s (0.3s after increasing the buffer pool size).
  3. MyISAM without indexing: 9s.
  4. MyISAM with indexing: 0.7s.
  5. MyISAM with indexing + FIXED row type: 0.16s.

Test machine – Pentium 4 1.9GHz, 1.5GB RAM, IDE HDD, Win7.

Other than indexing, I have done essentially nothing to optimize, so there might be a ton of things I missed that could have made InnoDB run faster. The InnoDB buffer pool size was set to 16M (!!); I increased it to 128M, and now InnoDB is really fast. So one big reason for me leaning toward MyISAM is gone. Maybe there is more that I can do.

Some points and long term usage estimates about the project:

  • 20 new products added daily, at roughly 20 x 100 'specs' = 2000 record writes per day.

  • 1,000,000 page visits, and in worst case – same amount of search query runs per day.

  • Total record count for the tables is expected to reach 5,000,000 each.

  • Writes will be made by a semi-controlled group of people, whereas reads are public.

  • There are no complex 'transaction' type writes. The most complex write I can think of right now is one product row plus at most 100 feature rows, written in one shot.

  • Require only a couple of constraints, but if necessitated by choice of MyISAM, I can enforce them at the application level itself.

  • DB access from other parts of the app (user registration, authentication, etc.) will be few and far between; I don't think it will have much effect.
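To put the estimates above in perspective, here is a rough back-of-the-envelope calculation. It assumes the load is spread evenly over the day (real traffic will peak higher) and, like the estimate in the list, counts only the feature rows as writes.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the estimates above.
writes_per_day = 20 * 100        # 20 products x up to 100 feature rows each
searches_per_day = 1_000_000     # worst case: one search per page visit

avg_searches_per_second = searches_per_day / SECONDS_PER_DAY
searches_per_write = searches_per_day / writes_per_day

print(f"{writes_per_day} feature-row writes per day")
print(f"{avg_searches_per_second:.1f} searches per second on average")
print(f"{searches_per_write:.0f} searches for every row written")
```

So the workload is overwhelmingly read-dominated: roughly a dozen searches per second on average against a few thousand row writes per day.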

Given all of that, I am biased towards MyISAM. But I need input from people already experienced in MySQL.

Questions:

  1. If the InnoDB run times are wrong/surprising, what have I missed in testing?
    Increasing the buffer pool size dramatically improved performance. See above.
  2. If not, considering all of the above, is MyISAM really a good choice in the long run?
  3. If MyISAM too turns out to be a bad choice later, how easily can I restructure the database? What options do I have?

On a side note:

  1. If choosing EAV was bad, what other architecture can I use for this project?

Best Answer

InnoDB and MyISAM each have their strengths and weaknesses.

If you have enough RAM, I would choose InnoDB because it caches data and index pages in the Buffer Pool. MyISAM only caches index pages in the Key Cache.

MyISAM tables experience full table locks for each INSERT, UPDATE, and DELETE. MyISAM tables always require disk access for data.

InnoDB tables always incur disk I/O in the following areas:

  • Double Write Buffer : Changes are posted in ibdata1 to avoid OS caching
  • Insert Buffer : Changes to Secondary (non-Unique) Indexes are posted in ibdata1
  • Data and Indexes
    • With innodb_file_per_table = 0, changes are written to ibdata1
    • With innodb_file_per_table = 1, changes are written to the table's .ibd tablespace file. Read I/O against ibdata1 is still necessary to cross-check table metadata

SUMMARY

In an environment with the following:

  • heavy writes
  • heavy reads
  • tons of RAM
  • heavy connections

I would always choose InnoDB. Please check out my other post about InnoDB over MyISAM : When to switch from MyISAM to InnoDB?

When would I ever choose MyISAM?

Under the following scenario:

  • Using MySQL Replication
  • Master with all InnoDB
  • Slave with all tables converted to MyISAM
  • ALTER TABLE ... ROW_FORMAT=Fixed for all tables on the Slave
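As a minimal sketch of what that conversion could look like, the helper below just builds the `ALTER TABLE` statements as strings; it does not connect to any server. The function name `engine_conversion_sql` is made up for illustration, and the table names are the three from the question. The same statement with `ENGINE=InnoDB` is also how a table can be moved from MyISAM back to InnoDB later. Note that MyISAM's fixed row format is not available for tables with BLOB or TEXT columns.

```python
def engine_conversion_sql(tables, engine, row_format=None):
    """Build one ALTER TABLE statement per table name (hypothetical helper)."""
    options = f"ENGINE={engine}"
    if row_format:
        options += f", ROW_FORMAT={row_format}"
    return [f"ALTER TABLE `{name}` {options};" for name in tables]


# Table names from the question.
tables = ["products", "features_enum", "features"]

# Statements to run on the Slave only, per the scenario above.
slave_statements = engine_conversion_sql(tables, "MyISAM", "FIXED")

# Statements for converting MyISAM tables to InnoDB.
innodb_statements = engine_conversion_sql(tables, "InnoDB")

for statement in slave_statements + innodb_statements:
    print(statement)
```

Each `ALTER TABLE ... ENGINE=...` rebuilds the table, so on large tables it is worth trying on a copy first.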

Disk I/O wise, MyISAM has a slight edge with ROW_FORMAT=Fixed because you only interact with one file, the .MYD file. The row size is completely predictable because VARCHAR is treated as CHAR, thus shortening access time for data retrieval.
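The reason a predictable row size helps can be shown with simple arithmetic: when every row has the same length, the position of a row in the data file follows directly from its row number. The row length and row number below are hypothetical values chosen only for illustration.

```python
def fixed_row_offset(row_number, row_length):
    """Byte offset of a row when every row occupies row_length bytes."""
    return row_number * row_length


ROW_LENGTH = 64      # hypothetical fixed row size in bytes
ROW_NUMBER = 1000    # hypothetical row number taken from an index entry

offset = fixed_row_offset(ROW_NUMBER, ROW_LENGTH)
print(offset)  # a single multiplication, no scanning of variable-length rows
```

With a dynamic row format the rows vary in length, so no such direct calculation is possible.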

On the other hand, InnoDB has to interact with multiple files (ibdata1, plus several read/write threads working on the .ibd file of the InnoDB table).