Mysql – Are relations slower than a big, inefficient table

database-recommendationjoin;MySQLrdbms

I've been asked in my job to violate the first normal form (repeating groups across columns, using empty / null values) several times, "for the sake of computer processing power". In a nutshell, a "student" table should have at least 8 empty fields (e.g telephones: telephone1, telephone2, telephone3…) instead of my suggestion – a "telephone" table which holds a telephone number (and possible other metadata) and the foreign key is the student id number. My boss says that it's better to store them that way because "there are less CPU cycles and that matters in web platforms", instead of using relations. I say that, in the worst of cases, it's negligible.

In that example, using relations (suppose that the tables are filled with lots of records in a medium-sized webapp) is notably slower than using that kind of table schema?

Best Answer

I don't see how anyone could make such a statement without having some actual facts to back it up. If your queries are CPU bound, then you should look to find ways to reduce that bottleneck.

It sounds as if your boss feels that a denormalized database will perform best, but I don't know enough about your application to say if that is right or not. What will be the expected number of deletes, updates, and inserts for this table?

I would expect that such a denormalized design may result in a reduced amount of CPU time but would expect that your disk I/O would increase. And physical reads from disk will be much more expensive than a CPU cycle, so perhaps your boss has a very specific metric to meet (CPU) and as a result wants a very specific design? If so, I would simply build what is asked for and keep metrics on CPU cost for the queries being run. If you see an increase in time then you may want to suggest some design changes.

In fact, it is probably a good idea to get a list of all the metrics your boss wants to see, and track those over time.

Related Solutions

Postgresql subquery speed much slower than individual queries

Seems you are running in a weakness of the query planner: The best index is sometimes not used for joining tables. Had a similar problem here:
Algorithm for finding the longest prefix (Chapter "Failed attempt with text_pattern_ops")

In Postgres 9.3 You could try this version with LEFT JOIN LATERAL:

SELECT *
FROM  (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    ) pc
LEFT JOIN LATERAL (
    SELECT *
    FROM   public.timezones tz
    WHERE  ST_Intersects(pc.coord, tz.geom)
   ) tz ON TRUE;

Something similar Worked for @ypercube's solution in this related answer.
LATERAL requires Postgres 9.3+, though.

In PostgreSQL 9.1, it might help to encapsulate the first query in a CTE, but I doubt it. (Don't have a PostGis installation here to test.):

WITH pc AS (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    )
SELECT *
FROM   pc
JOIN   public.timezones tz ON ST_Intersects(pc.coord, tz.geom);

A plpgsql function to encapsulate two separate queries should certainly do the trick:

CREATE OR REPLACE FUNCTION f_get_tz(_pc text)
  RETURNS SETOF public.timezones AS
$func$
DECLARE
   _coord geom;
BEGIN

SELECT coord
INTO  _coord
FROM   taduler.postal_code
WHERE  postal_code = _pc
LIMIT  1;

RETURN QUERY
SELECT *
FROM   public.timezones tz
WHERE  ST_Intersects(_coord, tz.geom);

END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_get_tz('T1K0T4');

Mysql – Why are simple SELECTs on InnoDB 100x slower than on MyISAM

YOUR QUERY

SELECT post.postid, post.attach FROM newbb_innopost AS post WHERE post.threadid = 51506;

At first glance, that query should only touches 1.1597% (62510 out of 5390146) of the table. It should be fast given the key distribution of threadid 51506.

REALITY CHECK

No matter which version of MySQL (Oracle, Percona, MariaDB) you use, none of them can fight to one enemy they all have in common : The InnoDB Architecture.

InnoDB Architecture

CLUSTERED INDEX

Please keep in mind that the each threadid entry has a primary key attached. This means that when you read from the index, it must do a primary key lookup within the ClusteredIndex (internally named gen_clust_index). In the ClusteredIndex, each InnoDB page contains both data and PRIMARY KEY index info. See my post Best of MyISAM and InnoDB for more info.

REDUNDANT INDEXES

You have a lot of clutter in the table because some indexes have the same leading columns. MySQL and InnoDB has to navigate through the index clutter to get to needed BTREE nodes. You should reduced that clutter by running the following:

ALTER TABLE newbb_innopost
    DROP INDEX threadid,
    DROP INDEX threadid_2,
    DROP INDEX threadid_visible_dateline,
    ADD INDEX threadid_visible_dateline_index (`threadid`,`visible`,`dateline`,`userid`)
;

Why strip down these indexes ?

The first three indexes start with threadid
threadid_2 and threadid_visible_dateline start with the same three columns
threadid_visible_dateline does not need postid since it's the PRIMARY KEY and it's embedded

BUFFER CACHING

The InnoDB Buffer Pool caches data and index pages. MyISAM only caches index pages.

Just in this area alone, MyISAM does not waste time caching data. That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.

TABLE LAYOUT

You could shave of some space from the row by considering importthreadid and importpostid. You have them as BIGINTs. They take up 16 bytes in the ClusteredIndex per row.

You should run this

SELECT importthreadid,importpostid FROM newbb_innopost PROCEDURE ANALYSE();

This will recommend what data types these columns should be for the given dataset.

CONCLUSION

MyISAM has a lot less to contend with than InnoDB, especially in the area of caching.

While you revealed the amount of RAM (32GB) and the version of MySQL (Server version: 10.0.12-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4002), there are still other pieces to this puzzle you have not revealed

The InnoDB settings
The Number of Cores
Other settings from my.cnf

If you can add these things to the question, I can further elaborate.

UPDATE 2014-08-28 11:27 EDT

You should increase threading

innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M

I would consider disabling the query cache (See my recent post Why query_cache_type is disabled by default start from MySQL 5.6?)

query_cache_size = 0

I would preserve the Buffer Pool

innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1

Increase purge threads (if you do DML on multiple tables)

innodb_purge_threads = 4