Mysql – Am I wrong in table design or wrong in selected index when made the table

database-designindex-tuningMySQLmysql-5.5performance

I've build web application as a tool to eliminate unnecessary data in peoples table, this application mainly to filter all data of peoples who valid to get an election rights. At first, it wasn't a problem when the main table still had few rows, but it is really bad (6 seconds) when the table is filled with about 200K rows (really worse because the table will be up to 6 million rows).

I have table design like below, and I am doing a join with 4 tables (region table start from province, city, district and town). Each region table is related to each other with their own id:

CREATE TABLE `peoples` (
                      `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
                      `id_prov` smallint(2) NOT NULL,
                      `id_city` smallint(2) NOT NULL,
                      `id_district` smallint(2) NOT NULL,
                      `id_town` smallint(4) NOT NULL,
                      `tps` smallint(4) NOT NULL,
                      `urut_xls` varchar(20) NOT NULL,
                      `nik` varchar(20) NOT NULL,
                      `name` varchar(60) NOT NULL,
                      `place_of_birth` varchar(60) NOT NULL,
                      `birth_date` varchar(30) NOT NULL,
                      `age` tinyint(3) NOT NULL DEFAULT '0',
                      `sex` varchar(20) NOT NULL,
                      `marital_s` varchar(20) NOT NULL,
                      `address` varchar(160) NOT NULL,
                      `note` varchar(60) NOT NULL,
                      `m_name` tinyint(1) NOT NULL DEFAULT '0',
                      `m_birthdate` tinyint(1) NOT NULL DEFAULT '0' ,
                      `format_birthdate` tinyint(1) NOT NULL DEFAULT '0' ,
                      `m_sex` tinyint(1) NOT NULL DEFAULT '0' COMMENT ,
                      `m_m_status` tinyint(1) NOT NULL DEFAULT '0' ,
                      `sex_double` tinyint(1) NOT NULL DEFAULT '0',
                      `id_import` bigint(10) NOT NULL,
                      `id_workspace` tinyint(4) unsigned NOT NULL DEFAULT '0',
                      `stat_valid` smallint(1) NOT NULL DEFAULT '0' ,
                      `add_manual` tinyint(1) unsigned NOT NULL DEFAULT '0' ,
                      `insert_by` varchar(12) NOT NULL,
                      `update_by` varchar(12) DEFAULT NULL,
                      `mark_as_duplicate` smallint(1) NOT NULL DEFAULT '0' ,
                      `mark_as_trash` smallint(1) NOT NULL DEFAULT '0' ,
                      `in_date_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
                      PRIMARY KEY (`id`),
                      KEY `ind_import` (`id_import`),
                      KEY `ind_duplicate` (`mark_as_duplicate`),
                      KEY `id_workspace` (`id_workspace`),
                      KEY `add_manual` (`add_manual`),
                      KEY `il` (`stat_valid`,`mark_as_trash`,`in_date_time`),
                      KEY `region` (`id_prov`,`id_city`,`id_district`,`id_town`,`tps`),
                      KEY `name` (`name`),
                      KEY `place_of_birth` (`place_of_birth`),
                      KEY `ind_birth` (`birthdate`(10)),
                      KEY `ind_sex` (`sex`(2))
                    ) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

CREATE TABLE `test_prov` (
          `id` smallint(2) NOT NULL,
          `name_prov` varchar(60) NOT NULL,
          `head_manager` varchar(60) DEFAULT NULL,
          `handprint` blob,
          `ibu_kota` varchar(60) DEFAULT NULL,
          `ketua_kpu` varchar(60) DEFAULT NULL,
          PRIMARY KEY (`id`),
          UNIQUE KEY `name` (`name_prov`)
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1



CREATE TABLE `test_city` (
          `id` smallint(2) NOT NULL,
          `id_prov` smallint(2) NOT NULL,
          `name_city` varchar(60) NOT NULL,
          `head_manager` varchar(60) DEFAULT NULL,
          `ibu_kota` varchar(60) DEFAULT NULL,
          `ketua_kpu` varchar(60) DEFAULT NULL,
          PRIMARY KEY (`id_prov`,`id`),
          KEY `name_city` (`name_city`)
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `test_district` (
          `id` smallint(2) NOT NULL,
          `id_city` smallint(2) NOT NULL,
          `id_prov` smallint(2) NOT NULL,
          `name_district` varchar(60) NOT NULL,
          `head_manager` varchar(60) DEFAULT NULL,
          `handprint` blob ,
          `ppk_1` varchar(60) DEFAULT NULL,
          `ppk_2` varchar(60) DEFAULT NULL,
          `ppk_3` varchar(60) DEFAULT NULL,
          `ppk_4` varchar(60) DEFAULT NULL,
          `ppk_5` varchar(60) DEFAULT NULL,
          PRIMARY KEY (`id_prov`,`id_city`,`id`),
          KEY `name_district` (`name_district`)
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `test_town` (
                      `id` smallint(4) NOT NULL,
                      `id_district` smallint(2) NOT NULL,
                      `id_city` smallint(2) NOT NULL,
                      `id_prov` smallint(2) NOT NULL,
                      `name_town` varchar(60) NOT NULL,
                      `handprint` blob,
                      `pps_1` varchar(60) DEFAULT NULL,
                      `pps_2` varchar(60) DEFAULT NULL,
                      `pps_3` varchar(60) DEFAULT NULL,
                      `tpscount` smallint(2) DEFAULT NULL,
                      `pps_4` varchar(60) DEFAULT NULL,
                      `pps_5` varchar(60) DEFAULT NULL,
                      PRIMARY KEY (`id_prov`,`id_city`,`id_district`,`id`),
                      KEY `name_town` (`name_town`)
                    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

and the query like

SELECT `E`.`id`, `E`.`id_prov`, `E`.`id_city`, `E`.`id_district`, `E`.`id_town`, 
`B`.`name_prov`,`C`.`name_city`,`D`.`name_district`, `A`.`name_town`, `E`.`tps`,     `E`.`urut_xls`, 
`E`.`nik`,`E`.`name`,`E`.`place_of_birth`, `E`.`birth_date`, E.age, `E`.`sex`,  `E`.`marital_s`, `E`.`address`,`E`.`note` 
FROM peoples E
  JOIN test_prov B ON  E.id_prov = B.id
  JOIN test_city C ON E.id_city = C.id and (C.id_prov=B.id)
  JOIN test_district D ON E.id_district = D.id and ((D.id_city = C.id) and (D.id_prov= B.id))
  JOIN test_town A ON E.id_town = A.id and ((A.id_district = D.id) 
    and (A.id_city = C.id) and (A.id_prov = B.id)) AND E.stat_valid=1 AND E.mark_as_trash=0

mark_as_trash is a mark column which only contain 1 and zero just to know if the data has been mark as a deleted record, and stat_valid is the filtered result value – if value is 1 then the data is valid to get the rights of election.

I've tried to see the explain but no column is used as an index lookup. I believe that's the problem why the application so slow in 200K rows.
The query above only shows two conditions, but the application has a feature to filter by name, place of birth, birth date, age with ranges and so on.

How can I make this perform better?

Best Answer

It is difficult to say without seeing an execution plan, but I would take a look at your joins to the geography hierarchy. The geography tables (prov/city/district) have compound keys and all of the necessary foreign keys are already in your peoples table. You are currently joining partly directly and partly through the compound keys. This is a very unconventional approach. You should join on full keys not on partial compound keys. However, in you case you could probably simplify further. Instead of joining up the hierarchy why not just join directly from the bottom level to each piece of the hierarchy as in a star schema.

Try this join clause instead:

FROM peoples E
  JOIN test_prov B ON  E.id_prov = B.id
  JOIN test_city C ON  E.id_city = C.id 
                   and E.id_prov = C.id_prov
  JOIN test_district D ON  E.id_district = D.id 
                       and E.id_city = D.id_city
                       and E.id_prov = D.id_prov
  JOIN test_town A ON  E.id_town = A.id 
                   and E.id_distict = A.id_district 
                   and E.id_city = A.id_city
                   and E.id_prov = A.id_prov 
WHERE E.stat_valid=1 
  AND E.mark_as_trash=0

Note also that I've taken stat_valid and mark_as_trash out of the joins and put them in a where clause. Don't include non-key columns in your joins, it's bad form. Note too that these columns are not indexed, so you will be potentially be causing a table scan with these. I suspect that even if they were indexed they wouldn't be selective enough and you might end up with a table scan anyway.