MySQL – How to improve performance of a frequently updated table in MySQL

mysql | performance | query-performance

My table structure:

CREATE TABLE `gps_device_data`(
  `imei_no` bigint(15) NOT NULL DEFAULT '0',
  `ioelementpriority` int(1) DEFAULT NULL,
  `longitude` int(15) DEFAULT NULL,
  `latitude` int(15) DEFAULT NULL,
  `speed` tinyint(3) unsigned DEFAULT NULL,
  `iovalue` varchar(254) DEFAULT NULL,
  `odom` bigint(20) DEFAULT NULL,
  `alarm_category_realtime_id` varchar(70) DEFAULT NULL,
  `angle` int(4) DEFAULT NULL,
  `altitude` int(4) DEFAULT NULL,
  `satellites` int(4) DEFAULT NULL,
  `data_received_time` bigint(20) NOT NULL DEFAULT '0',
  `continue_data_received_time` bigint(20) NOT NULL DEFAULT '0',
  `movement` int(2) DEFAULT NULL,
  `eventsource` int(4) DEFAULT NULL,
  `data_location` varchar(250) DEFAULT NULL,
  `data_poi` varchar(100) DEFAULT NULL,
  `data_validity` enum('valid','invalid') DEFAULT NULL,
  `createdby` varchar(10) DEFAULT NULL,
  `createddate` datetime DEFAULT NULL,
  `modifiedby` varchar(10) DEFAULT NULL,
  `modifieddate` datetime DEFAULT NULL,
  `deletedby` varchar(10) DEFAULT NULL,
  `deleteddate` datetime DEFAULT NULL,
  PRIMARY KEY (`imei_no`,`data_received_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The query below takes too much time when generating a report for a vehicle:

select  odom,latitude/10000000 as latitude,longitude/10000000 as longitude,
       imei_no,iovalue,data_received_time,continue_data_received_time,
        data_location
    from  gps_device_data
    where  imei_no IN (7781112889,70433357641,356173061134123,
               865190010626029,354188046466178,356307040285917,863354010100132,
               863354010100107,8511139808,4209501721,9974052815,356307044188414,
               5252563256,7709111078,7043335764,2121212121,7043333691,
               7043333684,7043338378,100345,356307048930860,356307040328584,
               358899052926188,7621013796,7043132917,7043312765,9016102493,
               ...
               7820046194,8585858585,559862234897,123456789001,78000000021,
               2220002220,12345678911,863771023760608,863771023758834,
               9898787854,140507978,140908877,140507779,356173061103755,
               7485961230,454545454,147852390,1203203695,358174067325903
                      )
      and  data_received_time >= 1430463600000
      and  data_received_time <= 1446274800000
    Order by  imei_no,data_received_time;

My EXPLAIN plan:

+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | gps_device_data | range | PRIMARY       | PRIMARY | 16      | NULL | 629357 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+-------------+

What my project does is track vehicles, get their information, and display that information to the specific owner of each vehicle.

Questions:

Are there any alternative queries that give better performance than the query above?

Is it OK to use a composite key with an IN clause?

My SELECT query takes 25.93 seconds to execute. If it took 2 or 3 seconds, that would be good enough for me. How can I reduce the execution time of the SELECT query?

I have 10 million rows in the gps_device_data table.

Best Answer

You have two "ranges" filtering the data: the IN list on imei_no and the time range on data_received_time. MySQL has no good way to use a single index for both.

First, let's check some numbers.

  • How many rows in the result set?
  • How many rows without the IN clause?
  • How many rows without the time range?
  • Is this IN list and this time range typical?

With those numbers, we may be able to decide whether to focus on filtering by the time range in preference to the IN list.
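To gather those numbers, counts along these lines should do (a sketch only; substitute the full IN list from your query):

SELECT COUNT(*) FROM gps_device_data
 WHERE imei_no IN (7781112889, 70433357641 /* ... full list ... */)
   AND data_received_time >= 1430463600000
   AND data_received_time <= 1446274800000;   -- rows in the result set

SELECT COUNT(*) FROM gps_device_data
 WHERE data_received_time >= 1430463600000
   AND data_received_time <= 1446274800000;   -- without the IN clause

SELECT COUNT(*) FROM gps_device_data
 WHERE imei_no IN (7781112889, 70433357641 /* ... full list ... */);   -- without the time range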

Meanwhile, let's get rid of the "midnight" bug... Your time range seems to be from the morning of May 1 through midnight of the morning of Oct 31. Perhaps you meant to include all of Oct 31, not just one millisecond of that day? Did you mean to ask for a full 6 months? Suggest this coding style; it's less error-prone:

AND data_received_time >= 1000 * UNIX_TIMESTAMP('2015-05-01')
AND data_received_time <  1000 * UNIX_TIMESTAMP('2015-05-01' + INTERVAL 6 MONTH)
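(Note the INTERVAL is applied to the date, not to the Unix timestamp; adding an INTERVAL to a plain integer does not do what you want.) To sanity-check what the literal millisecond values in the original query correspond to, assuming they are milliseconds since the Unix epoch shown in the session time zone:

SELECT FROM_UNIXTIME(1430463600000 / 1000) AS range_start,
       FROM_UNIXTIME(1446274800000 / 1000) AS range_end;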

Back to the performance question...

If you are looking at a smallish fraction of the total time, then PARTITIONing may be useful for simulating a 2D index. (I mention this as Case #2 in http://mysql.rjweb.org/doc.php/partitionmaint )

If you break the table into, say, 20-40 partitions with PARTITION BY RANGE(data_received_time), then "partition pruning" will first filter on the time range. Then your PRIMARY KEY, with imei_no first, filters on the other dimension.
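As a sketch of what that could look like (the monthly boundaries and partition names here are illustrative only, and the ALTER rebuilds the table, so plan for that):

ALTER TABLE gps_device_data
PARTITION BY RANGE (data_received_time) (
    PARTITION p201505 VALUES LESS THAN (1000 * UNIX_TIMESTAMP('2015-06-01')),
    PARTITION p201506 VALUES LESS THAN (1000 * UNIX_TIMESTAMP('2015-07-01')),
    PARTITION p201507 VALUES LESS THAN (1000 * UNIX_TIMESTAMP('2015-08-01')),
    -- ... one partition per month, out to however far the data goes ...
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

This is possible here because data_received_time is already part of the PRIMARY KEY; a partitioned InnoDB table requires every unique key to include the partitioning column.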

I cannot say how much (if any) this partitioning will speed things up. I would expect small time ranges to benefit (percentage-wise) the most. A time range that hits all partitions will actually be a little slower than without partitioning. (Hence, my "check some numbers" questions.)
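After partitioning, it is worth confirming that pruning actually happens. On MySQL 5.6 that is EXPLAIN PARTITIONS; on 5.7 and later a plain EXPLAIN already includes a partitions column:

EXPLAIN PARTITIONS
SELECT odom, imei_no, data_received_time
  FROM gps_device_data
 WHERE imei_no IN (7781112889, 70433357641 /* ... full list ... */)
   AND data_received_time >= 1000 * UNIX_TIMESTAMP('2015-05-01')
   AND data_received_time <  1000 * UNIX_TIMESTAMP('2015-05-01' + INTERVAL 6 MONTH);

Only the partitions that overlap the time window should appear in the partitions column.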

Schema notes... (Smaller is faster)

  • BIGINT is 8 bytes. Can odom go beyond 16M miles, the limit for MEDIUMINT UNSIGNED (3 bytes)?

  • INT(4) -- the (n) means nothing; suggest TINYINT (1 byte) or SMALLINT (2 bytes).

  • Some of the VARCHARs could be 'normalized' to, say, a 2-byte SMALLINT UNSIGNED. (A sample ALTER for the integer columns follows this list.)
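For the integer columns, a possible shrink looks like this. It is only a sketch, under the assumption that odom stays below about 16.7 million and the other columns fit the ranges noted in the comments; verify against real data first, and remember the ALTER rebuilds the table:

ALTER TABLE gps_device_data
  MODIFY odom        MEDIUMINT UNSIGNED DEFAULT NULL,  -- 3 bytes, max 16,777,215
  MODIFY angle       SMALLINT UNSIGNED  DEFAULT NULL,  -- 2 bytes; 0-359 fits easily
  MODIFY altitude    SMALLINT           DEFAULT NULL,  -- 2 bytes; signed for below-sea-level
  MODIFY satellites  TINYINT UNSIGNED   DEFAULT NULL,  -- 1 byte
  MODIFY movement    TINYINT            DEFAULT NULL,  -- 1 byte
  MODIFY eventsource SMALLINT           DEFAULT NULL;  -- 2 bytes

Normalizing the wider VARCHARs into SMALLINT UNSIGNED ids is a separate exercise, since it needs lookup tables and a backfill.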