MySQL – Help on improving the query performance of a MySQL table

innodb, mysql

My team maintains a tool that automatically detects and categorizes images. The tool uses a MySQL database (InnoDB engine) in production, where we store information about each image processed.

The table was poorly designed by someone long before I joined the team. Everything worked well while the database held very little data. We recently launched the tool to a wider audience, and the database now holds a huge amount of data. SELECT queries are very slow and can take days to return a result.

I am not a database expert. Please suggest ways to improve performance. I am considering options such as creating indexes. We cannot have any downtime, and the database is write-heavy (around 200 inserts per second), so I am afraid that creating an index will lock the table.

Table schema (no primary key, no indexes):

CREATE TABLE `ImageDetails` (
  `BOOKNAME` varchar(11) DEFAULT NULL,
  `PREV_ID` varchar(11) DEFAULT NULL,
  `TIMESTAMP` varchar(100) DEFAULT NULL,
  `IMAGE_FILENAME` varchar(1000) DEFAULT NULL,
  `POSITION_ID` varchar(1000) DEFAULT NULL,
  `PRECEDING_TEXT` varchar(3000) DEFAULT NULL,
  `WIDTH` float DEFAULT NULL,
  `HEIGHT` float DEFAULT NULL,
  `IMAGE_ORIENTATION` varchar(100) DEFAULT NULL,
  `COMPUTED_WIDTH` float DEFAULT NULL,
  `COMPUTED_HEIGHT` float DEFAULT NULL,
  `SRC_STYLE_WIDTH` varchar(100) DEFAULT NULL,
  `SRC_STYLE_HEIGHT` varchar(100) DEFAULT NULL,
  `LAYOUT_TYPE` varchar(30) DEFAULT NULL,
  `COVER` varchar(10) DEFAULT NULL,
  `TYPE` varchar(250) DEFAULT NULL,
  `TYPE_SCORE` float DEFAULT NULL,
  `CATEGORY` varchar(500) DEFAULT NULL,
  `CATEGORY_SCORE` float DEFAULT NULL,
  `HISTORIC_SCORE` float DEFAULT NULL,
  `QUALITY_SCORE` float DEFAULT NULL,
  `NEW_WIDTH_IN_PERCENT` float DEFAULT NULL,
  `NEW_MAX_WIDTH_IN_POINTS` float DEFAULT NULL,
  `NEW_HEIGHT_IN_PERCENT` float DEFAULT NULL,
  `NEW_MAX_HEIGHT_IN_POINTS` float DEFAULT NULL,
  `ORIGINAL_WIDTH` float DEFAULT NULL,
  `ORIGINAL_WIDTH_UNIT` varchar(100) DEFAULT NULL,
  `ORIGINAL_HEIGHT` float DEFAULT NULL,
  `ORIGINAL_HEIGHT_UNIT` varchar(100) DEFAULT NULL,
  `CONTAINER_HIERARCHY_TYPE` varchar(1000) DEFAULT NULL,
  `ERROR_CODE1` varchar(1000) DEFAULT NULL,
  `ERROR_CODE2` varchar(1000) DEFAULT NULL,
  `STATUS` varchar(1000) DEFAULT NULL,
  `CLIENT` varchar(100) DEFAULT NULL,
  `SAMPLE` tinyint(1) DEFAULT NULL,
  `T_ID` varchar(200) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

EDIT:

MySQL engine version: 8.0.17

200 insert statements per second

The select queries are of the form (time1 and time2 are placeholders):

-- 1. Count rows in a time range.
select count(BOOKNAME)
from ImageDetails
where timestamp between time1 and time2;

-- 2. Same range, successful rows only.
select count(BOOKNAME)
from ImageDetails
where (timestamp between time1 and time2)
    and status = 'success';

-- 3. Same range, successful rows whose type contains "landscape".
select count(BOOKNAME)
from ImageDetails
where (timestamp between time1 and time2)
    and (type like '%landscape%')
    and status = 'success';

Best Answer

Based on the queries you provided, you can try creating a single covering index on the columns timestamp, type, status, and bookname. It should be usable by each of the three queries above, but whether it is the best index depends on your table's data and on testing. Note that status is declared as varchar(1000), which is wider than InnoDB allows in an index key, so it would need an index prefix length or a narrower column first. As Andrew points out, the selectivity of the timestamp column depends on how large the time ranges are or, more precisely, on how many rows a typical range matches. After you create the index, run EXPLAIN on each query to check that the index is being used appropriately (EXPLAIN ANALYZE, which also reports actual run times, is only available from MySQL 8.0.18).
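As a rough sketch of what this could look like on the table from the question (MySQL 8.0.17, InnoDB): the index name idx_ts_type_status_book, the prefix length of 20 on STATUS, and the sample time literals are assumptions made for illustration, not values from the original post. A utf8mb4 varchar(1000) can take up to 4000 bytes, which is over InnoDB's 3072-byte index key limit, hence the prefix. InnoDB can normally add a secondary index in place while still allowing concurrent inserts; requesting LOCK=NONE explicitly makes the statement fail with an error instead of taking a stronger lock if that turns out not to be possible.

```sql
-- Sketch only: index name, prefix length, and time literals are assumptions.
-- STATUS(20) indexes only the first 20 characters of STATUS.
ALTER TABLE ImageDetails
  ADD INDEX idx_ts_type_status_book (`TIMESTAMP`, `TYPE`, `STATUS`(20), `BOOKNAME`),
  ALGORITHM=INPLACE, LOCK=NONE;

-- Check the plan for query 2. The `key` column of the output should name
-- the new index. The range below is hypothetical; because TIMESTAMP is a
-- varchar, BETWEEN compares strings, so the literals must use the same
-- (sortable) format that the column actually stores.
EXPLAIN
SELECT COUNT(BOOKNAME)
FROM ImageDetails
WHERE `TIMESTAMP` BETWEEN '2020-01-01 00:00:00' AND '2020-01-02 00:00:00'
  AND `STATUS` = 'success';
```

Two caveats worth testing on a copy first. A prefixed column generally stops the index from fully covering queries that filter on that column, so queries 2 and 3 may still need row lookups until STATUS is narrowed. Also, writes that arrive while the index is being built are buffered in a temporary log capped by innodb_online_alter_log_max_size; on a table this busy the build can fail if that cap is reached.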

Note that on a 200-million-row table the index may take a while to create, so you might want to do it during a maintenance window or off-hours. You should also look at using proper data types and cleaning up the table when possible. For example, timestamp should be stored in a timestamp type column, and does the status column really need up to 1,000 characters when there are only 6 distinct values for status? These things can affect performance in minor ways and definitely affect maintainability, but this is more of a side note.
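A minimal sketch of how that cleanup might be approached, under stated assumptions: the column name CREATED_AT and the target width varchar(20) are hypothetical, and the real values in STATUS should be checked before shrinking anything. Adding a new nullable column can be done as an instant metadata change in MySQL 8.0.12 and later, whereas changing a column's type or reducing a varchar's length requires a table copy that blocks writes for its duration, so that step belongs in a maintenance window.

```sql
-- 1. Inspect what STATUS really contains before narrowing it.
--    This is a full table scan, so expect it to be slow on a large table.
SELECT `STATUS`, COUNT(*) AS row_count
FROM ImageDetails
GROUP BY `STATUS`;

-- 2. Add a properly typed time column without rebuilding the table.
--    CREATED_AT is a hypothetical name; the application would have to
--    start populating it for new rows.
ALTER TABLE ImageDetails
  ADD COLUMN `CREATED_AT` datetime DEFAULT NULL,
  ALGORITHM=INSTANT;

-- 3. Narrow STATUS. Shrinking a varchar needs a table copy, and
--    LOCK=SHARED means reads continue but writes are blocked until
--    the copy finishes. varchar(20) is an assumed width.
ALTER TABLE ImageDetails
  MODIFY `STATUS` varchar(20) DEFAULT NULL,
  ALGORITHM=COPY, LOCK=SHARED;
```

Step 3 conflicts with the no-downtime requirement in the question, so it is shown only to make the trade-off concrete; steps 1 and 2 do not block inserts.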