MySQL – Defining a JSON Index for a Schemaless Design Using MySQL 8.0

Tags: database-design, index, json, mysql

We are using MySQL 8.0 and need to implement a schemaless feature: we allow vendors to create a Module (ObjectName) and add columns (int, string, currency) to it. After defining the Module, a vendor can add objects to it.

CREATE TABLE `ObjectName` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  `vendor_Id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_UNIQUE` (`id`),
  KEY `vendor_Id_idx` (`vendor_Id`),
  CONSTRAINT `vendor_Id` FOREIGN KEY (`vendor_Id`) REFERENCES `Vendor` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `ObjectNameColumn` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `objectname_id` int(11) DEFAULT NULL,
  `name` varchar(45) DEFAULT NULL,
  `dataType` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `objectnamecolumn_id_objectname_id_idx` (`objectname_id`),
  CONSTRAINT `objectnamecolumn_id_objectname_id` FOREIGN KEY (`objectname_id`) REFERENCES `ObjectName` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;



CREATE TABLE `object_json` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `attributes` json DEFAULT NULL,
  `objectname_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `objectname_id_objectnameidjson` (`objectname_id`),
  CONSTRAINT `objectname_id_objectnameidjson` FOREIGN KEY (`objectname_id`) REFERENCES `ObjectName` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=400002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

ObjectName:

id | name     | vendor_id
1  | Contact  |   1
2  | Account  |   1
3  | Contact  |   2
4  | Account  |   2

ObjectNameColumn:

id | objectname_id | name              | dataType
 1 |      1        | height            |  int
 2 |      1        | weight            |  int
 3 |      1        | age               |  int
 4 |      1        | name              |  string
 5 |      1        | mobile_number     |  string
 6 |      2        | annual_revenue    |  int
 7 |      2        | established_year  |  int
 8 |      2        | number_of_employees| int
 9 |      2        | address           |  string
10 |      2        | name              |  string

object_json:

    id | objectname_id | attributes
     1 | 1             |{"age": 3, "name": "jiten1", "height": 1, "weight": 2, "mobile_number": null}
     2 | 2             |{"name": "xyz", "address": null, "annual_revenue": 1, "established_year": 2, "number_of_employees": 3}

Now, let's say I want to look up the Contacts of Vendor 1 that satisfy the following criteria:

height between 200 and 400
weight between 400 and 800
age between 400 and 800

SELECT
   object_json.id AS id,
   attributes
FROM
   object_json
WHERE
   object_json.objectname_id = 1
   AND attributes -> '$.height' BETWEEN 200 AND 400
   AND attributes -> '$.weight' BETWEEN 400 AND 800
   AND attributes -> '$.age' BETWEEN 400 AND 800;

The vendor can run SELECT operations with filters on the JSON fields. When we run these queries against ~100,000 records for each of 4 ObjectNames, performance is poor.


Output of EXPLAIN FORMAT=JSON:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "21938.55"
    },
    "table": {
      "table_name": "object_json",
      "access_type": "ref",
      "possible_keys": [
        "objectname_id_objectnameidjson"
      ],
      "key": "objectname_id_objectnameidjson",
      "used_key_parts": [
        "objectname_id"
      ],
      "key_length": "4",
      "ref": [
        "const"
      ],
      "rows_examined_per_scan": 192228,
      "rows_produced_per_join": 192228,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "2715.75",
        "eval_cost": "19222.80",
        "prefix_cost": "21938.55",
        "data_read_per_join": "5M"
      },
      "used_columns": [
        "id",
        "attributes",
        "objectname_id"
      ],
      "attached_condition": "((json_extract(`test`.`object_json`.`attributes`,'$.height') between 200 and 400) and (json_extract(`test`.`object_json`.`attributes`,'$.weight') between 400 and 800) and (json_extract(`test`.`object_json`.`attributes`,'$.age') between 400 and 800))"
    }
  }
}

We want to speed up the query using indexes.
We found that for the JSON type there is an option to build a secondary index by adding virtual (generated) columns and then defining an index on those virtual columns.
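As a sketch of that approach for a single attribute (the column and index names below are illustrative, not from the schema above), a virtual generated column exposes one JSON attribute so it can be indexed together with objectname_id:

```sql
-- Sketch: expose $.height as a virtual generated column and index it.
-- The names height_vc and idx_objectname_height are assumed here.
ALTER TABLE object_json
  ADD COLUMN height_vc INT
    GENERATED ALWAYS AS (attributes -> '$.height') VIRTUAL,
  ADD INDEX idx_objectname_height (objectname_id, height_vc);

-- The optimizer can often substitute an indexed generated column for a
-- matching JSON_EXTRACT expression; filtering on the virtual column
-- directly also uses the index:
SELECT id, attributes
FROM object_json
WHERE objectname_id = 1
  AND height_vc BETWEEN 200 AND 400;
```

The virtual column costs no row storage; only the secondary index is materialized.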

  1. Since a vendor can search on any attribute of an ObjectName, how do we go about defining indexes in such a case?

  2. Also, since we don't know in advance what kind of ObjectNames (entities) a vendor will add, and we don't think it is a good idea to define indexes on a table at run time, the approach does not seem scalable. How can we solve this problem when we don't know the nature of the entities?

Best Answer

I recommend you collect the queries that users run. Periodically go through them to see which JSON 'columns' are most commonly filtered on, and build indexes on those (via virtual columns, etc.).

Be sure to also look for pairs of columns that are filtered on together. In that case, put the column tested with = first in the index. In your example, each of the other filters is a 'range', so there is little benefit in building a 'composite' index beyond its first range column.
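For instance (again with illustrative names), an index that leads with the equality column still helps, even though only the first ranged key part after it can be used:

```sql
-- Sketch: composite index led by the = column (objectname_id).
-- age_vc and idx_objectname_age are assumed names.
ALTER TABLE object_json
  ADD COLUMN age_vc INT
    GENERATED ALWAYS AS (attributes -> '$.age') VIRTUAL,
  ADD INDEX idx_objectname_age (objectname_id, age_vc);

-- objectname_id = 1 is satisfied by the first key part, and the BETWEEN
-- on age_vc uses the second. Adding another ranged attribute as a third
-- key part would not be used for range filtering in the same query.
```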

I touch on some of this here.