Mariadb – Does any MariaDB engine have the option to declaratively or configuratively abstract highly-redundant data

denormalizationjsonmariadbnosqlstorage-engine

I have an application which is architected in a "NoSQL" style around one fully denormalized "main table" which currently just holds a primary key and one JSON-valued column. For reasons which are outside the scope of this question, I want to retain this architecture: I do NOT want to go full relational and create a proper normalized data model with entities, foreign keys, etc. I would be willing to parse out the JSON fields into their own columns, if that helps with the question below.

The data has a very high level of redundancy, e.g. some long-text column values such as names, addresses, and descriptions may be repeated 20 or 30+ times across different rows.

Is there a MariaDB engine which can declaratively or configuratively deal with this type of redundancy by internally de-duplicating the values across rows?

I imagine different ways this might be implemented in MariaDB would involve an extension to the BLOB or TEXT data types to leverage a content-based hash along with reference-counting or garbage collection.

What is meant by declaratively: it can be accomplished via a single "ALTER TABLE" statement. What is meant by configuratively: it can be accomplished by modifying one or more system variables.

I have read about ColumnStore, TokuDB, Mroonga, Parquet, and MyRocks but I'm not quite sure this is what they do, as the documentation is very sparse.

This question is specifically about a storage engine to help internally optimize the storage of redundant data. Please refrain from telling me to rearchitect, redesign, or refactor the application.

Best Answer

In any RDBMS, "normalization" is an explicit task that the programmer does.

Having stuff buried inside JSON strings makes that task more difficult.