Mysql – Two tables with two different charsets

encodingMySQL

I have one table which stores input from user, so to allow national characters I've set charset to utf8mb4.

Then I have second table, which can grow really large (100+ millions of rows) and I'm 100% sure that this table will store only ASCII characters, so I've set charset for this table to latin1 to save disk space.

Is there any downside of having different charsets among tables in one database? Is it considered bad practice?

Or is there any other way how to resolve this issue? (to keep disk occupation minimal and allow national characters)

EDIT:

output of show create table which takes user input (and has utf8mb4)

 CREATE TABLE `sensor` (
  `id` int(20) NOT NULL AUTO_INCREMENT,
  `label` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `description` varchar(3000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `created` datetime(6) NOT NULL,
  `state` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `region` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `model_reference` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `label` (`label`),
  KEY `id` (`id`),
  KEY `sensor_ibfk_1` (`model_reference`),
 FULLTEXT KEY `fulltext_index`(`label`,`description`,`state`,`region`), 
 CONSTRAINT `sensor_ibfk_1` FOREIGN KEY (`model_reference`) REFERENCES      `model` (`id`) ON DELETE SET NULL ON UPDATE NO ACTION
 ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4      COLLATE=utf8mb4_unicode_ci

output of show create table which is autogenerated (and has latin1)

CREATE TABLE `sensor_history` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `sensor_reference` int(20) DEFAULT NULL,
 `temperature` varchar(16) DEFAULT NULL,
 `pressure` varchar(16) DEFAULT NULL,
 `at` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
 PRIMARY KEY (`id`),
 KEY `sensor_history_ibfk_1` (`sensor_reference`),
 CONSTRAINT `sensor_history_ibfk_1` FOREIGN KEY (`sensor_reference`)     REFERENCES `sensor` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1

Best Answer

Each column of each table of each database can have a different CHARACTER SET and COLLATION. There is no downside unless you JOIN on a column that has different charset and/or collation.

Ascii (7-bit stuff) is a subset of latin1 and of utf8mb4. As such, the encoding is identical, so no space difference will be seen for purely ascii text.

However, there are a few hiccups when you declare unnecessarily large strings, especially with wider character sets. Probably you won't hit any hiccups. Please provide SHOW CREATE TABLE for a review of likely cases.

The client must declare what charset it is using, so UTF-8 is what you would need there. In that case, any non-utf8mb4 columns would be automatically converted when going between client and server.

More

When storing into VARCHAR(...) CHARACTER SET utf8mb4, there is a 1-byte or 2-byte length overhead, plus

  • "abc" (ascii) will take 3 bytes - each ascii character takes 1 byte,
  • "ÛØÎ" will take 6 bytes - most European chars take 1- or 2-bytes,
  • "José" will take 5 bytes - the "é" takes 2 bytes,
  • "???" (Emoji) will take 12 bytes; each Emoji is 3 or 4 bytes. Ditto for Chinese.
  • "$€" (currency symbols) will take 4 bytes -- 1 for "$" plus 3 for Euro.

With CHARACTER SET latin1:

  • "abc", "ÛØÎ", "José", "$€" can be correctly stored in latin1, using 3,3,4,2 bytes, respectively.
  • Emoji, all Asian languages, Greek, Cyrillic, and others cannot be represented in latin1. Garbage ensues. The particular garbage depends on the various settings; this Q&A lists common garblings.

For clear sailing:

  • The column must be declared (explicitly or implicitly) with a CHARACTER SET that can hold the characters. (English will work with any character set, but Emoji will work only with utf8mb4.)
  • The connection parameters (or SET NAMES) must correctly say what encoding is used in the client.
  • There must be a mapping between the two charsets. (English always maps. (Western Europe can map among latin1, utf8, utf8mb4, but not much else.)