I have one table which stores input from user, so to allow national characters I've set charset to utf8mb4.
Then I have second table, which can grow really large (100+ millions of rows) and I'm 100% sure that this table will store only ASCII characters, so I've set charset for this table to latin1 to save disk space.
Is there any downside of having different charsets among tables in one database? Is it considered bad practice?
Or is there any other way how to resolve this issue? (to keep disk occupation minimal and allow national characters)
EDIT:
output of show create table
which takes user input (and has utf8mb4)
CREATE TABLE `sensor` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`label` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`description` varchar(3000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created` datetime(6) NOT NULL,
`state` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`region` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`model_reference` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `label` (`label`),
KEY `id` (`id`),
KEY `sensor_ibfk_1` (`model_reference`),
FULLTEXT KEY `fulltext_index`(`label`,`description`,`state`,`region`),
CONSTRAINT `sensor_ibfk_1` FOREIGN KEY (`model_reference`) REFERENCES `model` (`id`) ON DELETE SET NULL ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
output of show create table
which is autogenerated (and has latin1)
CREATE TABLE `sensor_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sensor_reference` int(20) DEFAULT NULL,
`temperature` varchar(16) DEFAULT NULL,
`pressure` varchar(16) DEFAULT NULL,
`at` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
PRIMARY KEY (`id`),
KEY `sensor_history_ibfk_1` (`sensor_reference`),
CONSTRAINT `sensor_history_ibfk_1` FOREIGN KEY (`sensor_reference`) REFERENCES `sensor` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
Best Answer
Each column of each table of each database can have a different
CHARACTER SET
andCOLLATION
. There is no downside unless youJOIN
on a column that has different charset and/or collation.Ascii (7-bit stuff) is a subset of latin1 and of utf8mb4. As such, the encoding is identical, so no space difference will be seen for purely ascii text.
However, there are a few hiccups when you declare unnecessarily large strings, especially with wider character sets. Probably you won't hit any hiccups. Please provide
SHOW CREATE TABLE
for a review of likely cases.The client must declare what charset it is using, so
UTF-8
is what you would need there. In that case, any non-utf8mb4 columns would be automatically converted when going between client and server.More
When storing into
VARCHAR(...) CHARACTER SET utf8mb4
, there is a 1-byte or 2-byte length overhead, plusWith
CHARACTER SET latin1
:For clear sailing:
CHARACTER SET
that can hold the characters. (English will work with any character set, but Emoji will work only with utf8mb4.)SET NAMES
) must correctly say what encoding is used in the client.