by default, MySQL does not consider the case of the strings
This is not quite true. Whenever you create database
in MySQL, the database/schema has a character set and a collation. Each character set has a default collation; see here for more information.
The default collation for character set latin1
, which is latin1_swedish_ci
, happens to be case-insensitive.
You can choose a case-sensitive collation, for example latin1_general_cs
(MySQL grammar):
CREATE SCHEMA IF NOT EXISTS `myschema`
DEFAULT CHARACTER SET latin1
COLLATE latin1_general_cs ;
This has an effect on things like grouping and equality. For example,
create table casetable (
id int primary key,
thing varchar(50)
);
select * from casetable;
+----+-------+
| id | thing |
+----+-------+
| 3 | abc |
| 4 | ABC |
| 5 | aBc |
| 6 | abC |
+----+-------+
In a case-sensitive database, we get:
select thing, count(*) from casetable group by thing;
+-------+----------+
| thing | count(*) |
+-------+----------+
| ABC | 1 |
| aBc | 1 |
| abC | 1 |
| abc | 1 |
+-------+----------+
select * from casetable where thing = "abc";
+----+-------+
| id | thing |
+----+-------+
| 3 | abc |
+----+-------+
While in a case-insensitive database, we get:
select thing, count(*) from casetable group by thing;
+-------+----------+
| thing | count(*) |
+-------+----------+
| abc | 4 |
+-------+----------+
select * from casetable where thing = "abc";
+----+-------+
| id | thing |
+----+-------+
| 3 | abc |
| 4 | ABC |
| 5 | aBc |
| 6 | abC |
+----+-------+
Note that you can also change the collation from within a query. For example, in the case-sensitive database, I can do
select * from casetable where thing collate latin1_swedish_ci = "abc";
+----+-------+
| id | thing |
+----+-------+
| 3 | abc |
| 4 | ABC |
| 5 | aBc |
| 6 | abC |
+----+-------+
Best Answer
Simple answer: No need to specify the collation in the query. DDL level collation (especially when it's the same on both sides and both sides are columns) will be used (which is why we specify it at the DDL level in the first place).
Detailed answer: there is a hierarchy of precedence for which collation to use in a given operation (concatenation, predicate, etc). The collation used not only depends on whether it's a column vs a literal, etc, but also whether it's Unicode vs non-Unicode, and even binary vs non-binary. The full description can be found here, Collation Coercibility in Expressions, which is the MySQL 5.7 documentation since you are using that version. One rather interesting rule is:
All of that being said, I need to point out that the statement "binary collations are case-sensitive", while extremely common, is definitely incorrect.
On a very basic level, sorting is different. Case-sensitive collations will sort "a" with "A" (though which one comes first can depend on the culture), "b" with "B", and so on. Binary collations will sort according to the underlying value / Code Point of each character, which becomes very apparent when uppercase and lowercase versions of a character separated by others according to their value.
Being case "sensitive" means that you can also be "INsensitive" to other properties of characters, the main one being accents. That is also the only other property when it comes to non-Unicode character sets, but Unicode allows for Kana type sensitivity, width sensitivity, and SQL Server (as of version 2017) even allows for variation selector sensitivity. Binary collations do not allow for a character to equal anything other than itself, even if other forms of it exist. Again, this doesn't really happen so much in non-Unicode character sets, but in Unicode there can be several versions of a character, including wide, superscript, subscript, italics, upside down (referred to as "turned"), etc. MySQL, starting in version 8.0 (as far as I can tell), is adding more variations to the sensitivity options (at least for the
ut8mb4
character set and collations):To illustrate both of these points, I have set up a demo on the awesome db<>fiddle. I had to use MySQL 8.0 not only to get the
_ai_ci
collation, but also because theCOLLATE latin1_general_ci
clause (2nd to last query, #5) had no affect (for some odd reason; documentation states that when collation name has only_ci
, then_ai
is implied, yet for both MySQL 5.6 and 5.7 on db<>fiddle, the behavior is still_as_cs
or_bin
).AND, there are even other ways in which binary collations are not "case-sensitive". They cannot account for:
I did not list or provide examples for these earlier because they do not pertain to 8-bit encodings, and
latin1
is an 8-bit encoding. These are features of Unicode, and so should apply to any Unicode collation (though I have not tested them on MySQL, but they are implemented correctly in SQL Server).P.S. I have a detailed explanation of all of this (including the Unicode-specific behaviors noted directly above) in the following post: No, Binary Collations are not Case-Sensitive. That is currently framed only in the context of SQL Server, but I can work in the example that I came up with for this answer when I have time. What's important is that the concept is the same across RDBMSs.
For reference (just in case db<>fiddle is inaccessible), the queries are:
Query 1
Query 2
Query 3
Query 4
Query 5
Query 6
Query 7
Query 8
Query 9
Query 10