* 爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。
一、问题背景
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
二、问题模拟
mysql> show create table t1\G;
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`name1` varchar(12) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)
mysql> show create table t2\G;
*************************** 1. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`name2` varchar(12) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql> CREATE VIEW t3 as select * from t1,t2 where `t1`.`name1`= `t2`.`name2`;
Query OK, 0 rows affected (0.06 sec)
mysql> select * from t3;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
三、问题分析
通过查看视图定义,可以发现由于视图中涉及到的两张表字符集不同,所以创建视图时 MySQL 会自动使用 convert 函数转换字符集。
mysql> show create view t3\G;
*************************** 1. row ***************************
View: t3
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `t3` AS select `t1`.`name1` AS `name1`,`t2`.`name2` AS `name2` from (`t1` join `t2`) where (`t1`.`name1` = convert(`t2`.`name2` using utf8mb4))
character_set_client: utf8mb4
collation_connection: utf8mb4_general_ci
1 row in set (0.00 sec)
在 MySQL 8.0 中 utf8mb4 的默认排序规则为 utf8mb4_0900_ai_ci ,而在 t1 表的排序规则为 utf8mb4_general_ci ,那么我们试着将排序规则相关的参数修改后再执行 SQL 看看,修改后的环境参数如下
mysql> show variables like '%collat%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_bin |
| collation_server | utf8mb4_bin |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
再次执行 sql 发现还是会报一样的错。
mysql> select * from t1,t2 where `t1`.`name1`=convert(`t2`.`name2` using utf8mb4);
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
通过 show collation 来查看 utf8mb4 字符集对应的默认排序规则,输出显示默认规则为 utf8mb4_general_ci ,并不是 utf8mb4_0900_ai_ci 。
mysql> show collation like '%utf8mb4%';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 | PAD SPACE |
+----------------------------+---------+-----+---------+----------+---------+---------------+
mysql> show character set like '%utf8mb4%';
+---------+---------------+--------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
+---------+---------------+--------------------+--------+
1 row in set (0.00 sec)
继续排查发现元数据中的字符集默认排序规则如下,默认规则为 utf8mb4_0900_ai_ci 。
mysql> select * from INFORMATION_SCHEMA.COLLATIONS where IS_DEFAULT='Yes' and CHARACTER_SET_NAME='utf8mb4'\G;
*************************** 1. row ***************************
COLLATION_NAME: utf8mb4_0900_ai_ci
CHARACTER_SET_NAME: utf8mb4
ID: 255
IS_DEFAULT: Yes
IS_COMPILED: Yes
SORTLEN: 0
PAD_ATTRIBUTE: NO PAD
1 row in set (0.00 sec)
检查参数发现,元数据信息中 utf8mb4 字符集默认排序规则是 utf8mb4_0900_ai_ci ,show collation/show character 输出的都是 utf8mb4_general_ci 。为什么 show 显示的结果和 INFORMATION_SCHEMA.COLLATIONS 表查到的信息还不一样呢?此处我们暂且按下不表,咱们先看看官方文档中 convert 函数用法,其中有下面这段原文:
If you specify CHARACTER SET charset_name as just shown, the character set and collation of the result are charset_name and the default collation of charset_name. If you omit CHARACTER SET charset_name, the character set and collation of the result are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.4, “Connection Character Sets and Collations”).
将 convert 函数指定为 t1.name1 字段的排序规则后,sql 执行正常。
mysql> select * from t1,t2 where `t1`.`name1` = convert(`t2`.`name2` using utf8mb4) collate utf8mb4_general_ci;
+-------+-------+
| name1 | name2 |
+-------+-------+
| jack | jack |
+-------+-------+
1 row in set (0.00 sec)
另外,下面测试可以验证 default_collation_for_utf8mb4 的第四个场景。
mysql> select * from INFORMATION_SCHEMA.COLLATIONS where IS_DEFAULT='Yes' and CHARACTER_SET_NAME='utf8mb4'\G;
*************************** 1. row ***************************
COLLATION_NAME: utf8mb4_0900_ai_ci
CHARACTER_SET_NAME: utf8mb4
ID: 255
IS_DEFAULT: Yes
IS_COMPILED: Yes
SORTLEN: 0
PAD_ATTRIBUTE: NO PAD
1 row in set (0.00 sec)
mysql> show variables like '%default_collation%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
1 row in set (0.01 sec)
mysql> set @s1 = _utf8mb4 'jack',@s2 = _utf8mb4 'jack';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @s1 = @s2;
+-----------+
| @s1 = @s2 |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
_utf8mb4声明的@s1和@s2排序规则是default_collation_for_utf8mb4参数值,为utf8mb4_general_ci
mysql> SELECT @s1 = CONVERT(@s2 USING utf8mb4);
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
此时,经过CONVERT函数处理的@s2排序规则是utf8mb4_0900_ai_ci,所以会报错
mysql> SELECT @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci;
+-------------------------------------------------------------+
| @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci |
+-------------------------------------------------------------+
| 1 |
+-------------------------------------------------------------+
1 row in set (0.00 sec)
四、问题总结
创建数据库实例时需指定参数 character_set_database(默认值:utf8mb4),character_set_server(默认值:utf8mb4)。
当需要创建非默认字符集 database / table 时,需要在 sql 中明确指定字符集和排序规则。
使用 convert 函数转换字符集时,当字段排序规则不是转换后字符集的默认排序规则,需要指定具体的排序规则。SELECT @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci
MySQL 5.7 迁移至 MySQL 8.0 时,需注意 MySQL 5.7 版本中 utf8mb4 默认排序规则是 utf8mb4_general_ci ,MySQL 8.0 中 utf8mb4 默认排序规则是 utf8mb4_0900_ai_ci 。
