MySQL MHA切换失败一例


先看下引起问题的密码啥样, 包含两个特殊字符[和~.

$ egrep -w 'user|password' /etc/masterha/app1.cnf

password=P[AI3M~5z

user=mha_mgr


用户mha_mgr的作用, 如下文档中的说明, 可见其对数据库实例起到管理的作用.

MySQL administrative database username to the target MySQL server. This should be root because it runs all necessary administrative commands such as STOP SLAVE, CHANGE MASTER, RESET SLAVE.



看看两种场景下切换失败的现象.


1. 手动在线切换, 据日志可看到, MHA将密码中的特殊字符转义了, 致使用户mha_mgr连接数据库失败.


检查原有的主从复制状态, 发现无变化, 即切换失败, 并没造成影响.


$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --orig_master_is_new_slave

...

Thu Jan 11 10:28:39 2018 - [info] * Phase 2: Rejecting updates Phase..

Thu Jan 11 10:28:39 2018 - [info]

Thu Jan 11 10:28:39 2018 - [info] Executing master ip online change script to disable write on the current master:

Thu Jan 11 10:28:39 2018 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.4.42 --orig_master_ip=192.168.4.42 --orig_master_port=3306 --orig_master_user='mha_mgr' --orig_master_password='P\[AI3M\~5z' --new_master_host=192.168.4.43 --new_master_ip=192.168.4.43 --new_master_port=3306 --new_master_user='mha_mgr' --new_master_password='P\[AI3M\~5z' --orig_master_ssh_user=mysql --new_master_ssh_user=mysql   --orig_master_is_new_slave

ARGS: $VAR1 = [

          '--command=stop',

          '--orig_master_host=192.168.4.42',

          '--orig_master_ip=192.168.4.42',

          '--orig_master_port=3306',

          '--orig_master_user=mha_mgr',

          '--orig_master_password=P\\[AI3M\\~5z',

          '--new_master_host=192.168.4.43',

          '--new_master_ip=192.168.4.43',

          '--new_master_port=3306',

          '--new_master_user=mha_mgr',

          '--new_master_password=P\\[AI3M\\~5z',

          '--orig_master_ssh_user=mysql',

          '--new_master_ssh_user=mysql',

          '--orig_master_is_new_slave'

        ];

Unknown option: orig_master_ssh_user

Unknown option: new_master_ssh_user

Unknown option: orig_master_is_new_slave

Got Error: DBI connect(';host=192.168.4.43;port=3306;mysql_connect_timeout=4','mha_mgr',...) failed: Access denied for user 'mha_mgr'@'192.168.4.45' (using password: YES) at /usr/local/share/perl5/MHA/DBHelper.pm line 205.

 at /usr/local/bin/master_ip_online_change line 132.


Thu Jan 11 10:28:39 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53.


2. 自动故障切换(手动关闭主库, 模拟故障), 日志最后出现了"completed successfully"的字样, 貌似没问题, 但详细查看日志, 发现一处和上面类似的报错, 见标黄处.


检查原有的复制状态, 发现复制关系切换正常, 但是写虚拟IP不见了, 即不可写了. 


$ tail -f /var/log/masterha/manager.log

...

Fri Jan 12 10:33:55 2018 - [info] Executing master IP activate script:

Fri Jan 12 10:33:55 2018 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=mysql --orig_master_host=192.168.4.42 --orig_master_ip=192.168.4.42 --orig_master_port=3306 --new_master_host=192.168.4.43 --new_master_ip=192.168.4.43 --new_master_port=3306 --new_master_user='mha_mgr' --new_master_password='P\[AI3M\~5z'

DBI connect(';host=192.168.4.43;port=3306;mysql_connect_timeout=4','mha_mgr',...) failed: Access denied for user 'mha_mgr'@'192.168.4.45' (using password: YES) at /usr/local/share/perl5/MHA/DBHelper.pm line 205.

 at /usr/local/bin/master_ip_failover line 81.

Fri Jan 12 10:33:55 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1588]  Failed to activate master IP address for 192.168.4.43(192.168.4.43:3306) with return code 10:0

Fri Jan 12 10:33:55 2018 - [warning] Proceeding.

Fri Jan 12 10:33:55 2018 - [info] Setting read_only=0 on 192.168.4.43(192.168.4.43:3306)..

Fri Jan 12 10:33:55 2018 - [info]  ok.

...

Master failover to 192.168.4.43(192.168.4.43:3306) completed successfully.

Fri Jan 12 10:33:57 2018 - [info] Sending mail..



解决方法, 换个比较正常的密码就行了. 至于修复这个小缺陷, 还要仔细查下代码.