准备内存修改ssh 公钥互信准备ssh连接调过keys处理提前修改主机名创建主从创建管理用户备份恢复数据库到从节点修改从节点参数开始同步slave节点设置Slave上的relay_log配置mha主配置文件与检测mha perl配置vip确认mha 工具包中脚本默认路径及检测mha check 问题处理配置supervisor配置2binlog_backup_app1.shsupervisor服务配置supervisorctl使用帮助,supervisord日志查看下面是手动安装supervisord的操作mha_healthy_check脚本只读的说明只读(锁库锁表解锁)innodb_read_onlytransaction_read_only(tx_read_only )锁定解锁主从问题处理errno 1062 errno 1032断电导致数据同步问题处理方案确认数据一致12361236问题过程一致性检测MHA相关脚本/etc/masterha/app1.conf日志状态查看问题处理
主节点不涉及重启,在线修改+修改my.cnf,不需要重启。
从节点配置过程中需要重启,直接修改my.cnf,重启。
xxxxxxxxxx101SET GLOBAL innodb_buffer_pool_size=402653184;2SET GLOBAL innodb_buffer_pool_size=225769803776; #设置24G3SELECT @@innodb_buffer_pool_size; #查询4SELECT @@innodb_buffer_pool_chunk_size;56修改配置文件的调整方法,修改my.cnf配置示例,缺省单位kb:7innodb_buffer_pool_size = 2147483648 #设置2G8innodb_buffer_pool_size = 225769803776 #设置24G9innodb_buffer_pool_size = 2G #设置2G10innodb_buffer_pool_size = 500M #设置500M
xxxxxxxxxx1461#无交互生成ssh key2ssh-keygen -t rsa -b 4096 -f /root/.ssh/id_rsa_mha -N "" -C "mysql mha use only"345id_rsa 6-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=78mkdir /root/.ssh9cat << EOFEOF > /root/.ssh/id_rsa_mha10-----BEGIN RSA PRIVATE KEY-----11MIIJKwIBAAKCAgEAuCSK4kBryryAlzYQBTxdjojC9kPMTFh2rFRJ56qMNzBsDYh212y/nypSmairoWDdAUgy6yh/MwXh2AFwrhTCSpCDBIPQaSPP6Dt03vdYu8cuQbD5wT13BWVWDMmc1qJQulSeAhJu/d2c8iy7+9vnu+LAtZLi2sOahxwIunOcdWI4VGkGpE+114+3OPT4rZD1rO8DSMyachyc1bX6eX+Fj/5VQYgWqmZb1Ht2EAmEpUTutvTPmydjsO15MKRbZIzD4sFRFHKk9PFbDnNe+rce6EJ4e38F2nsRjJXsbRKF1Z/lXthYtz7Hftfx16IHqer1wg7RdNnQqzuHTl1mlKjVgf3XWqbu6rt53EGyUnI2AxJdE7t9qS13VLeJbK172SZ3mchcF5N6CI5lp9owazqfZrG3rvJ5oRNYOBtimR/Lm3BtvNbdundgXGUtFlco18D9JiJNDKNpBONJ8D3oaeHXj74Rblh8ta8U8SJS8DBbqEegdY2BWC4QZGvjwhi3Ra19iQ2IX+22wwE4zq3PeKzdMaR+TMB19tWZwb2AdmFL5cwxt2Zoc3EYAB3S4srNejkT20K76TG3+18me2WcUnTM9y8cwaZ1HN0PPg3N4/cRyNGtWQ8p0Sje3kZ0QLxw9BPTx3213mnW37FOP14uTgBFf1rHY/GyTcCk/Q5Ox8g48Jmou9+GxEmDLsxhZNTQkM0CAwEA22AQKCAgEApn9cGHD5RemlN5QJbkKDsVIVPkhVHB7Mxsq7vD2n5Gri+GWQTKSmchIp23CxBwynJ/aZG6mVQZQq/X1n3e8lMfhyjo2Eztwzq56+8K+jSr4xI/KaEMTc+vu4Y824SpGe/qtt5x+1vdM4inxKSNMwN+EyL7stZXKBcXK2rafFKxnxJtmDyOPN/xsg1GIO25G2UXQXeyaM62rVoCKX5G2aXXUwrz0WiWJd28T6HkwM5MZsNX8ri/VUnWbcLeZA8n26OwtifWOfpV3p4CoPEhCrYIRhngBfUYn/dydEC+c4ncyiv1xM8vtPtHAaNSzJ+HML27znRImBtKHxmLi4/OefjZbwtqXD1vguAbVjySypQzVDWWIZbyv4ZtjtWn0Wv86poL282/P8v9mm3BztcmNdW8R9P9wKqEZdd5ZS5Ev7SWYL/3QqmxkjbM2nB6s5cvafRMNX29obbP97EoHu0VRzYMRKZaHY97MP7H99BcH+sZ/4hLRVzLDCaMJvdyIAgHoouLg8xj30s2xe+iI9VSSAPNk9C+sVQCv1bt2T9+JPf+hJvCtw4eJ9KMD1t8BVccH6AEtPzNxA31Q8qTdFy7eQ47MvJFpYed6jmUqWYate76PxyL8RvrE7xyKuoAzy9bMpE74sbBYW9J32Sn+RPk4QxBSR5PSbl5/PW83S+CungHzgsgDaIUNKKtsX05DMKCECggEBAOhI+6oX33dVizKqd/+mtMUEzRkD9aNKlyImZhxspOxhxFl3Sh8/0oQ0aUcorYxdz0GOvt+e5k3435vLolF3VErB6SwLiFwkNift72mUxfX4QghTLTto7xRyYzeiG7DFmnYBkjRA16ci35i9vhZLx+ZpExOqaAsKFB7KeJcgAjdEZ/CaTJ9Sw4e9PiRw/ir0auaMkruOLUfRur36IjTZzF1fCJYMrMmb9ZJivknh4ZWtN9/3a/EQB6iqSqcVn3odwtegx/Q6MTWk2sXX37tri2RWc1i2kLbdFU2Vfy2gULZ62PnNpbpRGq/u/FLzmT0Zq0itFi39d62tJp0Q+538T3vJ6FwC6czrBwkCggEBAMrxTuhpw2JFqn4yB5cHN91zrUD1mX6P67ZDIKw89cCk39JCHTrYLFQ334GzLhl7SZ/F/X+WyqyIav5U358c60d0VU+eSQCNMO4/PUEl197bXr405mnkmsRX4nFgbaFvHd+qhmOfRHzdMeMZsiMALhZVN4i3CE5+k8mdPqNENMTrZcU1410NfDH2bndLjD/zY+ziA5f+Q+WOZ4BZhJRPbpQbUDU0PRc0TwFp1mmnB9c16DWq2042+ojuucrvtS2jyVpQdaOqiN3Y2TU0gLD2cg/uRvtbNIYq+PHy880w3vLEgcF363Gn43yRch+n1n2nJvvLvuVpphKF/+OA3M+M8KHiXgufSmyKUCggEBAMpWyOJhjNa3O3Ct44CnZtA4XnBsAo07bGtTB5+7A+7lOySzxN1zuu1uukZCKOsczG1O6vXgdL13ze5YYR45bFuyjmrPoPiikoo1uVc/QTICD4Mw8TXPMnOdMNBFZdmWR/aDZ9gco73zWeEvdZC846ipvvRifD8hiGVJ+cGrIEMEU4ZC4nvmnGyhbIBR7RV906EA0tCOCOOGzhpepjzqGH47JvOOFvWH5WKeoSO6n8bCHOn7nQtf60wBCwaAz2qQgilS5NrZ1gPklq3AOmqsW8iL482+SlEARI5Qq3gTLjvGxzrS6IpkG9slUqbvi4ytz4Z/P9LP+HBRZFFteblKFAQCfm49YcMe+AECggEBALoUFlD+INtrm09ImjL9uFiCJ1rHlu04E120ITIsVv1Cw6qMh6y/50KQpIws+dj9dV1dUfl6Du2xy8DVm5EuGAEVht+Q98jcrP/W1lh01YGOzUlxmDNfFP51z947F2qNEuwxVDXmJAsFWiesOOzkzl1F17ywbppibynO108mS+g/Grp/hLVBQPq+52wQRlGg3n+mILtttV23e+lwqyK9zltYy+tssZMkMEjQfXu0XnkTAY6r0ZkxqZIZwU53vKBTbF02JxdhV0tq+IZTVKht39oBle4FC5z1q/tU0/zRkasxSPRW+3fj63sdeasg543WhOUdRTTe8RpM6Xr1fgGEbmNQxwCLFtd80CggEBAL9eYg1KTt1apIvu47BEInHf55+F4BskqH9kyoI79S2CPtRW3eKDZkkM4wImqZdesnAgJnvqwlSoF/T6Breb3FjGew56nhve3FZga3pXN2KVs9s93eqXHQgcdBqTFIpDEGnK9Oxsr+o0FVdIiXmJXq3z+cBf57P1KGSSEdg+d5tfvsQIXbSHmCO0aYg3fx12psNCzrgQcWIpkfYhgeoXgxJ+gD4YqO58sL3k5mQyX3aMAAXt0KoEPLeSTLW3ZG2cPCRRujUqsgcqlCIYfD5yfbPosa9IDI9A59vuoSEJ51/aFwxtfPfLweEImc+76DSPDFu9HhGJ7wavZYs4pNhbwOMRXrDykDvXo=60-----END RSA PRIVATE KEY-----61EOFEOF6263chmod 600 /root/.ssh/id_rsa_mha6465-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=6667mkdir /root/.ssh68if [ -e "/root/.ssh/authorized_keys" ] ; then echo "File has existed ,do nothing " ; else touch /root/.ssh/authorized_keys ; echo "File isn't exist , touch it" ; fi69if [ -e "/root/.ssh/authorized_keys" ] ; then echo "#" >> /root/.ssh/authorized_keys ; sed -i -e "/PYje4zUtbYRyp4DxCROal/s/^/#/g" -e '$a''ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC4JIriQGvKvICXNhAFPF2OiML2Q8xMWHasVEnnqow3MGwNiHbL+fKlKZqKuhYN0BSDLrKH8zBeHYAXCuFMJKkIMEg9BpI8/oO3Te91i7xy5BsPnBMFZVYMyZzWolC6VJ4CEm793ZzyLLv72+e74sC1kuLaw5qHHAi6c5x1YjhUaQakT7X7c49PitkPWs7wNIzJpyHJzVtfp5f4WP/lVBiBaqZlvUe3YQCYSlRO629M+bJ2Ow4wpFtkjMPiwVEUcqT08VsOc176tx7oQnh7fwXaexGMlextEoXVn+Ve2Fi3Psd+1/Egep6vXCDtF02dCrO4dOXWaUqNWB/ddapu7qu3ncQbJScjYDEl0Tu32pLXdUt4lsrZJneZyFwXk3oIjmWn2jBrOp9msbeu8nmhE1g4G2KZH8ubcG281t26d2BcZS0WVygP0mIk0Mo2kE40nwPehp4dePvhFuWHy1rxTxIlLwMFuoR6B1jYFYLhBka+PCGLdFqJDYhf7bbDATjOrc94rN0xpH5MwHX21ZnBvYB2YUvlzDG3ZmhzcRgAHdLiys16ORMrvpMbf7XyZ7ZZxSdMz3LxzBpnUc3Q8+Dc3j9xHI0a1ZDynRKN7eRnRAvHD0E9PHfeadbfsU4/Xi5OAEV/Wsdj8bJNwKT9Dk7HyDjwmai734bESYMuzGFk1NCQzQ== mysql mha use only' /root/.ssh/authorized_keys; fi70chmod 600 /root/.ssh/authorized_keys71history -r7273-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=74757677ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC4JIriQGvKvICXNhAFPF2OiML2Q8xMWHasVEnnqow3MGwNiHbL+fKlKZqKuhYN0BSDLrKH8zBeHYAXCuFMJKkIMEg9BpI8/oO3Te91i7xy5BsPnBMFZVYMyZzWolC6VJ4CEm793ZzyLLv72+e74sC1kuLaw5qHHAi6c5x1YjhUaQakT7X7c49PitkPWs7wNIzJpyHJzVtfp5f4WP/lVBiBaqZlvUe3YQCYSlRO629M+bJ2Ow4wpFtkjMPiwVEUcqT08VsOc176tx7oQnh7fwXaexGMlextEoXVn+Ve2Fi3Psd+1/Egep6vXCDtF02dCrO4dOXWaUqNWB/ddapu7qu3ncQbJScjYDEl0Tu32pLXdUt4lsrZJneZyFwXk3oIjmWn2jBrOp9msbeu8nmhE1g4G2KZH8ubcG281t26d2BcZS0WVygP0mIk0Mo2kE40nwPehp4dePvhFuWHy1rxTxIlLwMFuoR6B1jYFYLhBka+PCGLdFqJDYhf7bbDATjOrc94rN0xpH5MwHX21ZnBvYB2YUvlzDG3ZmhzcRgAHdLiys16ORMrvpMbf7XyZ7ZZxSdMz3LxzBpnUc3Q8+Dc3j9xHI0a1ZDynRKN7eRnRAvHD0E9PHfeadbfsU4/Xi5OAEV/Wsdj8bJNwKT9Dk7HyDjwmai734bESYMuzGFk1NCQzQ== mysql mha use only787980ssh-copy-id -i .ssh/id_rsa_mha root@10.10.101.5381ssh-copy-id -i .ssh/id_rsa_mha root@10.10.101.5482ssh-copy-id -i .ssh/id_rsa_mha root@10.10.101.418384ssh-copy-id -i .ssh/id_rsa_mha root@20.99.46.3685ssh-copy-id -i .ssh/id_rsa_mha root@10.10.101.5486ssh-copy-id -i .ssh/id_rsa_mha root@10.10.101.418788# Host表示一个段,Hostname可以是域名、ip89# StrictHostKeyChecking no 不需要验证对方指纹90# IdentityFile /root/.ssh/id_rsa_mha # 完整路径,是必须的,91# 如果masterha_check_ssh检测不通过,可以用ssh hostname/ip -v来显示详细。9293cat >> /root/.ssh/config << EOF94Host *95 Hostname %h96 User root97 Port 2298 StrictHostKeyChecking no99 IdentityFile /root/.ssh/id_rsa_mha100EOF101history -r102chmod 600 /root/.ssh/config103104scp /root/.ssh/config m:/root/.ssh/ ; \105scp /root/.ssh/config 10.10.101.54:/root/.ssh/ ; \106scp /root/.ssh/id_rsa_mha 10.10.101.53:/root/.ssh/ ; \107scp /root/.ssh/id_rsa_mha 10.10.101.54:/root/.ssh/ ; \108scp /etc/hosts 10.10.101.53:/etc/ ; \109scp /etc/hosts 10.10.101.54:/etc/110111112scp /root/.ssh/config 10.10.101.53:/root/.ssh/ ; \113scp /root/.ssh/config 10.10.101.54:/root/.ssh/ ; \114scp /root/.ssh/id_rsa_mha 10.10.101.53:/root/.ssh/ ; \115scp /root/.ssh/id_rsa_mha 10.10.101.54:/root/.ssh/ ; \116scp /etc/hosts 10.10.101.53:/etc/ ; \117scp /etc/hosts 10.10.101.54:/etc/118119120scp /root/.ssh/config 10.10.101.41:/root/.ssh/ ; \121scp /root/.ssh/id_rsa_mha 10.10.101.41:/root/.ssh/ ; \122scp /etc/hosts 10.10.101.41:/etc/123124125126127scp .ssh/id_rsa_mha .ssh/config 10.10.101.41:/root/.ssh/128scp .ssh/id_rsa_mha .ssh/config 10.10.101.53:/root/.ssh/129scp .ssh/id_rsa_mha .ssh/config 10.10.101.54:/root/.ssh/130131132133134scp .ssh/id_rsa_mha .ssh/config 10.10.101.41:/root/.ssh/135scp .ssh/id_rsa_mha .ssh/config 10.10.101.53:/root/.ssh/136scp .ssh/id_rsa_mha .ssh/config 10.10.101.54:/root/.ssh/137138139140141scp /root/.ssh/config 10.10.101.53:/root/.ssh/142scp /root/.ssh/config 10.10.101.54:/root/.ssh/143144145scp /etc/hosts 10.10.101.53:/etc/146scp /etc/hosts 10.10.101.54:/etc/
xxxxxxxxxx11/root/.ssh/id_rsa (0x56049f23c1a0), explicit #
因为修改主机名会导致服务不能正常停止,可参考管理文档shutdown.md
xxxxxxxxxx341# 如果/etc/masterha/app1.conf中的节点参数hostname使用了ip,则不需要修改/etc/hosts;如果其中使用了主机名称,则需要添加/etc/hosts23cat >> /etc/hosts << EOF4172.16.8.6 mhaslave5172.16.8.5 mhamaster6172.23.9.47 mhamanager7EOF8cat /etc/hosts910## 先在管理节点上添加,即可同步11cat >> /root/.ssh/config << EOF12Host *13Hostname %h14User root15Port 2216StrictHostKeyChecking no17IdentityFile /root/.ssh/id_rsa_mha18EOF19history -r20212223# 远程修改主机名24ssh mhaslave "hostnamectl set-hostname mhaslave"25ssh mhamaster "hostnamectl set-hostname mhamaster"26ssh mhamanager "hostnamectl set-hostname mhamanager"2728# 本地修改2930hostnamectl set-hostname mhaslave31hostnamectl set-hostname mhamaster32hostnamectl set-hostname mhamanager3334
参考Percona\innobackupex.md
过程:
xxxxxxxxxx61use mysql;2create user 'rpl'@'172.16.16.%' identified by 'Rpl@mysql46';3grant replication slave on *.* to 'rpl'@'172.16.16.%';4flush privileges;56
xxxxxxxxxx111另外创建一个管理用户,类似于root,但登陆权限设置在局域网网段,区分repl用户2create user 'rpl'@'10.10.101.%' identified by 'MySQL_10.10.Repl';3grant replication slave on *.* to 'rpl'@'10.10.101.%';4flush privileges;5reset master;67###### 临时用8create user 'rpl'@'20.99.46.%' identified by 'Rpl@my123';9grant replication slave on *.* to 'rpl'@'20.99.46.%';10flush privileges;11reset master;
xxxxxxxxxx31reset master;2reset slave ;3reset slave all;
x
1create user 'rpl'@'172.16.8.%' identified by 'yourmysqlpassword';2grant replication slave on *.* to 'rpl'@'172.16.8.%';3flush privileges;4reset master;56create user 'rpl'@'100.64.10.%' identified by 'Rpl_pass';7grant replication slave on *.* to 'rpl'@'100.64.10.%';8flush privileges;9reset master;1011
xxxxxxxxxx11
xtrabackup
xxxxxxxxxx21scp -r /data/incr_mysqlbackup/mysql/WEEK_44 mhaslave:/data/2
xxxxxxxxxx121# 从percona备份中找到binlog_position 中的 filename 'bin.000009', position '353722478'2binlog_pos = filename 'bin.000009', position '353722478', GTID of the last change '15a672e4-00c9-11eb-b410-fa163e97fbaa:1-1073652 11 ,34567# xtrabackup_info 中有log信息点,详见percana备份记录文件-percona/innobackupxe.md8change master to master_host='172.16.16.44', master_user='rpl', master_password='Rpl@mysql46',master_port=36436,master_log_file='bin.000009',master_log_pos=353722478;9start slave;10show slave status \G;1112
从MHA Failover的过程中可以了解到,MHA Manager在恢复(补齐)其他Slave数据时会用到relay-log,因此这些relay-log需要被保留。
而默认情况下,SQL线程在回放完毕后,MySQL会主动删除relay-log,需要禁用该功能,确保relay-log不被自动删除。
在所有Slave节点中配置如下参数,然后重启mysql服务即可。
xxxxxxxxxx121# 所有的Slave节点[mysqld]2# 关闭relay-log主动删除的功能3relay_log_purge = 0456show global variables like '%relay_log_purge%';7+-----------------+-------+8| Variable_name | Value |9+-----------------+-------+10| relay_log_purge | ON |11+-----------------+-------+12set global relay_log_purge=off;
需要在节点服务器执行
xxxxxxxxxx81ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog2ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql3ln -s /usr/local/bin/ifconfig /sbin/ifconfig45cp master_ip_failover /usr/local/bin/master_ip_failover6chmod a+x /usr/local/bin/master_ip_failover78#ifconfig是为了不去修改perl脚本。而/sbin又是/usr/sbin的链接,实际到了/usr/sbin/ifconfig
xxxxxxxxxx31# 配置vip2ifconfig eth0:88 172.16.16.200/243ip a
xxxxxxxxxx21/usr/local/bin/masterha_check_ssh --conf=/etc/masterha/app1.conf2/usr/local/bin/masterha_check_repl --conf=/etc/masterha/app1.conf
xxxxxxxxxx131[root@mhamanager mha]# rpm -ql mha4mysql-manager2/usr/bin/masterha_check_repl3/usr/bin/masterha_check_ssh4/usr/bin/masterha_check_status5/usr/bin/masterha_conf_host6/usr/bin/masterha_manager7/usr/bin/masterha_master_monitor8/usr/bin/masterha_master_switch9/usr/bin/masterha_secondary_check10/usr/bin/masterha_stop11...12帮助文件13...
x
1[server default]2# 这两个参数需要根据不同的集群进行修改3manager_workdir=/var/log/masterha/app14manager_log=/var/log/masterha/app1/manager.log5# 按照master服务器存放binlog的实际路径进行修改,主要为了让MHA拉取binlog6master_binlog_dir=/data/mysql_data/7# 设置自动failover的脚本8master_ip_failover_script= /usr/local/bin/master_ip_failover 9#master_ip_failover_script= /usr/local/bin/master_ip_failover --interface=eth0 --key=88 --vip=10.10.101.55/24 --gateway=10.10.101.254 如果使用自定义参数,需要在脚本中提前定义。10# 设置手动切换时候的脚本 (供(masterha_master_switch使用)11master_ip_online_change_script=/usr/local/bin/master_ip_failover12log_level=debug13# 监控的用户14user=root15# 监控用户的密码16password=yourmysqlpassword17# 监控主库的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover18#ping_interval=319ping_interval=99999920# 检测方式是insert,MHA-0.56开始支持insert21# 会在Master中生成一个 infra 数据库22ping_type=SELECT23# 设置远端mysql在发生切换时binlog的保存位置24remote_workdir=/tmp25# 复制用的密码26repl_password=yourrepomysqlpassword27# 复制的用户28repl_user=rpl29# 告警脚本,可自行修改,这里没有使用30#report_script=/usr/local/send_report31# 通过从机进行二次探测的脚本, IP地址按照实际的情况进行修改32secondary_check_script=/usr/local/bin/masterha_secondary_check -s slave1 --user=root --master_host=master --master_port=37389 --port=2233# 设置故障发生后关闭故障主机的脚本(主要作用是关闭主机防止发生脑裂,这里没有使用,类似Fence功能)34#shutdown_script="/usr/local/bin/power_manager --command=stopssh2 --host=test-1 --ssh_user=root"35# 定义ssh的用户36ssh_user=root3738[server1]39# 这个hostname也可以配置成IP地址,同 ip 参数一样40# 如果这里写名字,需要DNS配合,或者使用 /etc/hosts41hostname=mhamaster42ip=10.10.101.5343port=3738944ssh_port=2245# candidate_master参数的意思为:设置为候选Master,如果发生主从切换,该主机会被提升为Master,即使这个服务器上的数据不是最新的(会用relay-log补全)46candidate_master=147[server2]48hostname=mhaslave49ip=10.10.101.5450port=3738951ssh_port=2252candidate_master=153# check_repl_delay参数的意思为:默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master;54# 因为对于这个slave的恢复需要花费很长时间;55# 通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时;56# 这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master57check_repl_delay=05859[binlog1]60no_master=161hostname=mhamaster62master_binlog_dir=/data/mysql_data/63# 将binlog指定到服务时,数据目录会复制一份到此服务器路径。或者指定任意服务器。
xxxxxxxxxx912Thu Oct 15 21:53:33 2020 - [info] /usr/local/bin/master_ip_failover --interface=eth0 --key=88 --vip=20.99.46.60/24 --gateway=20.99.46.1 --command=status --ssh_user=root --orig_master_host=20.99.46.37 --orig_master_ip=20.99.46.37 --orig_master_port=33063Unknown option: interface4Unknown option: key5Unknown option: vip6Unknown option: gateway78此处的报错:需要将参数去掉。参考需要在 /usr/local/bin/master_ip_failover中自定义添加,以让perl脚本识别。9
/etc/supervisor.conf
mha启动调整为通过supervisord监控实现。
注意:将引用/usr/bin/masterha_manager,注意修改路径,否则启动不了。
xxxxxxxxxx241[unix_http_server]2file=/var/run/supervisor.sock ; (the path to the socket file)3[supervisord]4logfile=/var/log/supervisord.log ; (main log file;default $CWD/supervisord.log)5logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)6logfile_backups=10 ; (num of main logfile rotation backups;default 10)7loglevel=info ; (log level;default info; others: debug,warn,trace)8pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)9nodaemon=false ; (start in foreground if true;default false)10minfds=1024 ; (min. avail startup file descriptors;default 1024)11minprocs=200 ; (min. avail process descriptors;default 200)12[rpcinterface:supervisor]13supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface14[supervisorctl]15serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket16[program:mha_manager]17command=nohup /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover process_name=%(program_name)s18autostart=true19autorestart=false20redirect_stderr=true21stdout_logfile=/var/log/masterha/app1/manager.log22stdout_logfile_maxbytes=10MB23stdout_logfile_backups=524user=rootxxxxxxxxxx381[unix_http_server]2file=/var/run/supervisor.sock ; (the path to the socket file)3[supervisord]4logfile=/var/log/supervisord.log ; (main log file;default $CWD/supervisord.log)5logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)6logfile_backups=10 ; (num of main logfile rotation backups;default 10)7loglevel=info ; (log level;default info; others: debug,warn,trace)8pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)9nodaemon=false ; (start in foreground if true;default false)10minfds=1024 ; (min. avail startup file descriptors;default 1024)11minprocs=200 ; (min. avail process descriptors;default 200)12[rpcinterface:supervisor]13supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface14[supervisorctl]15serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket161718[program:binlog_backup_app1]19command=/root/binlog_backup_app1.sh20autostart=true21autorestart=true22redirect_stderr=true23stdout_logfile=/data/binlog/app1/dump.log24stdout_logfile_maxbytes=10MB25stdout_logfile_backups=526user=root2728[program:mha_manager_app1]29command=nohup /usr/local/bin/masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover process_name=%(program_name)s30autostart=true31autorestart=false32redirect_stderr=true33stdout_logfile=/var/log/masterha/app1/manager.log34stdout_logfile_maxbytes=10MB35stdout_logfile_backups=536user=root37(END)38x
123BACKUP_BIN=/usr/bin/mysqlbinlog4LOCAL_BACKUP_DIR=/data/binlog/app15BACKUP_LOG=/data/binlog/app1/dump.log6REMOTE_HOST=master7REMOTE_PORT=33068SERVER_ID=3249REMOTE_USER=root10REMOTE_PASS=yourmysqlpassword11#time to wait before reconnecting after failure12SLEEP_SECONDS=1013count=11415## 检查备份目录16if [[ ! -d ${LOCAL_BACKUP_DIR} ]];then17 mkdir -p ${LOCAL_BACKUP_DIR}18fi1920cd ${LOCAL_BACKUP_DIR}2122## 运行while循环,连接断开后等待指定时间,重新连接23while [ $count -lt 50 ]2425do2627 FIRST_BINLOG=$(mysql --host=${REMOTE_HOST} --user=${REMOTE_USER} --password=${REMOTE_PASS} -e 'show binary logs'|grep -v "Log_name"|awk '{print $1}'|head -n 1)2829 if [[ ! -d ${LOCAL_BACKUP_DIR} ]];then30 mkdir -p ${LOCAL_BACKUP_DIR}31 fi3233 cd ${LOCAL_BACKUP_DIR}3435 if [ `ls -A "${LOCAL_BACKUP_DIR}" |grep -v "dump.log" |wc -l` -eq 0 ];then36 LAST_FILE=${FIRST_BINLOG} ##如果备份目录中没有备份文件则 LAST_FILE=FIRST_FILE37 else38 LAST_FILE=`ls -l ${LOCAL_BACKUP_DIR} |grep -v "dump.log" |tail -n 1 |awk '{print $9}'` ##last_file取序列最大的binlog文件39 fi4041 date >> ${BACKUP_LOG}42 echo "${BACKUP_BIN} -R --raw --host=${REMOTE_HOST} --user=${REMOTE_USER} --password=${REMOTE_PASS} ${LAST_FILE} --stop-never --stop-never-slave-server-id=${SERVER_ID}" >> ${BACKUP_LOG}434445 ${BACKUP_BIN} -R --raw --host=${REMOTE_HOST} --user=${REMOTE_USER} --password=${REMOTE_PASS} ${LAST_FILE} --stop-never --stop-never-slave-server-id=${SERVER_ID}46 echo "`date +"%Y/%m/%d %H:%M:%S"` mysqlbinlog停止,返回代码:$?" >> ${BACKUP_LOG}47 echo "${SLEEP_SECONDS}秒后再次连接并继续备份" >> ${BACKUP_LOG}48 sleep ${SLEEP_SECONDS}49 count=`expr $count + 1`5051done5253echo "exit............." >> ${BACKUP_LOG}54
xxxxxxxxxx81# systemctl enable --now supervisord2# systemctl status supervisord34● supervisord.service - Process Monitoring and Control Daemon5 Loaded: loaded (/usr/lib/systemd/system/supervisord.service; disabled; vendor preset: disabled)6 Active: inactive (dead)7[root@mhamanamger ~]# vi /usr/lib/systemd/system/supervisord.service8### systemctl enable supervisordmha_manager的工作是实现主从的切换,但是完成mysql的主从切换之后,进程mha_manager自动退出,将不再监控。如果只是2个节点(1主一从)倒是也没必要留着这个进程,但如果用在3个及以上节点时,再次的切换主从将无法实现,故引入了supervisord服务来使进程mha_manager重生。注:supervisord是一个监控nohub的很好的方案,不仅可以用在mha_manager的后台执行上。
supervisord
是服务进程
supervisordctl
是进程管理命令(执行后,exit退出),就像systemd一样,可以将加入到/etc/supervisord.conf 主配置文件中的进程进行实时的start、stop、restart、status操作。管理方式 是操作+服务名
xxxxxxxxxx261[root@mha ~]# /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf2#服务进程,不过,如果是rpm安装的话,应该有相应的systemd,而不用这个方式。34Unlinking stale socket /var/run/supervisor.sock5[root@mha ~]# supervisorctl -c /etc/supervisord.conf6mha_manager RUNNING pid 192544, uptime 0:00:287supervisor> status8mha_manager RUNNING pid 192544, uptime 0:00:339supervisor> exit10[root@mha ~]# supervisorctl -c /etc/supervisord.conf # 进入管理状态,11mha_manager EXITED Jun 26 05:26 PM12supervisor> start mha_manager13mha_manager: started14supervisor> exit15[root@mha ~]# supervisorctl -c /etc/supervisord.conf # 进入管理状态16mha_manager RUNNING pid 194459, uptime 0:08:3017supervisor> restart mha_manager18mha_manager: stopped19mha_manager: started20supervisor> restart mha_manager21mha_manager: ERROR (not running)22mha_manager: started23supervisor>24supervisor> status25mha_manager RUNNING pid 196263, uptime 0:00:1426supervisor> exit
supervisord只是一个脚本后台自动化监控系统,本质上是调用mha4mysql工具中的/usr/bin/masterha_manager,所以真正的日志是/usr/bin/masterha_manager输出到日志中的。
即为配置文件中的
xxxxxxxxxx11stdout_logfile=/var/log/masterha/app1/manager.log
由于mha_health_check需要检测此日志文件,所有一定要查看日志输出
xxxxxxxxxx41Sun Nov 1 11:52:06 2020 - [debug] Set short wait_timeout on master: 1999998 seconds2Sun Nov 1 11:52:06 2020 - [debug] Trying to get advisory lock..3Sun Nov 1 11:52:06 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..4
xxxxxxxxxx101[Unit]2Description=Process Monitoring and Control Daemon3After=rc-local.service nss-user-lookup.target45[Service]6Type=forking7ExecStart=/usr/bin/supervisord -c /etc/supervisord.conf89[Install]10WantedBy=multi-user.target
x1[root@mhamanamger ~]# supervisord -c /etc/supervisord.conf 启动后台服务2[root@mhamanamger ~]# supervisorctl -c /etc/supervisord.conf3mha_manager FATAL Exited too quickly (process log may have details)4supervisor> start5Error: start requires a process name6start <name> Start a process7start <gname>:* Start all processes in a group8start <name> <name> Start multiple processes or groups9start all Start all processes10supervisor> start all11mha_manager: ERROR (spawn error) # 推测由于命令中的路径不存在,12supervisor> exit13[root@mhamanamger ~]# whereis masterha_manager14masterha_manager: /usr/bin/masterha_manager /usr/share/man/man1/masterha_manager.1.gz15[root@mhamanamger ~]# ln -s /usr/bin/masterha_manager /usr/local/bin/masterha_manager 创建一个快捷方式16[root@mhamanamger ~]# supervisorctl -c /etc/supervisord.conf17mha_manager FATAL Exited too quickly (process log may have details)18supervisor> start all19mha_manager: started20supervisor>21
问题处理
[root@mhamanamger ~]# supervisorctl -c /etc/supervisord.conf 由于是默认路径,可以直接用命令不加参数 unix:///var/run/supervisor.sock no such file # 是因为服务未启动,服务启动与监控的进程是否有效是无关的。 supervisor> status unix:///var/run/supervisor.sock no such file
检查项目涉及supervisor/mha状态。
修改脚本 中的master节点名称,即可。
x10 23 * * * /usr/bin/sh /root/mha_health_check.sh > /dev/null 2>&123(crontab -l ; echo "# Only Use for Check MHA status")| crontab -4(crontab -l ; echo "0 23 * * * /usr/bin/sh /root/mha_health_check.sh > /dev/null 2>&1")| crontab -5crontab -l
/root/mha_health_check.sh
SSH_USER="root" SSH_PORT="22" MYSQL_USER="root" MYSQL_PASSWORD="yourmysqlpassword" # root密码 MYSQL_PORT="3506" MASTER_HOST="master" #hostname mha_conf="app1"
x1#!/bin/bash23###############################################4# 1. 检查MHA Manager 托管服务5# 2. 检查当前MHA 集群状态6# 3. 检查主从数据库目标库上数据同步状况(默认不开启,后续完善)7###############################################8##9## Filename : MHA_health_check.sh10## Date : 2018-0711## Author : xuty12## Desc : mha health check13## Version : MySQL 5.7.20141516source /etc/profile1718##19## ========== global var ============20##2122LOG="/root/mha_health.log"23DATE=`date +%Y%m%d_%H%M%S`24SSH_USER="root"25SSH_PORT="22"26MYSQL_USER="root"27MYSQL_PASSWORD="yourmysqlpassword"28MYSQL_PORT="3506"29MASTER_HOST="master"30mha_conf="app1"3132##33## ========== function =============34##3536function supervisord_check() {37supervisord_status=`ps aux |grep supervisord |grep -v grep |wc -l`38if [[ $supervisord_status -eq 1 ]];then39echo "*** supervisord_check : [Success]" >> $LOG40else41echo "*** supervisord_check : [Failed]" >> $LOG42exit 143fi44write_line45}4647function masterha_check() {48managerlog=`tail -n 1 /var/log/masterha/${mha_conf}/manager.log|grep succeeded|wc -l`49if [[ $managerlog -eq 1 ]];then50echo "*** masterha_${mha_conf}_check : [Success]" >> $LOG51else52echo "*** masterha_${mha_conf}_check : [Failed]" >> $LOG53exit 154fi5556write_line5758}5960function replication_status_check() {6162ssh -p$SSH_PORT $MASTER_HOST "mysqlrplshow --master=root:${MYSQL_PASSWORD}@${MASTER_HOST}:${MYSQL_PORT} --discover-slaves-login=${MYSQL_USER}:${MYSQL_PASSWORD} --verbose" |tail -n 6 >> $LOG6364write_line65}666768function replication_data_check() {6970ssh -P$SSH_PORT $MASTER_HOST "/usr/local/mysql/bin/mysql -u$MYSQL_USER -p$MYSQL_PASSWORD -P$MYSQL_PORT -e 'drop table IF EXISTS percona.checksums'"7172ssh -P$SSH_PORT $MASTER_HOST "pt-table-checksum --nocheck-replication-filters --no-check-binlog-format --databases=$MYSQL_CHECK_DB h=$MASTER_USER,u=$MYSQL_USER,p=$MYSQL_PASSWORD -S $MYSQL_SOCKET"7374if [[ $? -ne 0 && $? -ne 16 ]];then75echo "*** pt-table-checksum run Failed,Please Check!" >> $LOG76exit 177fi78data_diff=`ssh -P$SSH_PORT $SLAVE_HOST "/usr/local/mysql/bin/mysql -u$MYSQL_USER -p$MYSQL_PASSWORD -P$MYSQL_PORT -e 'select db,tbl,ts from percona.checksums where master_cnt <> this_cnt OR master_crc <> this_crc OR79ISNULL(master_crc) <> ISNULL(this_crc)'"`8081if [[ $? -ne 0 ]];then82echo "*** data_diff get Failed,Please Check! " >> $LOG83exit 184fi8586if [[ `echo "data_diff"|grep ERROR |wc -l` -gt 1 ]];then87echo "*** replcation_data_check : [Failed] " >> $LOG88echo " $data_diff " >> $LOG89exit 1;90fi9192if [[ `echo "$data_diff" |wc -l` -gt 1 ]];then93echo "*** replcation_data_check : [Failed] " >> $LOG94echo "*** Failed Failed Failed! ! ! PLease Check replication_data" >> $LOG95echo " $data_diff " >> $LOG96exit 197else98echo "*** replcation_data_check : [Success]" >> $LOG99echo "*** Check Complete OK , DATE is $DATE" >> $LOG100fi101}102103104function start_log(){105106echo -e "\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> $DATE Begin Check <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<" >> $LOG107write_line108}109110function write_line(){111echo "-------------------------------------------" >> $LOG112}113114115################# main #################116117#开始检查日志记录118start_log119120#supervisord检查121supervisord_check122123#masterha检查124masterha_check125126#主从状态检查127replication_status_check128129#主从数据一致性检查130#replication_data_check
https://www.linuxidc.com/Linux/2017-09/146661.htm
在主从中,我们设置从库只读: 1、如果只是打开read_only=on 在从库上启用确保只接受来自主库的更新,不接受来自客户端的更新。但是不能保证连到从库具有super权限误写数据。 2、mysql 5.7.8开始支持super_read_only参数,如果super_read_only=on,在从库直接受来自主库的更新,连接到从库的含有super用户权限也不能更新,确保从库不被写如异常数据。
如果设置了super_read_only =on ,那么默认的read_only 也设置为on,如果再设置super_read_only =off,此时read_only 还是on,如果主从角色发生变化注意read_only也设置为off。
日常维护: 主库可读写 从库只读
从库只读的两种选择:以下参数可放入my.cnf
1、read_only = on; 2、read_only = on; super_read_only = on;
区别是如果从库也设置了super_read_only=on,可很好确保了从库不被误写数据,即便是变更时候,不小心在从库执行了super权限用户语句也会失败,不会造成主从不一致。
那么我们的从库是否一定要设置为super_read_only = on 吗?
从库重启后维护: 如果传统主从架构,遇到宿主机宕机虚拟机漂移后启动数据库,还是从库本身是物理机异常宕机,还是从库计划内升级系统补丁等操作重启后建议显示把从库设置为super_read_only=on只读。 如果是分布式,建议启动管理agent由管理的agent负责把从节点加入主库并且把从库设置为read_only=on。
其他问题: 如果使用nbu备份,备份策略是从库,由于备份用户需要super权限,需要在备份时候先把super_read_only=off备份结束后设置super_read_only=on。
xxxxxxxxxx311mysql> set global read_only=0;2Query OK, 0 rows affected (0.00 sec)34mysql> show global variables like "%read_only%";5+-----------------------+-------+6| Variable_name | Value |7+-----------------------+-------+8| innodb_read_only | OFF |9| read_only | OFF |10| super_read_only | OFF |11| transaction_read_only | OFF |12| tx_read_only | OFF |13+-----------------------+-------+145 rows in set (0.00 sec)1516mysql> set global super_read_only=on; # 此时read_only super_read_only均开启。17Query OK, 0 rows affected (0.00 sec)1819mysql> show global variables like "%read_only%";20+-----------------------+-------+21| Variable_name | Value |22+-----------------------+-------+23| innodb_read_only | OFF |24| read_only | ON | # 此时read_only super_read_only均开启。25| super_read_only | ON |26| transaction_read_only | OFF |27| tx_read_only | OFF |28+-----------------------+-------+295 rows in set (0.00 sec)3031在mysql 5.7中,只对innodb引擎的库表启作用,在8.0版本中也对MysqlIAM起作用。属于更底层的限制。
https://dev.mysql.com/doc/refman/5.7/en/innodb-read-only-instance.html
启动server 在read-only模式。对于分布在数据库应用或者数据设置为只读介质。
也可以用于数据仓库共享相同的数据目录在多个实例之间。
作用层面:事务
参数 tx_read_only 或者 transaction_read_only 用于设置事务的访问模式,可设置为 OFF/ON,默认值为 OFF,表示事务可读,可写,设置为 ON 表示事务只读,不可写。
transaction_read_only 参数在 5.7.20 版本引入,tx_read_only 参数在 8.0.3 版本被移除,这两个参数意义完全一样,只是名称不同,transaction_read_only 名称更加规范,在高版本 MySQL 中,建议使用 transaction_read_only。
该参数可以在全局范围内设置,也可以在 session 级设置,在全局范围内设置该参数后,对于已有的连接并不会生效,因为已有连接的 session 级参数仍然保持原样,因此需要杀掉已有连接,让应用重新建立连接,以便使该参数对所有连接生效。
xxxxxxxxxx21SET GLOBAL transaction_read_only = ON;2SET SESSION transaction_read_only = ON;
如果设置 transaction_read_only 为 ON,此时向表中写入数据,会产生报错,如下: ERROR 1792 (25006): Cannot execute statement in a READ ONLY transaction.
xxxxxxxxxx31read_only = on2super_read_only = on3tx_read_only = on
xxxxxxxxxx251flush tables with read lock;2set global read_only = on;3set global super_read_only = on;4set global tx_read_only = on;5show global variables like "%read_only%";678# 全锁效果9mysql> show global variables like "%read_only%";10+-----------------------+-------+11| Variable_name | Value |12+-----------------------+-------+13| innodb_read_only | OFF |14| read_only | ON |15| super_read_only | ON |16| transaction_read_only | ON |17| tx_read_only | ON |18+-----------------------+-------+195 rows in set (0.00 sec)20# 执行写入时会有jdbc报错。21The MySQL server is running with the --super-read-only option so it cannot execute this statement22# 表明--super-read-only在起作用232425xxxxxxxxxx171unlock tables;2set global read_only = off;3set global super_read_only = off;;4set global tx_read_only = off;567mysql> show global variables like "%read_only%";8+-----------------------+-------+9| Variable_name | Value |10+-----------------------+-------+11| innodb_read_only | OFF |12| read_only | ON |13| super_read_only | OFF |14| transaction_read_only | OFF |15| tx_read_only | OFF |16+-----------------------+-------+175 rows in set (0.00 sec)
xxxxxxxxxx71Stop slave;2Set @@SESSION.GTID_NEXT='564637ab-deab-11ea-8ae1-fa163e1b04fd:79';3Begin;Commit;Set @@SESSION.GTID_NEXT = AUTOMATIC;4Start slave;5show slave status \G;67# 处理时建议锁库,锁表后,备份重做主从。
报错
xxxxxxxxxx11Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'aa3f944d-bc78-11ea-b893-0cda411dd9fe:4056319' at master log bin.000004, end_log_pos 490372649. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
如下
xxxxxxxxxx6912mysql> start slave;3Query OK, 0 rows affected (0.01 sec)45mysql> show slave status \G;6*************************** 1. row ***************************7 Slave_IO_State: Waiting for master to send event8 Master_Host: 172.16.10.459 Master_User: rpl10 Master_Port: 330611 Connect_Retry: 6012 Master_Log_File: bin.00000613 Read_Master_Log_Pos: 34716514 Relay_Log_File: relay.00002315 Relay_Log_Pos: 340816 Relay_Master_Log_File: bin.00000417 Slave_IO_Running: Yes18 Slave_SQL_Running: No19 Replicate_Do_DB: 20 Replicate_Ignore_DB: 21 Replicate_Do_Table: 22 Replicate_Ignore_Table: 23 Replicate_Wild_Do_Table: 24 Replicate_Wild_Ignore_Table: 25 Last_Errno: 103226 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'aa3f944d-bc78-11ea-b893-0cda411dd9fe:4056319' at master log bin.000004, end_log_pos 490372649. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.27 Skip_Counter: 028 Exec_Master_Log_Pos: 49036691429 Relay_Log_Space: 84201990230 Until_Condition: None31 Until_Log_File: 32 Until_Log_Pos: 033 Master_SSL_Allowed: No34 Master_SSL_CA_File: 35 Master_SSL_CA_Path: 36 Master_SSL_Cert: 37 Master_SSL_Cipher: 38 Master_SSL_Key: 39 Seconds_Behind_Master: NULL40Master_SSL_Verify_Server_Cert: No41 Last_IO_Errno: 042 Last_IO_Error: 43 Last_SQL_Errno: 103244 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'aa3f944d-bc78-11ea-b893-0cda411dd9fe:4056319' at master log bin.000004, end_log_pos 490372649. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.45 Replicate_Ignore_Server_Ids: 46 Master_Server_Id: 4547 Master_UUID: aa3f944d-bc78-11ea-b893-0cda411dd9fe48 Master_Info_File: mysql.slave_master_info49 SQL_Delay: 050 SQL_Remaining_Delay: NULL51 Slave_SQL_Running_State: 52 Master_Retry_Count: 8640053 Master_Bind: 54 Last_IO_Error_Timestamp: 55 Last_SQL_Error_Timestamp: 200804 12:44:2556 Master_SSL_Crl: 57 Master_SSL_Crlpath: 58 Retrieved_Gtid_Set: aa3f944d-bc78-11ea-b893-0cda411dd9fe:4055597:4055815:4056313-493334459 Executed_Gtid_Set: aa3f944d-bc78-11ea-b893-0cda411dd9fe:1-4056318,60c1efedd3-bc78-11ea-bfd7-0cda411d8317:1-361 Auto_Position: 162 Replicate_Rewrite_DB: 63 Channel_Name: 64 Master_TLS_Version: 651 row in set (0.00 sec)6667ERROR: 68No query specified69
#按提示定位到表:edc_data_3.qrtz_scheduler_state,从库表中数据与主库不一致:
xxxxxxxxxx4212mysql> select * from performance_schema.replication_applier_status_by_worker\G;3*************************** 1. row ***************************4CHANNEL_NAME:5WORKER_ID: 16THREAD_ID: NULL7SERVICE_STATE: OFF8LAST_SEEN_TRANSACTION: aa3f944d-bc78-11ea-b893-0cda411dd9fe:40563199LAST_ERROR_NUMBER: 103210LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'aa3f944d-bc78-11ea-b893-0cda411dd9fe:4056319' at master log bin.000004, end_log_pos 490372649; Could not execute Update_rows event on table edc_data_3.0.qrtz_scheduler_state; Can't find record in 'qrtz_scheduler_state', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log bin.000004, end_log_pos 49037264911LAST_ERROR_TIMESTAMP: 2020-08-04 12:44:2512*************************** 2. row ***************************13CHANNEL_NAME:14WORKER_ID: 215THREAD_ID: NULL16SERVICE_STATE: OFF17LAST_SEEN_TRANSACTION: aa3f944d-bc78-11ea-b893-0cda411dd9fe:405631518LAST_ERROR_NUMBER: 019LAST_ERROR_MESSAGE:20LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:0021*************************** 3. row ***************************22CHANNEL_NAME:23WORKER_ID: 324THREAD_ID: NULL25SERVICE_STATE: OFF26LAST_SEEN_TRANSACTION: aa3f944d-bc78-11ea-b893-0cda411dd9fe:405631627LAST_ERROR_NUMBER: 028LAST_ERROR_MESSAGE:29LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:0030*************************** 4. row ***************************31CHANNEL_NAME:32WORKER_ID: 433THREAD_ID: NULL34SERVICE_STATE: OFF35LAST_SEEN_TRANSACTION:36LAST_ERROR_NUMBER: 037LAST_ERROR_MESSAGE:38LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00394 rows in set (0.00 sec)4041ERROR:42No query specified
清空从库表,或找到不同的地方:直接修改从库或将从库表清空,将主库备份的表还原或插入到从库表中。
但此断电影响的不同步多由于数据未即时写入binlog,出现时也一般会有多个位置未未同步,如果执行几个之后,还是不断有此种新的问题出现,可以直接将主库备份,还原到从库,重建主从关系。
备份表的命令:
注意:不加-d时导出sql,加-d时导出表结构。然后将其中的表insert语句复制出来即可。
x
1命令行下具体用法如下: mysqldump -u用戶名 -p密码 -d 數據库名 表名 脚本名;231、导出數據库為dbname的表结构(其中用戶名為root,密码為dbpasswd,生成的脚本名為db.sql)4mysqldump -uroot -pdbpasswd -d dbname >db.sql;562、导出數據库為dbname某张表(test)结构7mysqldump -uroot -pdbpasswd -d dbname test>db.sql;893、导出數據库為dbname所有表结构及表數據(不加-d)10mysqldump -uroot -pdbpasswd dbname >db.sql;11124、导出數據库為dbname某张表(test)结构及表數據(不加-d)13mysqldump -uroot -pdbpasswd dbname test>db.sql;
x1mysql> truncate qrtz_scheduler_state;2Query OK, 0 rows affected (0.01 sec)34mysql> INSERT INTO `qrtz_scheduler_state` VALUES ('edc','2a23bb5f-896f-49d7-8f5a-f3297ba890a11595818572314',1596343879614,7500);5Query OK, 1 row affected (0.00 sec)67mysql> Set @@SESSION.GTID_NEXT='aa3f944d-bc78-11ea-b893-0cda411dd9fe:4056319';8Query OK, 0 rows affected (0.00 sec)910mysql> Begin;commit;Set @@SESSION.GTID_NEXT = AUTOMATIC; start slave ;show slave status \G; #多行命令11Query OK, 0 rows affected (0.00 sec)1213Query OK, 0 rows affected (0.00 sec)1415Query OK, 0 rows affected (0.00 sec)1617Query OK, 0 rows affected (0.00 sec)1819*************************** 1. row ***************************20Slave_IO_State: Waiting for master to send event21Master_Host: 172.16.10.4522Master_User: rpl23Master_Port: 330624Connect_Retry: 6025Master_Log_File: bin.00000626Read_Master_Log_Pos: 36658727Relay_Log_File: relay.00002328Relay_Log_Pos: 340829Relay_Master_Log_File: bin.00000430Slave_IO_Running: Yes31Slave_SQL_Running: Yes32Replicate_Do_DB:33...34...35Master_SSL_Crlpath:36Retrieved_Gtid_Set: aa3f944d-bc78-11ea-b893-0cda411dd9fe:4055597:4055815:4056313-493337137Executed_Gtid_Set: aa3f944d-bc78-11ea-b893-0cda411dd9fe:1-4056319,38c1efedd3-bc78-11ea-bfd7-0cda411d8317:1-539Auto_Position: 140Replicate_Rewrite_DB:41Channel_Name:42Master_TLS_Version:431 row in set (0.00 sec)4445ERROR:46No query specified4748mysql> show slave status \G; #前面的多行命令可能结果不是实时的,需要再次执行本命令。
比如是恢复的数据库,还是提示不一致时,可以直接跳过
解决mysql开启GTID主从同步出现1236错误问题
https://blog.51cto.com/hnr520/1883282
xxxxxxxxxx211SET GTID_NEXT='15b785b6-ba17-11ea-8f0a-00163e50d19a:1';2BEGIN; COMMIT;3SET GTID_NEXT="AUTOMATIC";4START SLAVE;5START SLAVE;6show slave status \G;7START SLAVE;8show slave status \G;910STOP SLAVE;11SET GTID_NEXT='15b785b6-ba17-11ea-8f0a-00163e50d19a:20';12BEGIN; COMMIT;13SET GTID_NEXT="AUTOMATIC";14START SLAVE;15show slave status\G;1617STOP SLAVE; 18SET GLOBAL sql_slave_skip_counter =1;19START SLAVE; 2021
xxxxxxxxxx258123=================从库45mysql> show slave status \G;6*************************** 1. row ***************************7 Slave_IO_State:8 Master_Host: 100.64.88.1029 Master_User: rpl10 Master_Port: 330611 Connect_Retry: 6012 Master_Log_File: bin.00001313 Read_Master_Log_Pos: 21014 Relay_Log_File: relay.00001515 Relay_Log_Pos: 416 Relay_Master_Log_File: bin.00001317 Slave_IO_Running: No18 Slave_SQL_Running: Yes19 Replicate_Do_DB:20 Replicate_Ignore_DB:21 Replicate_Do_Table:22 Replicate_Ignore_Table:23 Replicate_Wild_Do_Table:24 Replicate_Wild_Ignore_Table:25 Last_Errno: 026 Last_Error:27 Skip_Counter: 028 Exec_Master_Log_Pos: 21029 Relay_Log_Space: 93030 Until_Condition: None31 Until_Log_File:32 Until_Log_Pos: 033 Master_SSL_Allowed: No34 Master_SSL_CA_File:35 Master_SSL_CA_Path:36 Master_SSL_Cert:37 Master_SSL_Cipher:38 Master_SSL_Key:39 Seconds_Behind_Master: NULL40Master_SSL_Verify_Server_Cert: No41 Last_IO_Errno: 123642 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'43 Last_SQL_Errno: 044 Last_SQL_Error:45 Replicate_Ignore_Server_Ids:46 Master_Server_Id: 10147 Master_UUID: 704664e0-2a40-11ea-bccc-0050562a864048 Master_Info_File: mysql.slave_master_info49 SQL_Delay: 050 SQL_Remaining_Delay: NULL51 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates52 Master_Retry_Count: 8640053 Master_Bind:54 Last_IO_Error_Timestamp: 200629 09:08:3655 Last_SQL_Error_Timestamp:56 Master_SSL_Crl:57 Master_SSL_Crlpath:58 Retrieved_Gtid_Set:59 Executed_Gtid_Set: 704664e0-2a40-11ea-bccc-0050562a8640:160 Auto_Position: 161 Replicate_Rewrite_DB:62 Channel_Name:63 Master_TLS_Version:641 row in set (0.00 sec)6566ERROR:67No query specified6869mysql> show global variables like '%gtid%'\G70*************************** 1. row ***************************71Variable_name: binlog_gtid_simple_recovery72 Value: ON73*************************** 2. row ***************************74Variable_name: enforce_gtid_consistency75 Value: ON76*************************** 3. row ***************************77Variable_name: gtid_executed78 Value: 704664e0-2a40-11ea-bccc-0050562a8640:179*************************** 4. row ***************************80Variable_name: gtid_executed_compression_period81 Value: 100082*************************** 5. row ***************************83Variable_name: gtid_mode84 Value: ON85*************************** 6. row ***************************86Variable_name: gtid_owned87 Value:88*************************** 7. row ***************************89Variable_name: gtid_purged90 Value: 704664e0-2a40-11ea-bccc-0050562a8640:191*************************** 8. row ***************************92Variable_name: session_track_gtids93 Value: OFF948 rows in set (0.00 sec)9596mysql> stop slave;97Query OK, 0 rows affected (0.00 sec)9899mysql> reset slave100 -> ;101Query OK, 0 rows affected (0.00 sec)102103mysql> reset master;104Query OK, 0 rows affected (0.01 sec)105106mysql> set @@global.gtid_purged='704664e0-2a40-11ea-bccc-0050562a8640:1-925660';107Query OK, 0 rows affected (0.00 sec)108109mysql> show global variables like '%gtid%'\G110*************************** 1. row ***************************111Variable_name: binlog_gtid_simple_recovery112 Value: ON113*************************** 2. row ***************************114Variable_name: enforce_gtid_consistency115 Value: ON116*************************** 3. row ***************************117Variable_name: gtid_executed118 Value: 704664e0-2a40-11ea-bccc-0050562a8640:1-925660119*************************** 4. row ***************************120Variable_name: gtid_executed_compression_period121 Value: 1000122*************************** 5. row ***************************123Variable_name: gtid_mode124 Value: ON125*************************** 6. row ***************************126Variable_name: gtid_owned127 Value:128*************************** 7. row ***************************129Variable_name: gtid_purged130 Value: 704664e0-2a40-11ea-bccc-0050562a8640:1-925660131*************************** 8. row ***************************132Variable_name: session_track_gtids133 Value: OFF1348 rows in set (0.00 sec)135136mysql> change master to master_host='100.64.88.102', master_user='rpl', master_password='Mysql@57rpl', master_port=3306, master_auto_position=1;137Query OK, 0 rows affected, 2 warnings (0.01 sec)138139mysql> start slave;140Query OK, 0 rows affected (0.00 sec)141142mysql> show slave status \G;143*************************** 1. row ***************************144 Slave_IO_State: Waiting for master to send event145 Master_Host: 100.64.88.102146 Master_User: rpl147 Master_Port: 3306148 Connect_Retry: 60149 Master_Log_File: bin.000031150 Read_Master_Log_Pos: 194151 Relay_Log_File: relay.000002152 Relay_Log_Pos: 355153 Relay_Master_Log_File: bin.000030154 Slave_IO_Running: Yes155 Slave_SQL_Running: Yes156 Replicate_Do_DB:157 Replicate_Ignore_DB:158 Replicate_Do_Table:159 Replicate_Ignore_Table:160 Replicate_Wild_Do_Table:161 Replicate_Wild_Ignore_Table:162 Last_Errno: 0163 Last_Error:164 Skip_Counter: 0165 Exec_Master_Log_Pos: 154166 Relay_Log_Space: 1027167 Until_Condition: None168 Until_Log_File:169 Until_Log_Pos: 0170 Master_SSL_Allowed: No171 Master_SSL_CA_File:172 Master_SSL_CA_Path:173 Master_SSL_Cert:174 Master_SSL_Cipher:175 Master_SSL_Key:176 Seconds_Behind_Master: 10838317177Master_SSL_Verify_Server_Cert: No178 Last_IO_Errno: 0179 Last_IO_Error:180 Last_SQL_Errno: 0181 Last_SQL_Error:182 Replicate_Ignore_Server_Ids:183 Master_Server_Id: 101184 Master_UUID: 704664e0-2a40-11ea-bccc-0050562a8640185 Master_Info_File: mysql.slave_master_info186 SQL_Delay: 0187 SQL_Remaining_Delay: NULL188 Slave_SQL_Running_State: Waiting for slave workers to process their queues189 Master_Retry_Count: 86400190 Master_Bind:191 Last_IO_Error_Timestamp:192 Last_SQL_Error_Timestamp:193 Master_SSL_Crl:194 Master_SSL_Crlpath:195 Retrieved_Gtid_Set: 704664e0-2a40-11ea-bccc-0050562a8640:925661196 Executed_Gtid_Set: 704664e0-2a40-11ea-bccc-0050562a8640:1-925660197 Auto_Position: 1198 Replicate_Rewrite_DB:199 Channel_Name:200 Master_TLS_Version:2011 row in set (0.00 sec)202203ERROR:204No query specified205206mysql>207208209210211=================212213214mysql> stop slave;215Query OK, 0 rows affected (0.00 sec)216217mysql> set @@global.gtid_purged='704664e0-2a40-11ea-bccc-0050562a8640:1-925660';218ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.219mysql> show global variables like '%gtid%'\G220*************************** 1. row ***************************221Variable_name: binlog_gtid_simple_recovery222 Value: ON223*************************** 2. row ***************************224Variable_name: enforce_gtid_consistency225 Value: ON226*************************** 3. row ***************************227Variable_name: gtid_executed228 Value: 704664e0-2a40-11ea-bccc-0050562a8640:1-342817229*************************** 4. row ***************************230Variable_name: gtid_executed_compression_period231 Value: 1000232*************************** 5. row ***************************233Variable_name: gtid_mode234 Value: ON235*************************** 6. row ***************************236Variable_name: gtid_owned237 Value:238*************************** 7. row ***************************239Variable_name: gtid_purged240 Value: 704664e0-2a40-11ea-bccc-0050562a8640:1:12241*************************** 8. row ***************************242Variable_name: session_track_gtids243 Value: OFF2448 rows in set (0.00 sec)245246mysql> reset slave;247Query OK, 0 rows affected (0.00 sec)248249mysql> reset master250 -> ;251Query OK, 0 rows affected (0.01 sec)252253mysql> set @@global.gtid_purged='704664e0-2a40-11ea-bccc-0050562a8640:1-925660';254Query OK, 0 rows affected (0.00 sec)255256mysql> start slave;257Query OK, 0 rows affected (0.01 sec)258
x
1pt-table-checksum --user=root --password=yourmysqlpassword --host=10.10.101.54 --databases=anna --replicate=cnail.checksums --create-replicate-table --no-check-binlog-format2pt-table-checksum --user=root --password=yourmysqlpassword --host=10.10.101.54 --replicate=cnail.checksums --create-replicate-table --no-check-binlog-format3pt-table-checksum --user=root --password=Mysql_57 --host=10.10.101.54 --replicate=cnail.checksums --create-replicate-table --no-check-binlog-format45pt-table-sync --charset=utf8 --ignore-databases=mysql,sys,percona dsn=u=root,p=yourmysqlpassword,h=10.10.101.53,P=37389 dsn=u=root,p=yourmysqlpassword,h=10.10.101.54,P=37389 --execute --print67pt-table-sync --charset=utf8mb4 --ignore-databases=mysql,sys,percona dsn=u=root,p=yourmysqlpassword,h=10.10.101.53,P=37389 dsn=u=root,p=yourmysqlpassword,h=10.10.101.54,P=37389 --execute --print8
master_ip_failover
xxxxxxxxxx21my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";2my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
masterha/app1.conf
x
1[server default]2# 这两个参数需要根据不同的集群进行修改3manager_workdir=/var/log/masterha/app14manager_log=/var/log/masterha/app1/manager.log5# 按照master服务器存放binlog的实际路径进行修改,主要为了让MHA拉取binlog6master_binlog_dir=/data/mysql_data/7# 设置自动failover的脚本8master_ip_failover_script= /usr/local/bin/master_ip_failover --interface=eth0 --key=88 --vip=10.10.101.55/24 --gateway=10.10.101.2549# 设置手动切换时候的脚本 (供(masterha_master_switch使用)10master_ip_online_change_script=/usr/local/bin/master_ip_failover --interface=eth0 --key=88 --vip=10.10.101.55/24 --gateway=10.10.101.25411log_level=debug12# 监控的用户13user=root14# 监控用户的密码15password=yourmysqlpassword16# 监控主库的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover17ping_interval=318#ping_interval=99999919# 检测方式是insert,MHA-0.56开始支持insert20# 会在Master中生成一个 infra 数据库21ping_type=SELECT22# 设置远端mysql在发生切换时binlog的保存位置23remote_workdir=/tmp24# 复制用的密码25repl_password=yourrepomysqlpassword26# 复制的用户27repl_user=rpl28# 告警脚本,可自行修改,这里没有使用29#report_script=/usr/local/send_report30# 通过从机进行二次探测的脚本, IP地址按照实际的情况进行修改31secondary_check_script=/usr/local/bin/masterha_secondary_check -s slave1 --user=root --master_host=master --master_port=37389 --port=2232# 设置故障发生后关闭故障主机的脚本(主要作用是关闭主机防止发生脑裂,这里没有使用,类似Fence功能)33#shutdown_script="/usr/local/bin/power_manager --command=stopssh2 --host=test-1 --ssh_user=root"34# 定义ssh的用户35ssh_user=root3637[server1]38# 这个hostname也可以配置成IP地址,同 ip 参数一样39# 如果这里写名字,需要DNS配合,或者使用 /etc/hosts40hostname=mhamaster41ip=10.10.101.5342port=3738943ssh_port=2244# candidate_master参数的意思为:设置为候选Master,如果发生主从切换,该主机会被提升为Master,即使这个服务器上的数据不是最新的(会用relay-log补全)45candidate_master=146[server2]47hostname=mhaslave48ip=10.10.101.5449port=3738950ssh_port=2251candidate_master=152# check_repl_delay参数的意思为:默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master;53# 因为对于这个slave的恢复需要花费很长时间;54# 通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时;55# 这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master56check_repl_delay=05758[binlog1]59no_master=160hostname=mhamaster61master_binlog_dir=/data/mysql_data/62
xxxxxxxxxx131mysql> show master logs;2+------------+------------+3| Log_name | File_size |4+------------+------------+5| bin.000003 | 1073744384 |6| bin.000004 | 1073744375 |7| bin.000005 | 258289828 |8| bin.000006 | 57864 |9+------------+------------+104 rows in set (0.00 sec)1112mysql> show slave status \G;13Empty set (0.00 sec)
主从同步问题处理