加入收藏 | 设为首页 | 会员中心 | 我要投稿 应用网_镇江站长网 (https://www.0511zz.com/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 站长学院 > MySql教程 > 正文

MySQL DeadLock故障排查全过程记录

发布时间:2022-03-02 15:03:56 所属栏目:MySql教程 来源:互联网
导读:【作者】 刘博:携程技术保障中心数据库高级经理,主要关注Sql server和Mysql的运维和故障处理。 【环境】 版本号:5.6.21 隔离级别:REPEATABLE READ 【问题描述】 接到监控报警,有一个线上的应用DeadLock报错,每15分钟会准时出现 登录Mysql服务器查看
      【作者】
       刘博:携程技术保障中心数据库高级经理,主要关注Sql server和Mysql的运维和故障处理。
 
【环境】
     版本号:5.6.21
     隔离级别:REPEATABLE READ
 
【问题描述】
      接到监控报警,有一个线上的应用DeadLock报错,每15分钟会准时出现

登录Mysql服务器查看日志:
mysql> show engine innodb statusG
 
*** (1) TRANSACTION:
 
TRANSACTION 102973, ACTIVE 11 sec starting index read
 
mysql tables in use 3, locked 3
 
LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s)
 
MySQL thread id 6, OS thread handle 140024996574976, query id 83 localhost us updating
 
UPDATE TestTable
 
SET column1 = 1,
 
Column2 = sysdate(),
 
Column3= '026'
 
Column4 = 0
 
AND column5 = 485
 
AND column6 = 'SEK'
 
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_column6 of table test.TestTable trx id 102973 lock_mode X waiting
 
Record lock, heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007e1452; asc ~ R;;
 
*** (2) TRANSACTION:
 
TRANSACTION 102972, ACTIVE 26 sec starting index read
 
mysql tables in use 3, locked 3
 
219 lock struct(s), heap size 24784, 2906 row lock(s), undo log entries 7
 
MySQL thread id 5, OS thread handle 140024996841216, query id 84 localhost us updating
 
UPDATE TestTable
 
SET Column1 = 1,
 
Column2 = sysdate(),
 
Column3 = '026'
 
Column4 = 0
 
AND Column5 = 485
 
AND Column6 = 'SEK'
 
*** (2) HOLDS THE LOCK(S):
 
RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_Column6 of table test.TestTable trx id 102972 lock_mode X
 
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 
0: len 8; hex 73757072656d756d; asc supremum;;
 
 
Record lock, heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007e1452; asc ~ R;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 601 page no 89642 n bits 1000 index idx_column6 of table test.TestTable trx id 32231892482 lock_mode X locks rec but not gap waiting
 
Record lock, heap no 38 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007eea14; asc ~ ;;
大致一看,更新同一索引的同一行,应该是一个Block,报TimeOut的错才对,怎么会报DeadLock?
 
【初步分析】
先分析下(2) TRANSACTION,TRANSACTION 32231892482。
等待的锁信息为:
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007eea14; asc
持有的锁信息为:
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007eeac4; asc
再先分析下(1) TRANSACTION,TRANSACTION 32231892617。
等待的锁信息为:
0: len 3; hex 53454b; asc SEK;;
 
1: len 8; hex 80000000007eeac4; asc
于是可以画出的死锁表,两个资源相互依赖,造成死锁:
TRANSACTION Hold Wait
32231892617 53454b80000000007eea14 53454b80000000007eeac4
32231892482 53454b80000000007eeac4 53454b80000000007eea14
让我们再看一下explain结果:
mysql>desc UPDATE TestTable SET Column1=1, Column2 = sysdate(),Column3 = '025' Column4 = 0 AND Column5 = 477 AND Column6 = 'SEK' G;
 
*************************** 1. row ***************************
 
id: 1
 
select_type: UPDATE
 
table: TestTable
 
partitions: NULL
 
type: index_merge
 
possible_keys: column5_index,idx_column5_column6_Column1,idxColumn6
 
key: column5_index,idxColumn6
 
key_len: 8,9
 
ref: NULL
 
rows: 7
 
filtered: 100.00
 
Extra: Using intersect(column5_index,idxColumn6); Using where
可以看到 EXTRA 列:
Using intersect(column5_index,idxColumn6)
从5.1开始,引入了 index merge 优化技术,对同一个表可以使用多个索引分别进行条件扫描。
相关文档:http://dev.mysql.com/doc/refman/5.7/en/index-merge-optimization.html
The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables.
【模拟与验证】
根据以上初步分析,猜测应该就是intersect造成的,于是在测试环境模拟验证,开启2个session模拟死锁:
 
时间序列 Session1 Session2
1 Begin;
2 UPDATE TestTable SET Column2 = sysdate() Column4 = 0 AND Column5 = 47 AND Column6 = 'SEK
执行成功,影响7行
3 Begin;
4 UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
被Blocking
5 UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
执行成功 ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
依据以上信息可以发现Session2虽然被Block了,但也获取了一些Session1在时间序列5时所需资源的X锁,可以再开启一个查询select count(Column5) from TestTable where Column5 = 485,设置SET TRANSACTION ISOLATION LEVEL SERIALIZABLE,去查询Column5 = 485的行,观察锁等待的信息:
mysql> SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id
 
mysql> SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query, b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query FROM information_schema.innodb_lock_waits w INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id G;
 
*************************** 1. row ***************************
 
waiting_trx_id: 103006
 
waiting_thread: 36
 
waiting_query: UPDATE TestTable SET Column1 = 1, Column2 = sysdate(), Column3 = '026' Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK'
 
blocking_trx_id: 103003
 
blocking_thread: 37
 
blocking_query: NULL
 
*************************** 2. row ***************************
 
waiting_trx_id: 421500433538672
 
waiting_thread: 39
 
waiting_query: select count(Column5) from TestTable where Column5 = 485
 
blocking_trx_id: 103006
 
blocking_thread: 36
 
blocking_query: UPDATE TestTable SET Column1 = 1, Column2 = sysdate(), Column3 = '026' Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK'
 
2 rows in set, 1 warning (0.00 sec)
 
mysql> select * from information_schema.innodb_lock_waits G;
 
*************************** 1. row ***************************
 
requesting_trx_id: 103006
 
requested_lock_id: 103006:417:1493:859
 
blocking_trx_id: 103003
 
blocking_lock_id: 103003:417:1493:859
 
*************************** 2. row ***************************
 
requesting_trx_id: 421500433538672
 
requested_lock_id: 421500433538672:417:749:2
 
blocking_trx_id: 103006
 
blocking_lock_id: 103006:417:749:2
 
2 rows in set, 1 warning (0.00 sec)
mysql> select * from INNODB_LOCKS G;
 
*************************** 1. row ***************************
 
lock_id: 103006:417:1493:859
 
lock_trx_id: 103006
 
lock_mode: X
 
lock_type: RECORD
 
lock_table: test.TestTable
 
lock_index: idxColumn6
 
lock_space: 417
 
lock_page: 1493
 
lock_rec: 859
 
lock_data: 'SEK', 8262738
 
*************************** 2. row ***************************
 
lock_id: 103003:417:1493:859
 
lock_trx_id: 103003
 
lock_mode: X
 
lock_type: RECORD
 
lock_table:test.TestTable
 
lock_index: idxColumn6
 
lock_space: 417
 
lock_page: 1493
 
lock_rec: 859
 
lock_data: 'SEK', 8262738
 
*************************** 3. row ***************************
 
lock_id: 421500433538672:417:749:2
 
lock_trx_id: 421500433538672
 
lock_mode: S
 
lock_type: RECORD
 
lock_table: test.TestTable
 
lock_index: column5_index
 
lock_space: 417
 
lock_page: 749
 
lock_rec: 2
 
lock_data: 485, 8317620
 
*************************** 4. row ***************************
 
lock_id: 103006:417:749:2
 
lock_trx_id: 103006
 
lock_mode: X
 
lock_type: RECORD
 
lock_table: test.TestTable
 
lock_index: column5_index
 
lock_space: 417
 
lock_page: 749
 
lock_rec: 2
 
lock_data: 485, 8317620
 
4 rows in set, 1 warning (0.00 sec)
可以看到Session2,trx_id 103006阻塞了trx_id 421500433538672,而trx_id 421500433538672 requested_lock也正好是lock_data: 485, 8317620。由此可见Session2虽然别block了,但是还是获取到了Index column5_index相关的锁。被Block是因为intersect的原因,还需要idxColumn6的锁,至此思路已经清晰,对整个分配锁的信息简化一下,如下表格(请求到的锁用青色表示,需获取但未获取到的锁用红色表示):
 
时间点 Session1 Session2
1 477 SEK
2 485 SEK
3 485 SEK 死锁发生
可以看到485 SEK这两个资源形成了一个环状,最终发生死锁。
 
【解决方法】
最佳的方法是添加column5和Column6的联合索引。
我们环境当时的情况发现Column6的筛选度非常低,就删除了Column6的索引。

(编辑:应用网_镇江站长网)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!