
Serious problem: ES nodes frequently hang, triggering the kernel's hung_task_timeout_secs warning

Elasticsearch | Author: gentleman | Published 2021-01-09 | Views: 2398

I. Fault description:

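1. iowait on the KVM guest suddenly spikes; CPU utilization is very low, but the load average is very high.
2. The ES service becomes unavailable and cannot be shut down cleanly. Force-killing it leaves a zombie process, and without a forced kill it also turns into a zombie after a while.
3. The KVM guest cannot be rebooted normally, the load on its physical host spikes, and after a while the physical host reboots.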
Jan 8 10:59:45 localhost kernel: INFO: task elasticsearch[n:8445 blocked for more than 120 seconds. --- task 8445 has been blocked for more than 120 s
Jan 8 10:59:45 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 8 10:59:45 localhost kernel: elasticsearch[n D ffff9f25513e1040 0 8445 1 0x000001a0
Jan 8 10:59:45 localhost kernel: Call Trace:
Jan 8 10:59:45 localhost kernel: [<ffffffffaa941fdb>] ? kblockd_mod_delayed_work_on+0x1b/0x20
Jan 8 10:59:45 localhost kernel: [<ffffffffaad65ce0>] ? bit_wait+0x50/0x50
Jan 8 10:59:45 localhost kernel: [<ffffffffaad67bc9>] schedule+0x29/0x70
Jan 8 10:59:45 localhost kernel: [<ffffffffaad656a1>] schedule_timeout+0x221/0x2d0
Jan 8 10:59:45 localhost kernel: [<ffffffffaa66a0ce>] ? kvm_clock_get_cycles+0x1e/0x20
Jan 8 10:59:45 localhost kernel: [<ffffffffaa700f72>] ? ktime_get_ts64+0x52/0xf0
Jan 8 10:59:45 localhost kernel: [<ffffffffaad65ce0>] ? bit_wait+0x50/0x50
Jan 8 10:59:45 localhost kernel: [<ffffffffaad6726d>] io_schedule_timeout+0xad/0x130
Jan 8 10:59:45 localhost kernel: [<ffffffffaad67308>] io_schedule+0x18/0x20
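
When the same warning recurs, it can help to pull the affected task names and PIDs out of the kernel log. Below is a minimal Python sketch that matches the "blocked for more than N seconds" line format shown above. The function name `find_hung_tasks` and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re

# Matches the kernel's hung-task warning, e.g.
# "INFO: task elasticsearch[n:8445 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task_name, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


# Two lines copied from the log above (without the trailing annotation).
sample = [
    "Jan 8 10:59:45 localhost kernel: INFO: task elasticsearch[n:8445 blocked for more than 120 seconds.",
    "Jan 8 10:59:45 localhost kernel: Call Trace:",
]
print(find_hung_tasks(sample))  # [('elasticsearch[n', 8445, 120)]
```

The task name appears as `elasticsearch[n` because the kernel truncates the thread name in this message; the sketch keeps it exactly as logged.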

johnies - IT guy born in the '80s


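Consider tuning the following kernel parameters so that dirty pages are flushed to disk sooner, which avoids large bursts of I/O that cause response timeouts.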
vm.dirty_background_ratio
vm.dirty_ratio
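
As a rough illustration of the suggestion above, the Python sketch below reads the current values of these two parameters from `/proc/sys/vm` and formats a set of values as sysctl configuration lines. The helper names `read_vm_settings` and `render_sysctl` are hypothetical, and the values 5 and 10 are assumptions chosen only for illustration; the answer does not recommend specific numbers, so suitable values should be found by testing against the actual workload.

```python
import os

# The two parameters named in the answer, as file names under /proc/sys/vm.
DIRTY_KEYS = ("dirty_background_ratio", "dirty_ratio")


def read_vm_settings(base="/proc/sys/vm", keys=DIRTY_KEYS):
    """Read the current integer value of each vm.* key found under base."""
    settings = {}
    for key in keys:
        with open(os.path.join(base, key)) as f:
            settings["vm." + key] = int(f.read().strip())
    return settings


def render_sysctl(settings):
    """Format settings as lines suitable for a sysctl configuration file."""
    return "\n".join(f"{name} = {value}" for name, value in sorted(settings.items()))


# Hypothetical lower thresholds (assumed values, not taken from the answer):
# background writeback would start at 5% dirty memory, and writing processes
# would be throttled into writeback at 10%.
proposed = {"vm.dirty_background_ratio": 5, "vm.dirty_ratio": 10}
print(render_sysctl(proposed))
```

On a Linux host, `print(render_sysctl(read_vm_settings()))` shows the values currently in effect, which makes it easy to compare them with a proposed change before applying it with `sysctl`.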
