
ES process exits abnormally with nothing relevant in the ES log or the system log. Any help appreciated!

Elasticsearch | Author: Joey | Posted: 2018-06-22 | Views: 7924

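Hi all. Since our production cluster grew (50+ nodes), the ES process has frequently been disappearing: it is running normally and then it is simply gone, after which a scheduled job brings it back up. We have found no related errors in either the ES log or the system log, and there is no record of the process being OOM-killed by the system. The ES log is below: a few young GCs, then the node drops out and is restarted automatically by the scheduled job. When this happens often, shard recovery has some impact on the business. How should we go about troubleshooting this? Thanks!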
The ES version is 2.4.3 and the OS is RedHat 6.7.

[2018-06-20 15:41:13,027][INFO ][monitor.jvm              ] [30.24.4.33] [gc][young][838836][170507] duration [701ms], collections [1]/[1.4s], total [701ms]/[4.6h], memory [17.7gb]->[16.3gb]/[30.6gb], all_pools {[young] [1.8gb]->[17.4mb]/[2.7gb]}{[survivor] [357.7mb]->[357.7mb]/[357.7mb]}{[old] [15.5gb]->[16gb]/[27.5gb]}
[2018-06-20 15:46:13,284][WARN ][monitor.jvm              ] [30.24.4.33] [gc][young][839131][170592] duration [1.5s], collections [1]/[1.8s], total [1.5s]/[4.6h], memory [17.6gb]->[15.9gb]/[30.6gb], all_pools {[young] [2.3gb]->[79.1mb]/[2.7gb]}{[survivor] [57.4mb]->[357.7mb]/[357.7mb]}{[old] [15.2gb]->[15.5gb]/[27.5gb]}
[2018-06-20 15:46:15,730][INFO ][monitor.jvm              ] [30.24.4.33] [gc][young][839133][170593] duration [737ms], collections [1]/[1.4s], total [737ms]/[4.6h], memory [17.6gb]->[16.4gb]/[30.6gb], all_pools {[young] [1.7gb]->[28.7mb]/[2.7gb]}{[survivor] [357.7mb]->[357.7mb]/[357.7mb]}{[old] [15.5gb]->[16gb]/[27.5gb]}
[2018-06-20 16:15:03,585][WARN ][bootstrap                ] running as ROOT user. this is a bad idea!
[2018-06-20 16:15:03,596][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
[2018-06-20 16:15:19,840][INFO ][node                     ] [30.24.4.33] version[2.4.3], pid[59279], build[${build/2018-04-18T03:05:39Z]
[2018-06-20 16:15:19,840][INFO ][node                     ] [30.24.4.33] initializing ...
[2018-06-20 16:15:20,449][INFO ][plugins                  ] [30.24.4.33] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2018-06-20 16:15:20,502][INFO ][env                      ] [30.24.4.33] using [1] data paths, mounts [[/wls (/dev/mapper/VolGroup01-LVwls)]], net usable_space [3.9tb], net total_space [5tb], spins? [no], types [ext4]
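
One way to put numbers on a log like this is to pull out the GC pause lengths and measure how long the node was silent before the restart. The following is a minimal sketch, not a definitive tool: the function names (`to_millis`, `summarize`) are made up for illustration, the regular expressions assume the ES 2.x log layout shown in this post, and the sample lines are the post's own entries trimmed after the `collections` field.

```python
import re
from datetime import datetime

# "[timestamp][LEVEL][logger   ]" prefix as it appears in the log lines of this post.
HEADER_RE = re.compile(r"^\[([^\]]+)\]\[(\w+)\s*\]\[([\w.]+)\s*\]")
# GC pause reported by monitor.jvm, e.g. "duration [701ms]" or "duration [1.5s]".
DURATION_RE = re.compile(r"duration \[([\d.]+)(ms|s|m)\]")
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0, "m": 60000.0}


def to_millis(value, unit):
    """Convert a duration such as ('1.5', 's') to milliseconds."""
    return float(value) * UNIT_TO_MS[unit]


def summarize(lines):
    """Return the longest GC pause and the silent gap before the first restart."""
    last_gc = None      # timestamp of the last GC entry seen before the restart
    restart = None      # timestamp of the first bootstrap entry after a GC entry
    longest_ms = 0.0
    for line in lines:
        header = HEADER_RE.match(line)
        if not header:
            continue  # blank lines or wrapped continuation lines
        ts = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S,%f")
        logger = header.group(3)
        if logger == "monitor.jvm":
            pause = DURATION_RE.search(line)
            if pause:
                longest_ms = max(longest_ms, to_millis(*pause.groups()))
            if restart is None:
                last_gc = ts
        elif logger == "bootstrap" and restart is None and last_gc is not None:
            restart = ts
    gap = restart - last_gc if last_gc and restart else None
    return {"longest_pause_ms": longest_ms, "last_gc": last_gc,
            "restart": restart, "silent_gap": gap}


# Lines copied from the log above, trimmed after the "collections" field.
SAMPLE = """
[2018-06-20 15:41:13,027][INFO ][monitor.jvm              ] [30.24.4.33] [gc][young][838836][170507] duration [701ms], collections [1]/[1.4s]
[2018-06-20 15:46:13,284][WARN ][monitor.jvm              ] [30.24.4.33] [gc][young][839131][170592] duration [1.5s], collections [1]/[1.8s]
[2018-06-20 15:46:15,730][INFO ][monitor.jvm              ] [30.24.4.33] [gc][young][839133][170593] duration [737ms], collections [1]/[1.4s]
[2018-06-20 16:15:03,585][WARN ][bootstrap                ] running as ROOT user. this is a bad idea!
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

On these lines the longest logged pause is 1.5 s and the gap between the last GC entry and the restart is just under 29 minutes. That gap is only an upper bound on when the process died, because it also includes however long the scheduled job took to notice and restart it.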
 

UnigroupAi - Senior Elasticsearch Engineer

Upvoted by: Joey

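This is usually caused by GC pauses that run too long. Adjust the ping interval, the ping timeout, and the number of retries, and tune GC as well (at both the system level and the index level). Setting the timeout too high is not recommended, because a node that has really died will then not be detected promptly.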
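
The trade-off in that last sentence can be made concrete with a little arithmetic. The answer does not name the settings; assuming it means the zen fault-detection settings of ES 2.x (`discovery.zen.fd.ping_interval`, `discovery.zen.fd.ping_timeout`, `discovery.zen.fd.ping_retries`, with defaults of 1s, 30s and 3), the sketch below estimates how long an unresponsive node could stay in the cluster. The function name and the formula are a simplified model for illustration: it assumes every ping runs to its full timeout and retries are sent back to back, so treat the result as a rough bound rather than exact behavior.

```python
def worst_case_detection_seconds(ping_interval=1.0, ping_timeout=30.0, ping_retries=3):
    """Rough upper bound, in seconds, before an unresponsive node is dropped.

    Simplified model (an assumption, not Elasticsearch's exact algorithm):
    wait one ping interval, then let each of the retries run to its timeout.
    """
    return ping_interval + ping_retries * ping_timeout


if __name__ == "__main__":
    # Assumed ES 2.x defaults: 1s interval, 30s timeout, 3 retries.
    print(worst_case_detection_seconds())
    # A much more tolerant (hypothetical) configuration: 120s timeout, 6 retries.
    print(worst_case_detection_seconds(ping_timeout=120.0, ping_retries=6))
```

Under this model, raising the timeout or the retry count lets a node ride out longer GC pauses, but the time needed to notice a node that is truly gone grows in the same proportion.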

lemontree666


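I ran into the same problem on ES 7.4. The ES log looks normal and contains only GC entries. jstat -gcutil shows that garbage collection is not taking long and there is no full GC. The ES process still exists, but jstack pid prints nothing, and jstack -F pid shows every thread in the BLOCKED state. How did you solve this?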
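
When comparing dumps taken at different times, a quick tally of thread states can help. The sketch below is only an illustration: the function name is made up, the sample dump is invented rather than taken from this thread, and the patterns assume the usual line shapes of a regular jstack dump (`java.lang.Thread.State: ...`) and of a forced `jstack -F` dump (`Thread <id>: (state = ...)`).

```python
import re
from collections import Counter

# Regular jstack dump:  "   java.lang.Thread.State: BLOCKED (on object monitor)"
NORMAL_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")
# Forced dump (jstack -F):  "Thread 12345: (state = BLOCKED)"
FORCED_RE = re.compile(r"^Thread \d+: \(state = ([A-Z_]+)\)")


def count_thread_states(dump_text):
    """Count how many threads are reported in each state."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = NORMAL_RE.search(line) or FORCED_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample in the jstack -F layout, for illustration only.
SAMPLE_DUMP = """
Thread 59301: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)

Thread 59302: (state = BLOCKED)

Thread 59303: (state = IN_NATIVE)
"""

if __name__ == "__main__":
    for state, n in count_thread_states(SAMPLE_DUMP).most_common():
        print(state, n)
```

The state printed by `jstack -F` comes from the JVM's internal thread state, so it may not map one-to-one onto the `java.lang.Thread.State` values in a regular dump. The counts are best read as a way to compare snapshots, not as proof of lock contention.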
