ES throws an error, after which the cluster status changes and queries report timeouts

Elasticsearch | Author: nathon | Posted on 2018-09-14 | Views: 4962

[2018-09-14T09:04:31,557][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node-1] failed to execute on node [atQZXcLfTOS1tXvmIm3eSg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node-1][172.16.15.19:9300][cluster:monitor/nodes/stats[n]] request_id [14172574] timed out after [15001ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:961) [elasticsearch-5.6.8.jar:5.6.8]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:575) [elasticsearch-5.6.8.jar:5.6.8]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2018-09-14T09:04:32,750][WARN ][o.e.t.TransportService ] [node-1] Received response for a request that has timed out, sent [16194ms] ago, timed out [1193ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node-1}{atQZXcLfTOS1tXvmIm3eSg}{9tFITNIlRgqmkaq4rAJ9XA}{172.16.15.19}{172.16.15.19:9300}], id [14172574]
[2018-09-14T09:04:39,256][WARN ][o.e.t.TransportService ] [node-1] Received response for a request that has timed out, sent [22700ms] ago, timed out [7699ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node-2}{XHScn61PQpyPqEdSxqUzFw}{RQwzFku4Q1ST4ZRVPyw4mA}{172.17.2.84}{172.17.2.84:9300}], id [14172575]
rojay - A post-90s IT guy from Hangzhou, new to the workforce

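Judging from the log, an ES node appears to have failed. For this kind of problem, check the health of the ES cluster first. A log snippet like this on its own is not enough to pin down the actual cause.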
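
The health check suggested above could be scripted roughly as follows. This is only a minimal sketch: it assumes a node is reachable over HTTP at `http://localhost:9200` without authentication, and the function names `fetch_cluster_health` and `summarize_health` are made up for illustration. The `_cluster/health` endpoint and the response fields read here are part of the Elasticsearch REST API.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200", timeout=10):
    """GET /_cluster/health and return the parsed JSON body.

    base_url is an assumption; point it at any node's HTTP address.
    """
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Condense a _cluster/health response into a one-line summary."""
    return (
        f"status={health.get('status')} "
        f"nodes={health.get('number_of_nodes')} "
        f"unassigned_shards={health.get('unassigned_shards')} "
        f"pending_tasks={health.get('number_of_pending_tasks')}"
    )


# Typical use against a live cluster:
#   print(summarize_health(fetch_cluster_health()))
```

A `red` or `yellow` status together with a non-zero `unassigned_shards` count, or a node count lower than expected, would be a more useful starting point than the timeout log alone.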

yayg2008

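From the description, it looks like one node has failed but the cluster has not yet removed it, which leaves the whole cluster responding abnormally.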
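
One way to see which node is the slow one is to pull the node name and delay out of the "Received response for a request that has timed out" warnings in the log. The sketch below is an illustration only: the regular expression is written against the log lines quoted in this thread, other Elasticsearch versions may format the message differently, and the name `slowest_responses` is hypothetical.

```python
import re

# Matches the TransportService warning quoted in this thread, e.g.
# "... sent [16194ms] ago, timed out [1193ms] ago, action [...], node [{node-1}{...}] ..."
LATE_RESPONSE = re.compile(
    r"sent \[(\d+)ms\] ago, timed out \[(\d+)ms\] ago, .*?node \[\{([^}]+)\}"
)


def slowest_responses(lines):
    """Return {node name: longest 'sent ... ago' delay in ms} for late responses."""
    worst = {}
    for line in lines:
        match = LATE_RESPONSE.search(line)
        if match is None:
            continue  # not a late-response warning
        sent_ms = int(match.group(1))
        node = match.group(3)
        worst[node] = max(worst.get(node, 0), sent_ms)
    return worst
```

Feeding it the lines of the node's log file (for example `slowest_responses(open(path))`, with `path` set to wherever the log lives) gives a per-node picture of how late the stats responses arrived. If a single node dominates, that node is the likely suspect.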

etoak

I have the same problem, and it is still unresolved:
[2020-04-03T11:31:42,763][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node1] failed to execute on node [qmUHhDkARuGxlWIILVF7Fg]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node1][192.168.1.220:9305][cluster:monitor/nodes/stats[n]] request_id [5451837] timed out after [15143ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:869) [?:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2020-04-03T11:31:43,147][WARN ][o.e.a.a.c.n.s.TransportNodesStatsAction] [node1] not accumulating exceptions, excluding exception from response
org.elasticsearch.action.FailedNodeException: Failed node [qmUHhDkARuGxlWIILVF7Fg]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) [?:?]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$1(TransportNodesAction.java:241) [?:?]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) [?:?]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:984) [?:?]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:868) [?:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [node1][192.168.1.220:9305][cluster:monitor/nodes/stats[n]] request_id [5451837] timed out after [15143ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:869) ~[?:?]
... 4 more
[2020-04-03T11:31:55,253][WARN ][o.e.t.TransportService   ] [node1] Received response for a request that has timed out, sent [30054ms] ago, timed out [14911ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node1}{qmUHhDkARuGxlWIILVF7Fg}{CqdpHBd9Rv-NgLEya_uCgQ}{192.168.1.220}{192.168.1.220:9305}], id [5451837]
[2020-04-03T11:32:37,533][WARN ][o.e.m.j.JvmGcMonitorService] [node1] [gc][young][323731][57884] duration [1s], collections [1]/[1.3s], total [1s]/[16.4m], memory [5.1gb]->[3.7gb]/[7.8gb], all_pools {[young] [1.4gb]->[13mb]/[1.4gb]}{[survivor] [26.8mb]->[70.4mb]/[191.3mb]}{[old] [3.6gb]->[3.6gb]/[6.1gb]}
[2020-04-03T11:32:37,533][WARN ][o.e.m.j.JvmGcMonitorService] [node1] [gc][323731] overhead, spent [1s] collecting in the last [1.3s]
[2020-04-03T11:33:27,891][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node1] failed to execute on node [qmUHhDkARuGxlWIILVF7Fg]
