
elasticsearch collector [index-stats] timed out when collecting data

Elasticsearch | Author: zriplj | Posted 2018-01-09 | Views: 20,348

Architecture:
6 Elasticsearch nodes
Hardware: 12 cores, 32 GB RAM, 2 TB RAID0 each. The master node runs on a Logstash host.
Node list (_cat/nodes columns: ip, heap.percent, ram.percent, cpu, load_1m/5m/15m, node.role, master, name):
172.16.1.10           14          98   5    0.54    0.43     0.63 mi        *      UGZ-LOGSTASH-V10
172.16.1.8            18          99   8    2.56    2.56     3.77 mdi       -      UGZ-ELASTICSEARCH-V8
172.16.1.2            38          99  10    9.42    4.57     3.51 mdi       -      UGZ-ELASTICSEARCH-V2
172.16.1.5            45          98   3    2.24    2.34     2.66 mdi       -      UGZ-ELASTICSEARCH-V5
172.16.1.1            40          99   5    7.63    3.62     3.21 mdi       -      UGZ-ELASTICSEARCH-V1
172.16.1.6            21          99   3    4.27    3.97     3.56 mdi       -      UGZ-ELASTICSEARCH-V6
2 Logstash nodes, 8 cores / 8 GB each
3 Kafka nodes, 8 cores / 8 GB each
Pipeline: Filebeat > Kafka > Logstash > Elasticsearch > Kibana
Indices use the default 5 primary shards with 1 replica and are rolled over daily.
Here are the shards of the January 6 index (_cat/shards columns: index, shard, prirep, state, docs, store, ip, node):
access_log-2018.01.06           3     r      STARTED 34209283    43gb 172.16.1.6 UGZ-ELASTICSEARCH-V6
access_log-2018.01.06           3     p      STARTED 34209283    43gb 172.16.1.5 UGZ-ELASTICSEARCH-V5
access_log-2018.01.06           4     p      STARTED 34219552    43gb 172.16.1.2 UGZ-ELASTICSEARCH-V2
access_log-2018.01.06           4     r      STARTED 34219552    43gb 172.16.1.5 UGZ-ELASTICSEARCH-V5
access_log-2018.01.06           1     r      STARTED 34218896    43gb 172.16.1.1 UGZ-ELASTICSEARCH-V1
access_log-2018.01.06           1     p      STARTED 34218896    43gb 172.16.1.8 UGZ-ELASTICSEARCH-V8
access_log-2018.01.06           2     r      STARTED 34215375    43gb 172.16.1.2 UGZ-ELASTICSEARCH-V2
access_log-2018.01.06           0     p      STARTED 34222802  43.2gb 172.16.1.1 UGZ-ELASTICSEARCH-V1
access_log-2018.01.06           0     r      STARTED 34222802  43.1gb 172.16.1.6 UGZ-ELASTICSEARCH-V6
Symptoms:
The Elasticsearch log on UGZ-LOGSTASH-V10 reports the following errors:

[2018-01-09T14:10:06,116][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [UGZ-LOGSTASH-V10] collector [cluster_stats] timed out when collecting data
[2018-01-09T14:10:16,118][ERROR][o.e.x.m.c.m.JobStatsCollector] [UGZ-LOGSTASH-V10] collector [job_stats] timed out when collecting data
[2018-01-09T14:11:06,119][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [UGZ-LOGSTASH-V10] collector [cluster_stats] timed out when collecting data
[2018-01-09T14:11:16,120][ERROR][o.e.x.m.c.m.JobStatsCollector] [UGZ-LOGSTASH-V10] collector [job_stats] timed out when collecting data
[2018-01-09T14:11:26,123][ERROR][o.e.x.m.c.i.IndexStatsCollector] [UGZ-LOGSTASH-V10] collector [index-stats] timed out when collecting data
[2018-01-09T14:13:46,516][ERROR][o.e.x.m.c.i.IndexStatsCollector] [UGZ-LOGSTASH-V10] collector [index-stats] timed out when collecting data
[2018-01-09T14:25:06,144][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [UGZ-LOGSTASH-V10] collector [cluster_stats] timed out when collecting data
[2018-01-09T14:25:15,874][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [UGZ-LOGSTASH-V10] failed to execute on node [_lnaVHJxQi2bCN0v79Fu3Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [UGZ-ELASTICSEARCH-V8][172.16.1.8:9300][cluster:monitor/nodes/stats[n]] request_id [356962] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:951) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-01-09T14:25:16,146][ERROR][o.e.x.m.c.m.JobStatsCollector] [UGZ-LOGSTASH-V10] collector [job_stats] timed out when collecting data
[2018-01-09T14:25:26,148][ERROR][o.e.x.m.c.i.IndexStatsCollector] [UGZ-LOGSTASH-V10] collector [index-stats] timed out when collecting data
[2018-01-09T14:25:36,152][ERROR][o.e.x.m.c.i.IndicesStatsCollector] [UGZ-LOGSTASH-V10] collector [indices-stats] timed out when collecting data
[2018-01-09T14:25:46,153][ERROR][o.e.x.m.c.i.IndexRecoveryCollector] [UGZ-LOGSTASH-V10] collector [index-recovery] timed out when collecting data
[2018-01-09T14:25:49,324][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [48450ms] ago, timed out [33450ms] ago, action [cluster:monitor/nodes/stats[n]], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [356962]
[2018-01-09T14:41:39,338][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [UGZ-LOGSTASH-V10] failed to execute on node [_lnaVHJxQi2bCN0v79Fu3Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [UGZ-ELASTICSEARCH-V8][172.16.1.8:9300][cluster:monitor/nodes/stats[n]] request_id [361981] timed out after [15001ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:951) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-01-09T14:42:39,340][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [UGZ-LOGSTASH-V10] failed to execute on node [_lnaVHJxQi2bCN0v79Fu3Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [UGZ-ELASTICSEARCH-V8][172.16.1.8:9300][cluster:monitor/nodes/stats[n]] request_id [362015] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:951) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-01-09T14:43:39,341][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [UGZ-LOGSTASH-V10] failed to execute on node [_lnaVHJxQi2bCN0v79Fu3Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [UGZ-ELASTICSEARCH-V8][172.16.1.8:9300][cluster:monitor/nodes/stats[n]] request_id [362051] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:951) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-01-09T14:44:39,343][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [UGZ-LOGSTASH-V10] failed to execute on node [_lnaVHJxQi2bCN0v79Fu3Q]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [UGZ-ELASTICSEARCH-V8][172.16.1.8:9300][cluster:monitor/nodes/stats[n]] request_id [362087] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:951) [elasticsearch-5.5.1.jar:5.5.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.5.1.jar:5.5.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2018-01-09T14:45:16,530][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [52187ms] ago, timed out [37187ms] ago, action [cluster:monitor/nodes/stats[n]], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [362087]
[2018-01-09T14:45:16,530][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [232193ms] ago, timed out [217192ms] ago, action [cluster:monitor/nodes/stats[n]], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [361981]
[2018-01-09T14:45:16,530][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [246537ms] ago, timed out [146537ms] ago, action [internal:discovery/zen/fd/ping], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [361973]
[2018-01-09T14:45:16,530][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [146537ms] ago, timed out [46536ms] ago, action [internal:discovery/zen/fd/ping], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [362035]
[2018-01-09T14:45:16,534][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [172195ms] ago, timed out [157195ms] ago, action [cluster:monitor/nodes/stats[n]], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [362015]
[2018-01-09T14:45:16,535][WARN ][o.e.t.TransportService   ] [UGZ-LOGSTASH-V10] Received response for a request that has timed out, sent [112194ms] ago, timed out [97194ms] ago, action [cluster:monitor/nodes/stats[n]], node [{UGZ-ELASTICSEARCH-V8}{_lnaVHJxQi2bCN0v79Fu3Q}{pgcOiRRsSvqmIkKWiJZ0TA}{172.16.1.8}{172.16.1.8:9300}{ml.enabled=true}], id [362051]
[2018-01-09T15:15:06,252][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [UGZ-LOGSTASH-V10] collector [cluster_stats] timed out when collecting data

The Logstash errors are as follows:
[2018-01-09T15:10:21,552][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@172.16.1.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-01-09T15:10:21,552][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-01-09T15:10:21,554][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.8:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@172.16.1.8:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.8:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-01-09T15:10:21,555][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.8:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-01-09T15:10:21,645][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.5:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@172.16.1.5:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.5:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-01-09T15:10:21,645][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.5:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-01-09T15:10:21,705][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@172.16.1.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-01-09T15:10:21,705][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@172.16.1.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-01-09T15:10:23,236][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,236][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,291][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,291][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,457][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,457][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,460][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,460][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,466][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,466][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,477][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,477][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,479][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,479][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,505][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,505][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,558][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,558][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,560][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,560][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,655][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,655][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:23,713][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-01-09T15:10:23,714][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-01-09T15:10:25,302][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@172.16.1.1:9200/, :path=>"/"}
[2018-01-09T15:10:25,307][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0x3e23a394>}
[2018-01-09T15:10:25,307][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@172.16.1.2:9200/, :path=>"/"}
[2018-01-09T15:10:25,310][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0x63558bce>}
[2018-01-09T15:10:25,310][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@172.16.1.5:9200/, :path=>"/"}
[2018-01-09T15:10:25,315][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0x68de8c51>}
[2018-01-09T15:10:25,315][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@172.16.1.6:9200/, :path=>"/"}
[2018-01-09T15:10:25,319][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0x295c30db>}
[2018-01-09T15:10:25,319][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@172.16.1.8:9200/, :path=>"/"}
[2018-01-09T15:10:25,324][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<Java::JavaNet::URI:0x144ab00f>}

 

kennywu76 - Wood

Upvotes from: exceptions

Although iowait on this machine is very high, r/s and w/s are both 0, which usually means the disk itself has failed. We have run into a similar problem; in our case the SSD had reached the end of its life.
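
A quick way to confirm this pattern on the suspect node (not from the original thread; a sketch assuming the sysstat and smartmontools packages are installed and that the data disk is /dev/sda):

# Extended device stats every 2 seconds: a failing disk typically shows r/s and
# w/s stuck at 0 while await and %util stay high.
iostat -x 2 5
# Kernel messages and SMART health for the suspected device.
dmesg | tail -n 50
smartctl -a /dev/sda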

xinfanwang


One of the nodes is close to dying and is basically not responding. Check that node's status: IO and JVM memory.
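
As a first pass, the _cat APIs show per-node heap, CPU and load in one call (a sketch, not from the thread; the host and the elastic user are taken from the logs above):

# Per-node heap, RAM, CPU and load averages; a node that is pinned while the
# others are idle is the one to inspect further.
curl -s -u elastic 'http://172.16.1.1:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,load_5m,node.role,master'
# Any shard recoveries or relocations currently generating extra IO.
curl -s -u elastic 'http://172.16.1.1:9200/_cat/recovery?v&active_only=true'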

zriplj


@xinfanwang Thanks for the reply.
See the screenshot I uploaded. When these timeouts happen, CPU and IO are not particularly high.

xinfanwang


From what you've described, the node has stopped responding altogether, both to the master's requests (over the transport port) and to Logstash's requests (over the HTTP port). Monitoring could not collect status data during that window either, which is why the middle of the monitoring screenshot is missing.
1. Check the node's logs.
2. Check the system state at the time; use top to look at CPU and IO.
3. Take several JVM thread dumps and compare the thread states (see the sketch below).

The case I ran into was caused by a surge of incoming data putting too much indexing pressure on the node. For reference only.
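
For step 3, either OS-level jstack dumps or Elasticsearch's hot_threads API can show what the stuck node is busy doing (a sketch, not from the thread; run on or against the affected node):

# Several JVM thread dumps a few seconds apart (jstack ships with the JDK;
# run as the user that owns the Elasticsearch process).
ES_PID=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
for i in 1 2 3; do jstack "$ES_PID" > "/tmp/es-threads-$i.txt"; sleep 10; done
# Or ask the cluster which threads are hot on the slow node.
curl -s -u elastic 'http://172.16.1.1:9200/_nodes/UGZ-ELASTICSEARCH-V8/hot_threads?threads=5'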

zriplj


@xinfanwang
1. The data nodes' logs show nothing abnormal; see the screenshot.
2. The Logstash errors show V6 timing out. I checked the system state and IO and CPU are indeed high on the V6 machine, but why does the load sit only on V6? The cluster's load has become unbalanced.

 

xinfanwang


This machine's IOWait is far too high. ES will essentially hang on IO. You need to test the machine's IO separately to see whether it is healthy. If the load is uneven, adjusting Logstash's output is the easier part.
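
A simple way to test the disk in isolation (a sketch, not from the thread; the test file path is assumed from path.data in the config further down, and the test writes real data, so run it with care on a production box):

# Sequential write and read with the page cache bypassed; compare the reported
# throughput against what the RAID0 array should deliver.
dd if=/dev/zero of=/home/elasticsearch/data/dd-test.bin bs=1M count=2048 oflag=direct
dd if=/home/elasticsearch/data/dd-test.bin of=/dev/null bs=1M iflag=direct
rm /home/elasticsearch/data/dd-test.bin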

zriplj


How do I adjust Logstash's output to balance the load across the cluster?

zriplj


@xinfanwang @kennywu76
What could cause the load to be this unbalanced? The IO and CPU on the two machines differ enormously, and the ELK stack is currently unusable.
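
One way to see why a single node is hot (a sketch, not from the thread) is to check where the shards of the actively written index landed and whether its bulk thread pool is backing up:

# Shard placement for the access_log indices; writes concentrate on whichever
# nodes hold the current day's primaries and replicas.
curl -s -u elastic 'http://172.16.1.1:9200/_cat/shards/access_log-*?v'
# Bulk thread pool pressure per node; a growing queue or rejected count on one
# node means indexing is bottlenecked there.
curl -s -u elastic 'http://172.16.1.1:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected'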

zriplj


@kennywu76
Each ES node runs on its own in a KVM virtual machine.
Watching the Kafka lag graphs, ingestion stalls for roughly half an hour each time, then disk IO starts writing again (see the chart below) and Kafka consumption resumes; it is very unstable. If this were purely a system-level problem, it shouldn't come and go in spasms like this, should it? Is there something in the ELK stack that needs tuning?

My ES config file:
cluster.name: UGZ_ELK_CLUSTER
node.master: false
node.data: true
node.name: UGZ-ELASTICSEARCH-V5
path.data: /home/elasticsearch/data
path.logs: /home/elasticsearch/logs
network.host: 172.16.1.5
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.1.1:9300", "172.16.1.2:9300","172.16.1.5:9300","172.16.1.6:9300","172.16.1.8:9300"]
discovery.zen.fd.ping_timeout: 100s
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_retries: 10
discovery.zen.minimum_master_nodes: 3
bootstrap.memory_lock: true
indices.fielddata.cache.size: 20%
bootstrap.system_call_filter: false
thread_pool.bulk.queue_size: 1000
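
Since this config sets bootstrap.memory_lock: true, it is worth verifying on each VM that the lock actually took effect and that the guest is not swapping, because memory pressure inside a KVM guest can also produce stalls like the ones above (a sketch, not from the thread):

# mlockall should be true for every node; false means the heap can be swapped out.
curl -s -u elastic 'http://172.16.1.1:9200/_nodes?filter_path=**.mlockall&pretty'
# Swap usage inside the KVM guest.
free -m
swapon -s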

medcl


Is there only one master? Which machine is the master node?
Looking at the Kibana monitoring, the ES cluster shows very obvious pauses caused by GC. Do you have separate performance monitoring for the servers?
Why is V6's load so much higher than the others? Something is definitely wrong there.

zriplj


@medcl Previously only one machine was configured with master set to true; with the configuration above, all six machines are master-eligible. The problem now is that the load keeps landing on one particular machine in the cluster. See the chart below.
 

zriplj


@medcl I've already changed the cluster architecture to the layout shown below, and as you can see the CPU load is all on the single V5 machine.

ERROR log reported by the master node:
[2018-01-15T12:18:48,384][ERROR][o.e.x.m.c.i.IndexStatsCollector] [UGZ-ELASTICSEARCH-Master09] collector [index-stats] timed out when collecting data
[2018-01-15T12:19:43,706][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [UGZ-ELASTICSEARCH-Master09] collector [cluster_stats] timed out when collecting data
[2018-01-15T12:20:14,631][ERROR][o.e.x.m.c.i.IndexStatsCollector] [UGZ-ELASTICSEARCH-Master09] collector [index-stats] timed out when collecting data

yaogang732


Did you ever solve this? I'm hitting the same problem and would appreciate any pointers.

jingpeiyang


Same problem here, desperately looking for a solution. Elasticsearch version 5.5.0.

zriplj


In the end the problem really was with the disk of the KVM virtual machine.

dingxiaocao


Did you ever solve it? I'm running into the same thing: writes suddenly stop, monitoring looks normal, and queries become very slow. Version 6.2.4.
