四个节点的ES集群，用logstash导入数据的效率很慢，在此附上详细配置，请各位大神帮我分析一下，到底瓶颈在哪里

4个数据节点（node1,node2, node3, node4），一个master节点(node5)。
4个数据节点的机器配置相同， 48核心 32 G 内存，给ES jvm 的内存是 16G。

master节点 56核心 128G 内存，给ES jvm 内存是 32G。
在master 节点上部署了 logstash ， jvm 32G。

logstash.yml 配置如下：

pipeline.workers: 30

pipeline.output.workers: 30

pipeline.batch.size: 2500

pipeline.batch.delay: 5

在 configs_path 目录下，存放有 18 个 conf文件，每个文件对应一个存放大量 json 文件的目录。其中一个 conf 文件如下：

input {

  file {

    path => "path1/*/*json"

    start_position => "beginning"

    close_older => 300

    stat_interval => 60

  }

}



filter {

    # Drop Elasticsearch Bulk API control lines

    if ([message] =~ "{\"index") {

        drop {}

    }



    json {

        source => "message"

        remove_field => "message"

    }



    # Extract innermost network protocol

    grok {

        match => {

            "[layers][frame][frame_frame_protocols]" => "%{WORD:protocol}$"

        }

    }



    date {

        match => [ "timestamp", "UNIX_MS" ]

    }

}



output {

  elasticsearch {

    hosts => "node1:9200"

    index => "index-1-%{+YYYY-MM-dd}"

    document_type => "pcap_file"

    manage_template => false

  }

}

以 logstash -f configs_path 的方式运行logstash.运行过程中，logstash会报错：

[2018-04-12T21:59:17,413][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}

[2018-04-12T21:59:17,415][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/&quot;}

[2018-04-12T21:59:22,037][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:22,037][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:24,414][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}

[2018-04-12T21:59:24,416][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/&quot;}

[2018-04-12T21:59:26,020][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,021][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:26,092][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,092][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,092][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:26,093][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:26,148][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,148][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:26,178][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,178][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:26,592][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:26,592][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}

[2018-04-12T21:59:27,418][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}

[2018-04-12T21:59:27,421][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/&quot;}

[2018-04-12T21:59:28,647][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node2:9200/, :path=>"/"}

[2018-04-12T21:59:28,760][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node2:9200/&quot;}

[2018-04-12T21:59:30,411][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node2:9200/, :path=>"/"}

[2018-04-12T21:59:46,179][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:46,180][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:47,424][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}

[2018-04-12T21:59:47,426][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/&quot;}

[2018-04-12T21:59:48,296][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:48,296][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:48,311][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:48,311][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:48,325][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:48,325][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}

[2018-04-12T21:59:48,329][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

[2018-04-12T21:59:48,329][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

elasticsearch 的日志如下：

[2018-04-12T17:25:02,093][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110584] overhead, spent [21.7s] collecting in the last [22.2s]

[2018-04-12T17:25:09,535][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110591] overhead, spent [284ms] collecting in the last [1s]

[2018-04-12T17:26:20,665][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110662] overhead, spent [284ms] collecting in the last [1s]

[2018-04-12T17:26:32,668][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110674] overhead, spent [265ms] collecting in the last [1s]

[2018-04-12T17:26:34,669][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110676] overhead, spent [316ms] collecting in the last [1s]

[2018-04-12T17:27:10,852][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110712] overhead, spent [298ms] collecting in the last [1s]

[2018-04-12T17:27:36,950][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110738] overhead, spent [349ms] collecting in the last [1s]

[2018-04-12T17:27:37,951][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110739] overhead, spent [295ms] collecting in the last [1s]

[2018-04-12T17:28:05,179][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110745][830] duration [20.5s], collections [1]/[22s], total [20.5s]/[27.8m], memory [14.5gb]->[10.9gb]/[15.7gb], all_pools {[young] [2gb]->[78.3mb]/[2.1gb]}{[survivor] [159.5mb]->[0b]/[274.5mb]}{[old] [12.3gb]->[10.8gb]/[13.3gb]}

[2018-04-12T17:28:05,179][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110745] overhead, spent [21.3s] collecting in the last [22s]

[2018-04-12T17:28:11,240][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110751] overhead, spent [295ms] collecting in the last [1s]

[2018-04-12T17:28:46,252][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110786] overhead, spent [299ms] collecting in the last [1s]

[2018-04-12T17:28:57,550][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110797] overhead, spent [297ms] collecting in the last [1s]

[2018-04-12T17:29:20,922][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110799][836] duration [21s], collections [1]/[22.3s], total [21s]/[28.2m], memory [14gb]->[11.1gb]/[15.7gb], all_pools {[young] [901.8mb]->[12.1mb]/[2.1gb]}{[survivor] [229.5mb]->[0b]/[274.5mb]}{[old] [12.9gb]->[11.1gb]/[13.3gb]}

[2018-04-12T17:29:20,922][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110799] overhead, spent [21.6s] collecting in the last [22.3s]

[2018-04-12T17:29:52,198][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110810][838] duration [20.7s], collections [1]/[21.2s], total [20.7s]/[28.5m], memory [14.6gb]->[11.2gb]/[15.7gb], all_pools {[young] [1.7gb]->[39.7mb]/[2.1gb]}{[survivor] [274.5mb]->[0b]/[274.5mb]}{[old] [12.6gb]->[11.1gb]/[13.3gb]}

[2018-04-12T17:29:52,198][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110810] overhead, spent [20.8s] collecting in the last [21.2s]

被处理的是 tshark 解析 pcap 文件生成的 json 文件，每天生成的json文件量大概在700G -- 900G 之间。在 es 预设了模版：

PUT _template/packets

{

  "index_patterns": "packets-*",

  "settings": {

    "number_of_shards": 4, 

    "number_of_replicas": 0, 

    "refresh_interval": "600s",

    "index.mapping.total_fields.limit" : 2000000,

    "index.merge.scheduler.max_thread_count": 1

  },

  "mappings": {

    "pcap_file": {

      "dynamic": "true",

      "properties": {

        "timestamp": {

          "type": "date"

        },

        "layers": {

          "properties": {

            "frame": {

              "properties": {

                "frame_frame_len": {

                  "type": "long"

                },

                "frame_frame_protocols": {

                  "type": "keyword"

                }

              }

            },

            "ip": {

              "properties": {

                "ip_ip_src": {

                  "type": "ip"

                },

                "ip_ip_dst": {

                  "type": "ip"

                }

              }

            },

            "tcp": {

              "properties": {

                "tcp_tcp_srcport": {

                  "type": "integer"

                },

                "tcp_tcp_dstport": {

                  "type": "integer"

                }

              }

            },

            "udp": {

              "properties": {

                "udp_udp_srcport": {

                  "type": "integer"

                },

                "udp_udp_dstport": {

                  "type": "integer"

                }

              }

            }

          }

        }

      }

    }

  }

}

没有使用副本分片，而且， refresh_interval 设置成了 600s。但是目前的处理速度完全跟不上生产速度。我已经尝试多次调整logstash 的 workers 和 batch_size 参数了，但还是会遇到问题。
请大家帮我分析一下，性能瓶颈在哪里，多谢各位了。

四个节点的ES集群，用logstash导入数据的效率很慢，在此附上详细配置，请各位大神帮我分析一下，到底瓶颈在哪里

10 个回复

发起人

相关问题

问题状态

四个节点的ES集群，用logstash导入数据的效率很慢，在此附上详细配置，请各位大神帮我分析一下，到底瓶颈在哪里

与内容相关的链接

10 个回复

发起人

相关问题

问题状态