
Importing data into a four-node ES cluster with Logstash is very slow. The detailed configuration is attached below; could the experts here help me analyze where the bottleneck really is?

Logstash | Author: helloworld1128 | Published: 2018-04-13 | Views: 19536

There are 4 data nodes (node1, node2, node3, node4) and one master node (node5).
The 4 data nodes have identical hardware: 48 cores and 32 GB of RAM, with 16 GB given to the ES JVM.

The master node has 56 cores and 128 GB of RAM, with 32 GB given to the ES JVM.
Logstash is also deployed on the master node, with a 32 GB JVM.

logstash.yml is configured as follows:
pipeline.workers: 30
pipeline.output.workers: 30
pipeline.batch.size: 2500
pipeline.batch.delay: 5
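(For scale: pipeline.batch.size caps how many events one worker packs into a single bulk request, and pipeline.workers is how many such batches run in parallel, so these settings allow up to 30 × 2500 = 75,000 events in flight at once. That is heavy concurrent bulk pressure for four data nodes.)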
The configs_path directory contains 18 conf files, each corresponding to a directory holding a large number of JSON files. One of the conf files looks like this:
input {
  file {
    path => "path1/*/*json"
    start_position => "beginning"
    close_older => 300
    stat_interval => 60
  }
}

filter {
  # Drop Elasticsearch Bulk API control lines
  if ([message] =~ "{\"index") {
    drop {}
  }

  json {
    source => "message"
    remove_field => "message"
  }

  # Extract innermost network protocol
  grok {
    match => {
      "[layers][frame][frame_frame_protocols]" => "%{WORD:protocol}$"
    }
  }

  date {
    match => [ "timestamp", "UNIX_MS" ]
  }
}

output {
  elasticsearch {
    hosts => "node1:9200"
    index => "index-1-%{+YYYY-MM-dd}"
    document_type => "pcap_file"
    manage_template => false
  }
}
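Note: this conf pins all output to node1. The elasticsearch output's hosts option also accepts an array, so a single pipeline can spread its bulk requests across all four data nodes. A sketch, using the node names from the description above:

output {
  elasticsearch {
    hosts => ["node1:9200", "node2:9200", "node3:9200", "node4:9200"]
    index => "index-1-%{+YYYY-MM-dd}"
    document_type => "pcap_file"
    manage_template => false
  }
}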
Logstash is started as logstash -f configs_path. While it runs, Logstash reports errors:
[2018-04-12T21:59:17,413][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}
[2018-04-12T21:59:17,415][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/"}
[2018-04-12T21:59:22,037][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:22,037][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:24,414][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}
[2018-04-12T21:59:24,416][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/"}
[2018-04-12T21:59:26,020][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,021][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:26,092][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,092][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,092][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:26,093][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:26,148][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,148][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:26,178][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,178][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:26,592][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:26,592][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-04-12T21:59:27,418][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}
[2018-04-12T21:59:27,421][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/"}
[2018-04-12T21:59:28,647][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node2:9200/, :path=>"/"}
[2018-04-12T21:59:28,760][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node2:9200/"}
[2018-04-12T21:59:30,411][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node2:9200/, :path=>"/"}
[2018-04-12T21:59:46,179][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node1:9200/, :error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:46,180][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:47,424][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://node1:9200/, :path=>"/"}
[2018-04-12T21:59:47,426][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://node1:9200/"}
[2018-04-12T21:59:48,296][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:48,296][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:48,311][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:48,311][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:48,325][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:48,325][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-04-12T21:59:48,329][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-04-12T21:59:48,329][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://node2:9200/, :error_message=>"Elasticsearch Unreachable: [http://node2:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}

The Elasticsearch logs look like this:
[2018-04-12T17:25:02,093][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110584] overhead, spent [21.7s] collecting in the last [22.2s]
[2018-04-12T17:25:09,535][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110591] overhead, spent [284ms] collecting in the last [1s]
[2018-04-12T17:26:20,665][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110662] overhead, spent [284ms] collecting in the last [1s]
[2018-04-12T17:26:32,668][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110674] overhead, spent [265ms] collecting in the last [1s]
[2018-04-12T17:26:34,669][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110676] overhead, spent [316ms] collecting in the last [1s]
[2018-04-12T17:27:10,852][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110712] overhead, spent [298ms] collecting in the last [1s]
[2018-04-12T17:27:36,950][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110738] overhead, spent [349ms] collecting in the last [1s]
[2018-04-12T17:27:37,951][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110739] overhead, spent [295ms] collecting in the last [1s]
[2018-04-12T17:28:05,179][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110745][830] duration [20.5s], collections [1]/[22s], total [20.5s]/[27.8m], memory [14.5gb]->[10.9gb]/[15.7gb], all_pools {[young] [2gb]->[78.3mb]/[2.1gb]}{[survivor] [159.5mb]->[0b]/[274.5mb]}{[old] [12.3gb]->[10.8gb]/[13.3gb]}
[2018-04-12T17:28:05,179][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110745] overhead, spent [21.3s] collecting in the last [22s]
[2018-04-12T17:28:11,240][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110751] overhead, spent [295ms] collecting in the last [1s]
[2018-04-12T17:28:46,252][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110786] overhead, spent [299ms] collecting in the last [1s]
[2018-04-12T17:28:57,550][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110797] overhead, spent [297ms] collecting in the last [1s]
[2018-04-12T17:29:20,922][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110799][836] duration [21s], collections [1]/[22.3s], total [21s]/[28.2m], memory [14gb]->[11.1gb]/[15.7gb], all_pools {[young] [901.8mb]->[12.1mb]/[2.1gb]}{[survivor] [229.5mb]->[0b]/[274.5mb]}{[old] [12.9gb]->[11.1gb]/[13.3gb]}
[2018-04-12T17:29:20,922][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110799] overhead, spent [21.6s] collecting in the last [22.3s]
[2018-04-12T17:29:52,198][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][110810][838] duration [20.7s], collections [1]/[21.2s], total [20.7s]/[28.5m], memory [14.6gb]->[11.2gb]/[15.7gb], all_pools {[young] [1.7gb]->[39.7mb]/[2.1gb]}{[survivor] [274.5mb]->[0b]/[274.5mb]}{[old] [12.6gb]->[11.1gb]/[13.3gb]}
[2018-04-12T17:29:52,198][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][110810] overhead, spent [20.8s] collecting in the last [21.2s]
The input being processed is JSON generated by tshark parsing pcap files, roughly 700 GB to 900 GB of JSON per day. A template was preset in ES:
PUT _template/packets
{
  "index_patterns": "packets-*",
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 0,
    "refresh_interval": "600s",
    "index.mapping.total_fields.limit": 2000000,
    "index.merge.scheduler.max_thread_count": 1
  },
  "mappings": {
    "pcap_file": {
      "dynamic": "true",
      "properties": {
        "timestamp": { "type": "date" },
        "layers": {
          "properties": {
            "frame": {
              "properties": {
                "frame_frame_len": { "type": "long" },
                "frame_frame_protocols": { "type": "keyword" }
              }
            },
            "ip": {
              "properties": {
                "ip_ip_src": { "type": "ip" },
                "ip_ip_dst": { "type": "ip" }
              }
            },
            "tcp": {
              "properties": {
                "tcp_tcp_srcport": { "type": "integer" },
                "tcp_tcp_dstport": { "type": "integer" }
              }
            },
            "udp": {
              "properties": {
                "udp_udp_srcport": { "type": "integer" },
                "udp_udp_dstport": { "type": "integer" }
              }
            }
          }
        }
      }
    }
  }
}
No replica shards are used, and refresh_interval is set to 600s, yet the current processing speed still cannot keep up with the rate at which the data is produced. I have already tried adjusting Logstash's workers and batch_size parameters many times, but the problem remains.
Could everyone help me analyze where the performance bottleneck is? Many thanks.

yayg2008

Upvoted by: helloworld1128

Your ES has a memory leak. Look at the logs: it is constantly doing full GC, each pass takes over 20 seconds, and about 10 GB of memory cannot be reclaimed, so your ES service is basically paralyzed.
As for the cause, it may be a bug in bulk handling; have a read of @kennywu76's post 《Bulk异常引发的Elasticsearch内存泄漏》 (an Elasticsearch memory leak triggered by bulk exceptions).
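One way to watch this from outside the logs (not part of the original answer, just a sketch using the standard nodes stats API) is to poll heap usage and GC counters:

GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors

A heap floor that never drops much below 10 GB after each old-generation collection, as in the GC log above, is what such a leak looks like.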

zqc0512 - andy zhou

Upvoted by: helloworld1128

Is this a single 800G file? I'd suggest splitting the files by hour, or even by minute; if a single file is too large, reading the data will be slow.
ES is showing GC warnings; I'd suggest tuning ES and then testing again.
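A rough sketch of the splitting idea (file name and chunk size are invented for illustration); GNU split can chop a large newline-delimited JSON dump into smaller files:

# split one big NDJSON dump into 1M-line chunks named part_aa.json, part_ab.json, ...
split -l 1000000 --additional-suffix=.json big_dump.json part_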

helloworld1128

Upvoted by: (none)

One more thing: since this is network traffic data, there are a lot of distinct field names. The ES default seems to be 2000; I changed it to 200000 when creating the template. I wonder whether that is related.

jinleileiking

Upvoted by: (none)

A single 800G shard is far too big.

zqc0512 - andy zhou

Upvoted by: (none)

For the mapping, check in Kibana how many fields you actually have.
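For reference, the same check works without Kibana by dumping the mapping directly (the index pattern is assumed from the output config above):

GET index-1-*/_mapping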

zqc0512 - andy zhou

Upvoted by: (none)

With start_position => "beginning" set, it seems like every file gets read from the head each time, rather than only new data being picked up.

typuc - post-80s IT guy, table-tennis enthusiast

Upvoted by: (none)

Maybe you just have too few nodes? My previous cluster did 1.5T per day on 24 nodes with 16G each, with Kafka in the middle, and fewer CPUs than you have.

helloworld1128

Upvoted by: (none)

Thanks everyone for the enthusiastic help; the problem is solved.
Watching how Logstash's sincedb file changed, I found that when a single Logstash instance's file input has its path set to a directory, it reads only one file in that directory at a time, moving on to the next only after the current one is finished, even with pipeline.workers set to several dozen. So I distributed the large number of JSON files evenly across four folders and started four Logstash instances, each sending to a different node (a sketch of the layout follows). Throughput improved enormously: more than 11,000 records per second are now being written, against a file volume of about 2T, and the ES nodes are still not saturated, so there is room for further optimization, but this speed is enough for me.
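A minimal sketch of that layout (directory names and data paths are invented for illustration; each instance needs its own --path.data, since Logstash will not let two instances share one data directory):

# one Logstash instance per folder of JSON files, each conf pointing at a different data node
bin/logstash -f configs_node1/ --path.data /var/lib/logstash-1 &
bin/logstash -f configs_node2/ --path.data /var/lib/logstash-2 &
bin/logstash -f configs_node3/ --path.data /var/lib/logstash-3 &
bin/logstash -f configs_node4/ --path.data /var/lib/logstash-4 &

Inside each config directory, the elasticsearch output's hosts points at that instance's node, for example hosts => "node2:9200" in configs_node2/.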
Thanks again, everyone.

jinleileiking

Upvoted by: (none)

Yes. Tuning Logstash by tweaking parameters doesn't help much; starting several Logstash instances usually does improve performance.
In my experience, raising workers actually makes performance worse.

wzfxiaobai - post-90s IT worker

Upvoted by: (none)

How did you solve this problem in the end?
