
In a filebeat->kafka data flow, what happens when some brokers in the kafka cluster are unreachable?

Beats | Author: Jiehui Tang | Published 2019-07-03 | Views: 4056

We recently hit a problem in production: after starting, filebeat collected logs normally for a few minutes and then stopped collecting altogether.
 
The filebeat log file kept printing entries like the following:
 


2019-05-21T18:53:31+08:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=4 filebeat.harvester.running=4 filebeat.harvester.started=4 libbeat.kafka.call_count.PublishEvents=1 libbeat.output.kafka.bytes_read=20981 libbeat.output.kafka.bytes_write=19778 libbeat.publisher.published_events=224
2019-05-21T18:53:31+08:00 WARN Failed to connect to broker 10.0.0.90:9092: dial tcp 10.0.0.90:9092: i/o timeout
2019-05-21T18:53:31+08:00 WARN producer/broker/9 state change to [closing] because dial tcp 10.0.0.90:9092: i/o timeout
2019-05-21T18:53:31+08:00 WARN producer/leader/Log_ICC_SH_LOG/7 state change to [retrying-1]
2019-05-21T18:53:31+08:00 WARN producer/leader/Log_ICC_SH_LOG/7 abandoning broker 9
2019-05-21T18:53:31+08:00 WARN producer/broker/9 shut down
2019-05-21T18:53:31+08:00 WARN client/metadata fetching metadata for [Log_ICC_SH_LOG] from broker 10.0.0.65:9092
2019-05-21T18:53:31+08:00 WARN producer/broker/9 starting up
2019-05-21T18:53:31+08:00 WARN producer/broker/9 state change to [open] on Log_ICC_SH_LOG/7
2019-05-21T18:53:31+08:00 WARN producer/leader/Log_ICC_SH_LOG/7 selected broker 9
2019-05-21T18:53:31+08:00 WARN producer/leader/Log_ICC_SH_LOG/7 state change to [flushing-1]
2019-05-21T18:53:31+08:00 WARN producer/leader/Log_ICC_SH_LOG/7 state change to [normal]
2019-05-21T18:54:01+08:00 INFO Non-zero metrics in the last 30s: libbeat.output.kafka.bytes_read=524 libbeat.output.kafka.bytes_write=39
2019-05-21T18:54:01+08:00 WARN Failed to connect to broker 10.0.0.90:9092: dial tcp 10.0.0.90:9092: i/o timeout
2019-05-21T18:54:01+08:00 WARN producer/broker/9 state change to [closing] because dial tcp 10.0.0.90:9092: i/o timeout


The filebeat.yml configuration is as follows:
---


filebeat.prospectors:
  -
    clean_inactive: 168h
    close_inactive: 1m
    close_timeout: 1h
    encoding: utf-8
    fields:
      topicName: Log_ICC_SH
    fields_under_root: true
    ignore_older: 5m
    input_type: log
    max_bytes: 307200
    paths:
      - /appl/air1apuser/log/icc/*/traces/*.log
      - /appl/air1apuser/log/icc/*/clob/*.log
      - /appl/air1apuser/log/icc/*/oms/*.log
filebeat.spool_size: 256
output.kafka:
  enabled: true
  hosts:
    - "10.0.0.61:9092"
    - "10.0.0.62:9092"
    - "10.0.0.63:9092"
    - "10.0.0.64:9092"
    - "10.0.0.65:9092"
    - "10.0.0.66:9092"
    - "10.0.0.67:9092"
    - "10.0.0.68:9092"
  topic: Log_ICC_SH_LOG
tags:
  - json


Checking the filebeat configuration documentation afterwards, I found the following:

[Screenshot: filebeat.PNG]

 
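When reachable_only is false, log shipping blocks as soon as the leader of any partition becomes unreachable.
reachable_only defaults to false, so to avoid this problem it should be set to true. The option is set under the chosen partitioner in output.kafka, for example partition.round_robin.reachable_only.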
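
As a rough sketch of that change (untested, like the rest of this note), the output.kafka section could look something like the fragment below. Picking round_robin as the partitioner is my assumption; the original configuration does not set a partitioner, and as far as I know the random and hash partitioners accept the same reachable_only option.

```yaml
output.kafka:
  # enabled, hosts and topic stay exactly as in the configuration above
  partition.round_robin:
    # Default is false: events are spread over all partitions, so an
    # unreachable partition leader can block the output.
    # With true, events are published to reachable partitions only.
    reachable_only: true
```

One trade-off to keep in mind: publishing only to the reachable partitions can leave events unevenly distributed across the topic while a broker is down.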
 
I have not verified this in a test environment yet; just noting it here for now.