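Hi all,
We have a Logstash (5.4) cluster that consumes data from Kafka. The kafka input is configured with topics_pattern, so it subscribes to many Kafka topics. However, we found that partitions are assigned unevenly. For example,
our cluster has 20 Logstash instances, and the largest topic has 16 partitions. In practice, 4 of the Logstash instances never consume any data.
As I understand it, shouldn't the partitions of all topics be evenly distributed across all Logstash instances?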

 
	
1 reply
shjdwxy
When consuming from Kafka with topics_pattern, the default partition_assignment_strategy is Range, which assigns partitions topic by topic. You should use this instead:
partition_assignment_strategy => "org.apache.kafka.clients.consumer.RoundRobinAssignor"
For the detailed reason, see: The round-robin partition assignor lays out all the available partitions and all the available consumer threads. It then proceeds to do a round-robin assignment from partition to consumer thread. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumer threads.) Round-robin assignment is permitted only if: (a) Every topic has the same number of streams within a consumer instance (b) The set of subscribed topics is identical for every consumer instance within the group.
Range partitioning works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumer threads in lexicographic order. We then divide the number of partitions by the total number of consumer streams (threads) to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition.
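To make the difference concrete, here is a rough Python simulation of the two strategies as described in the quoted text. It is only a sketch, not Kafka's actual assignor code. The consumer names, the topic names, and the partition counts of 12 and 8 are made up for illustration (only the 20 instances and the 16-partition maximum come from the question), and it assumes one consumer thread per Logstash instance.

```python
from itertools import cycle


def range_assign(topics, consumers):
    """Per-topic assignment: split each topic's partitions over the sorted consumers."""
    members = sorted(consumers)  # lexicographic order, as in the quoted description
    assignment = {c: [] for c in members}
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(members))
        for i, c in enumerate(members):
            # The first `extra` consumers get one additional partition.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            assignment[c].extend((topic, p) for p in range(start, start + count))
    return assignment


def round_robin_assign(topics, consumers):
    """Lay out every partition of every topic, then deal them out in turn."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for tp, c in zip(all_partitions, cycle(members)):
        assignment[c].append(tp)
    return assignment


# Hypothetical setup: 20 instances, no topic has more than 16 partitions.
consumers = [f"logstash-{i:02d}" for i in range(20)]
topics = {"topic-a": 16, "topic-b": 12, "topic-c": 8}

range_result = range_assign(topics, consumers)
rr_result = round_robin_assign(topics, consumers)

idle_range = [c for c, tps in range_result.items() if not tps]
idle_rr = [c for c, tps in rr_result.items() if not tps]

print("idle with Range:", len(idle_range))
print("idle with RoundRobin:", len(idle_rr))
```

With Range, every topic starts handing out partitions from the first consumer again, so with at most 16 partitions per topic the last 4 of the 20 instances never receive anything, no matter how many topics match the pattern. With RoundRobin, all 36 partitions are dealt out across all 20 instances, so each one gets 1 or 2.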