
Stack trace when the indices.breaker.request.limit circuit breaker trips

Elasticsearch | Author: novia | Posted 2017-09-19 | Views: 7212

RemoteTransportException[[node_xxx.xxx.xxx][xxx.xxx.xxx.xxx:xxxx][indices:data/read/search[phase/query]]]; nested: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [5569511424/5.1gb]];
Caused by: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [5569511424/5.1gb]]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:97)
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:148)
at org.elasticsearch.common.util.BigArrays.adjustBreaker(BigArrays.java:396)
at org.elasticsearch.common.util.BigArrays.validate(BigArrays.java:433)
at org.elasticsearch.common.util.BigArrays.newDoubleArray(BigArrays.java:640)
at org.elasticsearch.search.aggregations.metrics.sum.SumAggregator.<init>(SumAggregator.java:58)
at org.elasticsearch.search.aggregations.metrics.sum.SumAggregator$Factory.doCreateInternal(SumAggregator.java:124)
at org.elasticsearch.search.aggregations.metrics.sum.SumAggregator$Factory.doCreateInternal(SumAggregator.java:108)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:76)
at org.elasticsearch.search.aggregations.AggregatorBase.<init>(AggregatorBase.java:69)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:48)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:40)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:79)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.<init>(GlobalOrdinalsStringTermsAggregator.java:275)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$3.create(TermsAggregatorFactory.java:92)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:243)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactory.asMultiBucketAggregator(AggregatorFactory.java:119)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:196)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:76)
at org.elasticsearch.search.aggregations.AggregatorBase.<init>(AggregatorBase.java:69)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:48)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:40)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:79)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.<init>(GlobalOrdinalsStringTermsAggregator.java:275)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$3.create(TermsAggregatorFactory.java:92)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:243)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactory.asMultiBucketAggregator(AggregatorFactory.java:119)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:196)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:76)
at org.elasticsearch.search.aggregations.AggregatorBase.<init>(AggregatorBase.java:69)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:48)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:40)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:79)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.<init>(GlobalOrdinalsStringTermsAggregator.java:275)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$3.create(TermsAggregatorFactory.java:92)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:243)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactories.createSubAggregators(AggregatorFactories.java:76)
at org.elasticsearch.search.aggregations.AggregatorBase.<init>(AggregatorBase.java:69)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:48)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:141)
at org.elasticsearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:40)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:79)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:72)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:243)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:87)
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:78)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:104)
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:363)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:375)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:368)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:365)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


kennywu76 - Wood

Upvoted by: novia, famoss

@novia 
 
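I looked at the code. Many computations in ES, terms aggregations included, allocate the large arrays that hold their working data through the BigArrays utility class. Whatever the element type (long, int, byte, and so on), adjustBreaker() is called before the memory is actually allocated, handing the size of the newly requested array to the breaker service so it can decide whether to trip.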

Screen_Shot_2017-09-19_at_14.35_.13_.png

 
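Internally this method calls breaker.addEstimateBytesAndMaybeBreak, passing the byte size of the newly requested array as an increment (delta) that is added to the running byte total of the request breaker, and then checks whether that total exceeds the configured request breaker limit. If it does, an exception labeled "<reused_arrays>" is thrown, which is the error message in the log you posted: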

Screen_Shot_2017-09-19_at_14.36_.04_.png

 
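The breaker service is global to a node, meaning the memory consumed by every BigArray allocated on that node is accumulated into one total.

These large arrays all extend the abstract class AbstractArray, which is releasable: its close() method calls bigArrays.adjustBreaker to subtract the released memory from the breaker's tally.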

AbstractArray.jpg


 
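So the breaker starts from the memory held by the BigArrays of all in-flight queries and aggregations, adds each new allocation request on top of that, and then decides whether to trip, with the goal of keeping excessive memory use from bringing the node down.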
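The accounting described above can be illustrated with a toy model. This is a minimal Python sketch, not Elasticsearch code: `RequestBreaker`, `BigArray` and `CircuitBreakingError` are hypothetical stand-ins for the Java classes, and details of the real implementation such as overhead factors and the parent breaker are deliberately left out.

```python
class CircuitBreakingError(Exception):
    """Raised when a reservation would push usage past the limit."""


class RequestBreaker:
    """Toy node-wide byte counter shared by every in-flight request."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate_and_maybe_break(self, delta_bytes, label="<reused_arrays>"):
        new_used = self.used_bytes + delta_bytes
        # Only growth can trip the breaker; releases (negative delta) always pass.
        if delta_bytes > 0 and new_used > self.limit_bytes:
            raise CircuitBreakingError(
                f"[request] Data too large, data for [{label}] would be "
                f"larger than limit of [{self.limit_bytes}]"
            )
        self.used_bytes = new_used


class BigArray:
    """Toy array handle: reserves bytes on creation, returns them on close()."""

    def __init__(self, breaker, size_bytes):
        # Reserve first; if this raises, nothing is "allocated".
        breaker.add_estimate_and_maybe_break(size_bytes)
        self._breaker = breaker
        self._size_bytes = size_bytes
        self._closed = False

    def close(self):
        if not self._closed:
            self._breaker.add_estimate_and_maybe_break(-self._size_bytes)
            self._closed = True
```

In this model a single long-lived `BigArray` that holds almost the whole limit is enough to make every later reservation fail, however small, until its `close()` is called.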
 
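Given what you describe, with the breaker tripping continuously for a period of time, either memory-hungry aggregation requests kept arriving during that window, or some large aggregation ran for a very long time, so the BigArrays it had allocated stayed held without being released and kept usage close to the request breaker threshold, causing new aggregations to be tripped again and again. The restart probably fixed things only because it released those in-flight, expensive queries.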
 
Next time this happens, I suggest using the hot threads API to see what the most expensive threads are doing.
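For reference, a hot threads request can be issued roughly as follows. This is a sketch under assumptions: the base URL `http://localhost:9200` is a placeholder for a real node address, the helper names are made up for illustration, and the query parameters (`threads`, `interval`, `type`) are the ones documented for the `_nodes/hot_threads` endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, threads=3, interval="500ms", thread_type="cpu"):
    """Build the URL for the nodes hot threads API."""
    query = urlencode({"threads": threads, "interval": interval, "type": thread_type})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url, **kwargs):
    """Return the plain-text hot threads report (requires a reachable node)."""
    with urlopen(hot_threads_url(base_url, **kwargs), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_hot_threads("http://localhost:9200")` while the breaker is tripping should show whether search threads are busy inside aggregation code at that moment.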
 

PS. The code I read is 5.6; I can't rule out that the breaker had some rough edges in earlier versions.

kennywu76 - Wood


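What tripped the breaker is an outer terms aggregation with a sum aggregation nested inside it. Most likely the field of the outer terms aggregation has too many unique values (high cardinality), which produces too many buckets. Check how the original query was written, how high the cardinality of the aggregated field is, and how large the size setting is.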
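One way to check a field's cardinality is a `cardinality` aggregation with `size` set to 0 so no hits are returned. The sketch below only builds the request body; the helper name `cardinality_probe` and the aggregation name `distinct_values` are made up for illustration, and `keywords_code` is used purely as a placeholder field name. Note that the cardinality aggregation returns an approximate count.

```python
import json


def cardinality_probe(field):
    """Build a search body that estimates the number of distinct values in `field`."""
    return {
        "size": 0,  # skip hits, only the aggregation result is needed
        "aggs": {
            "distinct_values": {"cardinality": {"field": field}}
        },
    }


# Print the body so it can be pasted into a _search request.
print(json.dumps(cardinality_probe("keywords_code"), indent=2))
```

If the reported value is in the millions and the terms aggregation also asks for a large `size`, or is nested under other terms aggregations, the bucket count grows quickly.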

novia - 1&0


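The cause of that error is indeed what you said: a very large aggregation was run. But the queries that came after it touched very little data and still failed with the same error, which is hard to understand. For example, after the exception I ran the query below, which matches only 62 documents, and it still reports this error: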
 
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "keywords_code": "3574305971261990"
          }
        },
        {
          "range": {
            "release_date_day": {
              "lte": "2017-09-18",
              "gte": "2017-09-12"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "a": {
      "terms": {
        "field": "keywords_code"
      }
    }
  }
}

novia - 1&0


Sorry, I didn't capture the logs in time. After running the test query above, the response showed 4 failed shards, with this exception:
Caused by: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [5569511424/5.1gb]]
The stack trace should be the one below (sorry, the log was not captured at the time):
Caused by: CircuitBreakingException[[request] Data too large, data for [<reused_arrays>] would be larger than limit of [5569511424/5.1gb]]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:97)
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:148)
at org.elasticsearch.common.util.BigArrays.adjustBreaker(BigArrays.java:396)
at org.elasticsearch.common.util.BigArrays.validate(BigArrays.java:433)
at org.elasticsearch.common.util.BigArrays.newLongArray(BigArrays.java:590)
at org.elasticsearch.common.util.LongHash.<init>(LongHash.java:44)
at org.elasticsearch.common.util.LongHash.<init>(LongHash.java:38)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.<init>(GlobalOrdinalsStringTermsAggregator.java:277)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$3.create(TermsAggregatorFactory.java:92)
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:243)
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:64)
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:102)
at org.elasticsearch.search.aggregations.AggregatorFactory$1$1.collect(AggregatorFactory.java:200)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:72)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash$2.collect(GlobalOrdinalsStringTermsAggregator.java:312)
at org.elasticsearch.search.aggregations.AggregatorFactory$1$1.collect(AggregatorFactory.java:208)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash$2.collect(GlobalOrdinalsStringTermsAggregator.java:310)
at org.elasticsearch.search.aggregations.AggregatorFactory$1$1.collect(AggregatorFactory.java:208)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$2.collect(GlobalOrdinalsStringTermsAggregator.java:130)
at org.elasticsearch.search.aggregations.LeafBucketCollector.collect(LeafBucketCollector.java:88)
at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:145)
at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:218)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:169)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:772)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:324)
There is one more symptom: since the day before yesterday the cluster has been logging WARN entries like these:


[2017-09-18 00:01:01,315][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569513960 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
[2017-09-18 00:01:01,321][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569513696 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
[2017-09-18 00:01:01,322][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569513776 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
[2017-09-18 00:01:01,323][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569514128 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
[2017-09-18 00:01:01,324][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569513992 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
[2017-09-18 00:01:01,327][WARN ][indices.breaker          ] [node_30.206.179] [request] New used memory 5569513784 [5.1gb] for data of [<reused_arrays>] would be larger than configured breaker: 5569511424 [5.1gb], breaking
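A quick way to read these WARN lines is to compute how far each attempted total lands above the limit. The sketch below is a hypothetical helper, with a regular expression written against the log format shown above; it is not part of any Elasticsearch tooling.

```python
import re

# Matches the "New used memory ... configured breaker: ..." WARN message.
WARN_PATTERN = re.compile(
    r"New used memory (?P<used>\d+) \[[^\]]+\] for data of \[(?P<label>[^\]]+)\] "
    r"would be larger than configured breaker: (?P<limit>\d+)"
)


def overshoot_bytes(line):
    """Return attempted_total - limit in bytes, or None if the line does not match."""
    match = WARN_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group("used")) - int(match.group("limit"))


sample = (
    "[2017-09-18 00:01:01,315][WARN ][indices.breaker          ] [node_30.206.179] "
    "[request] New used memory 5569513960 [5.1gb] for data of [<reused_arrays>] "
    "would be larger than configured breaker: 5569511424 [5.1gb], breaking"
)
print(overshoot_bytes(sample))  # 2536
```

Every line above overshoots the 5569511424-byte limit by only a couple of kilobytes. That suggests, without proving it, that the breaker's recorded usage was already sitting just under the limit, so even small allocations were being rejected.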



novia - 1&0


微信图片_20170919103427.png

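The monitoring shows that memory is indeed released after each request. But judging from the WARN log above, why does ES keep checking this value? Could it be that once a single query exceeds the limit, things stay broken from then on?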

weizijun - elasticsearch fan


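The request breaker tracks the node's live memory allocations. If an allocation leaks or the program hits an abnormal path, the usage recorded by the request breaker stays high. By the way, the monitoring the original poster is using looks great; could you recommend it?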
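Whether the request breaker's recorded usage really stays high can be checked from the node stats API (`GET /_nodes/stats/breaker`). The sketch below assumes the response has already been fetched and parsed into a dict; the helper name `request_breaker_usage` is made up, and it relies on the `limit_size_in_bytes`, `estimated_size_in_bytes` and `tripped` fields of each node's `breakers.request` section.

```python
def request_breaker_usage(nodes_stats):
    """Summarize request breaker usage per node, highest usage ratio first."""
    rows = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        req = node.get("breakers", {}).get("request")
        if req is None:
            continue  # node did not report a request breaker
        limit = req["limit_size_in_bytes"]
        used = req["estimated_size_in_bytes"]
        rows.append({
            "node": node.get("name", node_id),
            "used_bytes": used,
            "limit_bytes": limit,
            "used_ratio": used / limit if limit else 0.0,
            "tripped": req.get("tripped", 0),
        })
    return sorted(rows, key=lambda row: row["used_ratio"], reverse=True)
```

If `used_ratio` stays close to 1.0 on a node while no searches are running, that points toward accounting that was never released rather than toward current query load.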
