Elasticsearch OOM during a cardinality aggregation

I run a terms aggregation on one field and then a cardinality sub-aggregation inside each term bucket, with size set large enough to return every bucket. The test index holds about 300k documents with roughly 30k distinct terms, and every test document carries the same clientId value (so the expected uv per bucket is 1).
 
Environment:
elasticsearch version: 5.5.2 (the question originally said 5.6, but the stack traces below show elasticsearch-5.5.2.jar)
Machine RAM: 6 GB
JVM heap: 2 GB
Code:
SearchResponse sr = ElasticsearchClientRepository.transportClient()
    .prepareSearch(indexNames)
    .setTypes(indexType)
    .setIndicesOptions(IndicesOptions.lenientExpandOpen())
    .setQuery(builder.buildQuery())
    .addAggregation(
        AggregationBuilders.terms("counterId").field("counterId.keyword")
            .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST)
            .subAggregation(AggregationBuilders.filter("pv", QueryBuilders.termQuery("traceViewType", 1)))
            .subAggregation(AggregationBuilders.filter("show", QueryBuilders.termQuery("traceViewType", 2)))
            .subAggregation(AggregationBuilders.filter("click", QueryBuilders.termQuery("traceViewType", 3)))
            .subAggregation(AggregationBuilders.cardinality("uv").field("clientId.keyword"))
            .size(99999))
    .get();

 
Exception log:
[2018-05-16T15:37:54,156][TRACE][o.e.i.b.request          ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [2.8mb], limit: 1267571097 [1.1gb], estimate: 3029352 [2.8mb]]
[2018-05-16T15:37:54,180][TRACE][o.e.i.b.request ] [request] Adding [32kb][<reused_arrays>] to used bytes [new used: [2.9mb], limit: 1267571097 [1.1gb], estimate: 3062120 [2.9mb]]
[2018-05-16T15:37:54,180][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [2.9mb], limit: 1267571097 [1.1gb], estimate: 3078504 [2.9mb]]
[2018-05-16T15:37:54,180][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [2.9mb], limit: 1267571097 [1.1gb], estimate: 3094888 [2.9mb]]
[2018-05-16T15:37:54,200][TRACE][o.e.i.b.request ] [request] Adding [32kb][<reused_arrays>] to used bytes [new used: [2.9mb], limit: 1267571097 [1.1gb], estimate: 3127656 [2.9mb]]
[2018-05-16T15:37:54,200][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [2.9mb], limit: 1267571097 [1.1gb], estimate: 3144040 [2.9mb]]
[2018-05-16T15:37:54,200][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [3mb], limit: 1267571097 [1.1gb], estimate: 3160424 [3mb]]
[2018-05-16T15:37:54,204][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3mb], limit: 1267571097 [1.1gb], estimate: 3160488 [3mb]]
[2018-05-16T15:37:54,204][TRACE][o.e.i.b.request ] [request] Adding [507.4mb][<reused_arrays>] to used bytes [new used: [510.4mb], limit: 1267571097 [1.1gb], estimate: 535214504 [510.4mb]]
[2018-05-16T15:37:54,226][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [510.4mb], limit: 1267571097 [1.1gb], estimate: 535214568 [510.4mb]]
[2018-05-16T15:37:54,227][TRACE][o.e.i.b.request ] [request] Adding [507.4mb][<reused_arrays>] to used bytes [new used: [1017.8mb], limit: 1267571097 [1.1gb], estimate: 1067317736 [1017.8mb]]
[2018-05-16T15:37:54,252][TRACE][o.e.i.b.request ] [request] Adding [32kb][<reused_arrays>] to used bytes [new used: [1017.9mb], limit: 1267571097 [1.1gb], estimate: 1067350504 [1017.9mb]]
[2018-05-16T15:37:54,252][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [1017.9mb], limit: 1267571097 [1.1gb], estimate: 1067366888 [1017.9mb]]
[2018-05-16T15:37:54,252][TRACE][o.e.i.b.request ] [request] Adding [16kb][<reused_arrays>] to used bytes [new used: [1017.9mb], limit: 1267571097 [1.1gb], estimate: 1067383272 [1017.9mb]]
[2018-05-16T15:37:54,280][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [1017.9mb], limit: 1267571097 [1.1gb], estimate: 1067383336 [1017.9mb]]
[2018-05-16T15:37:54,281][TRACE][o.e.i.b.request ] [request] Adding [507.6mb][<reused_arrays>] to used bytes [new used: [1.4gb], limit: 1267571097 [1.1gb], estimate: 1599666728 [1.4gb]]
[2018-05-16T15:37:54,281][WARN ][o.e.i.b.request ] [request] New used memory 1599666728 [1.4gb] for data of [<reused_arrays>] would be larger than configured breaker: 1267571097 [1.1gb], breaking
[2018-05-16T15:37:54,281][DEBUG][o.e.i.b.request ] [request] Data too large, data for [<reused_arrays>] would be [1599666728/1.4gb], which is larger than the limit of [1267571097/1.1gb]
[2018-05-16T15:37:54,281][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-64] bytes, now [1067383272]
[2018-05-16T15:37:54,282][TRACE][o.e.s.SearchService ] [es-node-208] Query phase failed
org.elasticsearch.search.query.QueryPhaseExecutionException: Query Failed [Failed to execute main query]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:414) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:108) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$16(IndicesService.java:1130) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1211) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:401) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1217) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1129) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:246) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.2.jar:5.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [request] Data too large, data for [<reused_arrays>] would be [1599666728/1.4gb], which is larger than the limit of [1267571097/1.1gb]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:98) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.limit(ChildMemoryCircuitBreaker.java:170) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:123) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.BigArrays.adjustBreaker(BigArrays.java:408) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:475) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:499) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:513) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.ensureCapacity(HyperLogLogPlusPlus.java:197) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:232) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$OrdinalsCollector.postCollect(CardinalityAggregator.java:280) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.postCollectLastCollector(CardinalityAggregator.java:120) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.getLeafCollector(CardinalityAggregator.java:111) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:149) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:148) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.AggregatorBase.getLeafCollector(AggregatorBase.java:41) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.FilterCollector.getLeafCollector(FilterCollector.java:40) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.search.query.CancellableCollector.getLeafCollector(CancellableCollector.java:61) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:659) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:388) ~[elasticsearch-5.5.2.jar:5.5.2]
... 20 more

[2018-05-16T15:38:25,512][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-node-208] fatal error in thread [elasticsearch[es-node-208][search][T#3]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:490) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:171) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.buildAggregation(CardinalityAggregator.java:145) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:239) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:129) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$16(IndicesService.java:1130) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService$$Lambda$1619/1547225178.accept(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1211) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService$$Lambda$1622/2086696259.get(Unknown Source) ~[?:?]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:401) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1217) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1129) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:246) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.2.jar:5.5.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.5.2.jar:5.5.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92] 

From the TRACE log, several ~500 MB BigArrays allocations suddenly appear while the request is executing; they accumulate until the circuit breaker trips, and the heap eventually overflows. Can this be fixed through tuning, or is there a better approach?
 
Similar issue: https://elasticsearch.cn/question/2040

kennywu76 - wood@Ctrip

Upvoted by: jaehe, novia

The root cause is that the outermost terms aggregation asks for every bucket, about 30k of them, and each bucket then computes its own cardinality. With only a small heap on the machine, that OOMs.
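A back-of-the-envelope estimate shows why the heap blows up. This is a sketch: the 8-bytes-per-counter figure comes from the Elasticsearch cardinality docs ("a precision threshold of c requires about c * 8 bytes"), and the 30,000 bucket count is taken from the question; both are approximations, not exact accounting.

```python
# Rough worst-case memory estimate for a cardinality sub-aggregation
# replicated across every terms bucket.
def cardinality_memory_bytes(num_buckets: int, precision_threshold: int) -> int:
    """Approximate bytes: one HyperLogLog++ sketch of ~threshold*8 bytes per bucket."""
    return num_buckets * precision_threshold * 8

buckets = 30_000  # terms buckets reported in the question
default_mb = cardinality_memory_bytes(buckets, 3000) / 2**20  # default threshold
tuned_mb = cardinality_memory_bytes(buckets, 100) / 2**20     # lowered threshold
print(f"precision_threshold=3000: ~{default_mb:.0f} MB")
print(f"precision_threshold=100:  ~{tuned_mb:.0f} MB")
```

At the default precision_threshold this lands in the hundreds of megabytes, which is consistent with the ~500 MB `<reused_arrays>` allocations in the TRACE log above.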
Solutions:
1. Add more memory.
2. The cardinality aggregation has a "precision_threshold" parameter; the higher the precision, the higher the memory cost. It defaults to 3000; try lowering it to 100.
3. If the business really needs every outermost terms bucket returned and memory cannot be expanded, the only option is to fetch the buckets in batches; see: https://www.elastic.co/guide/e ... tions
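Combining options 2 and 3, a sketch of the equivalent REST request body (field names are taken from the question; the partition values are illustrative — in 5.x the terms aggregation supports splitting the keyspace via include.num_partitions, and the request would be repeated once per partition):

```json
{
  "size": 0,
  "aggs": {
    "counterId": {
      "terms": {
        "field": "counterId.keyword",
        "size": 10000,
        "collect_mode": "breadth_first",
        "include": { "partition": 0, "num_partitions": 3 }
      },
      "aggs": {
        "uv": {
          "cardinality": {
            "field": "clientId.keyword",
            "precision_threshold": 100
          }
        }
      }
    }
  }
}
```

With 3 partitions, each request only materializes about a third of the ~30k buckets at once, and the lowered precision_threshold shrinks each bucket's HyperLogLog++ sketch.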
 

yayg2008


Try setting size to 0 when aggregating.

medcl - Elastic 🇨🇳 !


.size(99999)
You set size that large and still ask why it OOMs?
