ES queries (with a large number of agg operations) frequently exceed the breaker.total limit

Elasticsearch | Author: talon | Posted July 15, 2019 | Views: 3283


The ES cluster has 10 nodes. Each node has 256 GB of physical memory, of which 30 GB is allocated to ES. After the cluster starts, Kibana shows JVM memory usage on each ES node averaging about 6% to 7%.

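Once the application starts, JVM memory usage on the ES nodes climbs steadily to about 76%, falls back to about 20%, and then repeats the same cycle. Some nodes, however, throw an exception: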

The index being queried: 1 replica, 5 shards.

For business reasons the query contains more than 20 aggs, and some of them have multi-level nested aggs. According to Kibana, queryCache, fielddataCache, and requestCache use little memory: each is under 200 MB.

The ES GC log:

Current breaker settings:
parent: 70%
fielddata: 40%
request: 60%
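
To put those percentages in absolute terms, here is a minimal Python sketch that assumes the 30 GB allocation mentioned above is the JVM heap. The setting names `indices.breaker.total.limit`, `indices.breaker.fielddata.limit`, and `indices.breaker.request.limit` are the standard Elasticsearch circuit-breaker settings; the helper names and the choice of transient scope are made up for illustration.

```python
GIB = 1024 ** 3

# Breaker limits as quoted in the post (percent of JVM heap).
BREAKER_PERCENT = {
    "indices.breaker.total.limit": 70,      # the parent breaker
    "indices.breaker.fielddata.limit": 40,
    "indices.breaker.request.limit": 60,
}


def breaker_limits_bytes(heap_bytes, percents=BREAKER_PERCENT):
    """Return each breaker limit in bytes for the given heap size."""
    return {name: heap_bytes * pct // 100 for name, pct in percents.items()}


def cluster_settings_body(percents=BREAKER_PERCENT):
    """Build a body for PUT _cluster/settings (transient scope is an assumption)."""
    return {"transient": {name: f"{pct}%" for name, pct in percents.items()}}


if __name__ == "__main__":
    for name, limit in breaker_limits_bytes(30 * GIB).items():
        print(f"{name}: {limit / GIB:.1f} GiB")
```

Under that assumption the parent breaker trips at about 21 GiB. The fielddata and request limits each fit under the parent limit, but their sum does not, so the parent limit is the one that bounds the total.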

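Adjustments made and performance investigation so far:
0. Set bootstrap.memory_lock: true
1. For the terms aggs, tried both execution_hint map and the default global_ordinals; in this application the two make little difference.
2. Captured a JVM heap dump from one ES node, about 23 GB, and analyzed it with Eclipse MAT. MAT reported an error while generating its three reports (the MAT site lists the error as a bug that is still unresolved), so the suspected leaks shown in the reports add up to only about 3 GB, and it is hard to tell which object actually occupies most of the heap. Of that roughly 3 GB, the largest share is Netty-related, so after some reading I set these parameters on every node: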
-Dio.netty.allocator.type=unpooled
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
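After setting them, the same error is still reported.
The dump is too large to export to a local machine and analyze directly with a plugin.
3. Used jmap -histo:live to check per-object memory usage at runtime. The largest entry is a byte array, followed by DirectByteBuffer.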
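
For step 3, ranking the histogram by bytes can be scripted. The following is a minimal Python sketch that assumes the usual `jmap -histo` column layout (rank, instance count, byte count, class name); the sample text is made up and is not data from this cluster.

```python
import re

# One histogram row: rank, instance count, byte count, class name.
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Parse `jmap -histo` output into (class_name, instances, bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(4), int(m.group(2)), int(m.group(3))))
    return rows


def top_by_bytes(rows, n=5):
    """Return the n classes with the largest byte totals."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


# Made-up sample in the usual layout; NOT output from the poster's nodes.
SAMPLE = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:         50000     2000000000  [B
   2:        400000       16000000  java.nio.DirectByteBuffer
   3:        100000        3200000  java.lang.String
Total        550000     2019200000
"""

if __name__ == "__main__":
    for name, count, size in top_by_bytes(parse_histo(SAMPLE), n=2):
        print(f"{name}: {count} instances, {size} bytes")
```

In the histogram `[B` is the JVM name for a byte array. A `DirectByteBuffer` row counts only the small on-heap wrapper objects; the native memory they point to does not show up in these byte totals.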

I really can't tell where the problem is. Does anyone have a good approach or any ideas?

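Here is one of the DSL queries involved (one of several, and one of the slower ones; over one day's data it sometimes takes about 1 minute):
The DSL is long and I wanted to attach it as a file, but attachments don't support text files...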

GET md_detail/_search
{
	"size": 0,
	"query": {
		"bool": {
			"filter": [{
				"term": {
					"d_type": {
						"value": "d"
					}
				}
			}, {
				"term": {
					"app_id": {
						"value": "1"
					}
				}
			}, {
				"range": {
					"part": {
						"gte": 0,
						"lte": 99999
					}
				}
			}, {
				"range": {
					"date_from": {
						"lt": "2019-03-08",
						"format": "yyyy-MM-dd"
					}
				}
			}]
		}
	},
	"aggregations": {
		"person_term": {
			"terms": {
				"field": "person_id",
				"size": 2147483647,
				"execution_hint": "map",
				"order": [{
					"_count": "desc"
				}, {
					"_key": "asc"
				}],
				"collect_mode": "breadth_first"
			},
			"aggregations": {
				"p_id": {
					"nested": {
						"path": "ps"
					},
					"aggregations": {
						"nested_p_id": {
							"terms": {
								"field": "ps.c_id",
								"size": 2147483647,
								"execution_hint": "map",
								"order": [{
									"_count": "desc"
								}, {
									"_key": "asc"
								}],
								"collect_mode": "breadth_first"
							},
							"aggregations": {
								"reversed_page": {
									"reverse_nested": {},
									"aggregations": {
										"from_last_prd": {
											"max": {
												"field": "date_from",
												"format": "yyyy-MM-dd"
											}
										},
										"last_15_date": {
											"filter": {
												"range": {
													"date_from": {
														"from": "2019-02-22",
														"to": "2019-03-07",
														"format": "yyyy-MM-dd"
													}
												}
											},
											"aggregations": {
												"date_from": {
													"terms": {
														"field": "d_desc",
														"size": 14,
														"execution_hint": "map",
														"order": [{
															"_count": "desc"
														}, {
															"_key": "asc"
														}],
														"collect_mode": "breadth_first"
													}
												}
											}
										},
										"date_30": {
											"filter": {
												"range": {
													"date_from": {
														"gte": "2019-02-07",
														"lte": "2019-03-07",
														"format": "yyyy-MM-dd"
													}
												}
											},
											"aggregations": {
												"day_count": {
													"terms": {
														"field": "date_from",
														"format": "yyyy-MM-dd",
														"size": 29,
														"execution_hint": "map",
														"order": [{
															"_count": "desc"
														}, {
															"_key": "asc"
														}],
														"collect_mode": "breadth_first"
													}
												}
											}
										}
									}
								}
							}
						},
						"nested_p_c": {
							"nested": {
								"path": "ps.cs"
							},
							"aggregations": {
								"term_c_id": {
									"terms": {
										"field": "ps.cs.c_id",
										"size": 2147483647,
										"execution_hint": "map",
										"order": [{
											"_count": "desc"
										}, {
											"_key": "asc"
										}],
										"collect_mode": "breadth_first"
									},
									"aggregations": {
										"revsered_c_id": {
											"reverse_nested": {},
											"aggregations": {
												"from_last_prd": {
													"max": {
														"field": "date_from",
														"format": "yyyy-MM-dd"
													}
												},
												"last_15_date": {
													"filter": {
														"range": {
															"date_from": {
																"gte": "2019-02-22",
																"lte": "2019-03-07",
																"format": "yyyy-MM-dd"
															}
														}
													},
													"aggregations": {
														"date_from": {
															"terms": {
																"field": "d_desc",
																"size": 14,
																"execution_hint": "map",
																"order": [{
																	"_count": "desc"
																}, {
																	"_key": "asc"
																}],
																"collect_mode": "breadth_first"
															}
														}
													}
												},
												"date_30": {
													"filter": {
														"range": {
															"date_from": {
																"gte": "2019-02-07",
																"lte": "2019-03-07",
																"format": "yyyy-MM-dd"
															}
														}
													},
													"aggregations": {
														"day_count": {
															"terms": {
																"field": "date_from",
																"format": "yyyy-MM-dd",
																"size": 29,
																"execution_hint": "map",
																"order": [{
																	"_count": "desc"
																}, {
																	"_key": "asc"
																}],
																"collect_mode": "breadth_first"
															}
														}
													}
												}
											}
										}
									}
								}
							}
						}
					}
				},
				"f_id": {
					"nested": {
						"path": "fs"
					},
					"aggregations": {
						"nested_f_id": {
							"terms": {
								"field": "fs.f_id",
								"size": 2147483647,
								"execution_hint": "map",
								"order": [{
									"_count": "desc"
								}, {
									"_key": "asc"
								}],
								"collect_mode": "breadth_first"
							},
							"aggregations": {
								"reversed_f": {
									"reverse_nested": {},
									"aggregations": {
										"from_last_prd": {
											"max": {
												"field": "date_from",
												"format": "yyyy-MM-dd"
											}
										},
										"last_15_date": {
											"filter": {
												"range": {
													"date_from": {
														"gte": "2019-02-22",
														"lte": "2019-03-07",
														"format": "yyyy-MM-dd"
													}
												}
											},
											"aggregations": {
												"date_from": {
													"terms": {
														"field": "d_desc",
														"size": 14,
														"execution_hint": "map",
														"order": [{
															"_count": "desc"
														}, {
															"_key": "asc"
														}],
														"collect_mode": "breadth_first"
													}
												}
											}
										},
										"date_30": {
											"filter": {
												"range": {
													"date_from": {
														"gte": "2019-02-07",
														"lte": "2019-03-07",
														"format": "yyyy-MM-dd"
													}
												}
											},
											"aggregations": {
												"day_count": {
													"terms": {
														"field": "date_from",
														"format": "yyyy-MM-dd",
														"size": 29,
														"execution_hint": "map",
														"order": [{
															"_count": "desc"
														}, {
															"_key": "asc"
														}],
														"collect_mode": "breadth_first"
													}
												}
											}
										}
									}
								}
							}
						}
					}
				},
				"from_last_prd": {
					"max": {
						"field": "date_from",
						"format": "yyyy-MM-dd"
					}
				},
				"last_15_date": {
					"filter": {
						"range": {
							"date_from": {
								"gte": "2019-02-22",
								"lte": "2019-03-07",
								"format": "yyyy-MM-dd"
							}
						}
					},
					"aggregations": {
						"date_from": {
							"terms": {
								"field": "d_desc",
								"size": 14,
								"execution_hint": "map",
								"order": [{
									"_count": "desc"
								}, {
									"_key": "asc"
								}],
								"collect_mode": "breadth_first"
							}
						}
					}
				},
				"max_install_time": {
					"max": {
						"field": "d_install_time",
						"format": "yyyy-MM-dd"
					}
				},
				"date_30": {
					"filter": {
						"range": {
							"date_from": {
								"gte": "2019-02-07",
								"lte": "2019-03-07",
								"format": "yyyy-MM-dd"
							}
						}
					},
					"aggregations": {
						"day_count": {
							"terms": {
								"field": "date_from",
								"format": "yyyy-MM-dd",
								"size": 29,
								"execution_hint": "map",
								"order": [{
									"_count": "desc"
								}, {
									"_key": "asc"
								}],
								"collect_mode": "breadth_first"
							}
						}
					}
				},
				"max_part_no": {
					"max": {
						"field": "part"
					}
				},
				"earliest_date_from": {
					"min": {
						"field": "date_from",
						"format": "yyyy-MM-dd"
					}
				},
				"top_is_loyal": {
					"top_hits": {
						"from": 0,
						"size": 1,
						"_source": {
							"includes": ["is_loyal"],
							"excludes": []
						},
						"sort": [{
							"date_from": {
								"order": "desc"
							}
						}]
					}
				}
			}
		}
	}
}

2 replies

Ombres

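"For business reasons the query contains more than 20 aggs, and some of them have multi-level nested aggs."

The root cause is probably the sentence of yours quoted above: too many nested buckets can lead to this kind of problem.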
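
To illustrate why nesting matters, here is a rough Python sketch of a worst-case bucket count for a chain of nested bucket aggs. It is only an upper bound, since not every parent bucket contains every child value, and the cardinalities in the example are hypothetical, not measured from the index in question.

```python
def worst_case_buckets(sizes_per_level):
    """Upper bound on buckets created by a chain of nested bucket aggs.

    sizes_per_level[i] is the maximum number of buckets the agg at depth i
    can create under a single parent bucket.
    """
    total = 0
    parents = 1
    for size in sizes_per_level:
        parents *= size      # buckets that can exist at this depth
        total += parents
    return total


if __name__ == "__main__":
    # Hypothetical cardinalities, NOT taken from the poster's data:
    # 100,000 person_id values, 50 ps.c_id values per person, then the
    # 14-bucket d_desc terms agg from the DSL.
    print(worst_case_buckets([100_000, 50, 14]))
```

With `size: 2147483647` on the outer terms aggs nothing is pruned, so each extra level multiplies the number of buckets held in memory at once.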

50 GB of data is not much. Each node has 256 GB of memory and the JVM gets 30 GB of it, so you could raise the heap size somewhat.

-----------------------------------------
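Besides heap size, a few other things are worth checking:
1. Run GET _stats/ before and after the search to look at the index state, and compare the differences, for example in fielddata.
2. Check whether the index mapping is reasonable and whether the fields used in aggs are set up sensibly, for example whether an agg runs on an analyzed text field.
3. Optimize the DSL.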
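
The before/after comparison in point 1 could be scripted roughly as below. This is a minimal Python sketch that diffs the numeric leaves of two already-decoded `_stats` responses; the function names and the sample snapshots are made up for illustration.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}, keeping numeric leaves only."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def stats_delta(before, after):
    """Return {path: after - before} for every numeric leaf that changed."""
    b, a = flatten(before), flatten(after)
    return {p: a[p] - b.get(p, 0) for p in a if a[p] != b.get(p, 0)}


if __name__ == "__main__":
    # Made-up snapshots shaped like a heavily trimmed GET _stats response.
    before = {"_all": {"total": {"fielddata": {"memory_size_in_bytes": 1000},
                                 "query_cache": {"memory_size_in_bytes": 500}}}}
    after = {"_all": {"total": {"fielddata": {"memory_size_in_bytes": 9000},
                                "query_cache": {"memory_size_in_bytes": 500}}}}
    for path, delta in sorted(stats_delta(before, after).items()):
        print(f"{path}: {delta:+}")
```

Only the counters that moved are reported, which makes it easier to see whether fielddata or one of the caches grew during the query.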

laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasitcsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

The complex DSL is the crux. I suggest posting it so everyone can discuss it.
Also describe the data storage scenario.
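
One direction such a discussion often takes, sketched here only as an assumption and not as a confirmed fix for this cluster: replace a top-level terms agg that uses `size: 2147483647` with a composite aggregation and page through `person_id` with `after`. This needs an Elasticsearch version that supports composite aggregations. The function names and the `search` callable are hypothetical.

```python
def composite_body(query, sub_aggs, after=None, page_size=1000):
    """Build a search body that pages over person_id with a composite agg."""
    composite = {
        "size": page_size,
        "sources": [{"person_id": {"terms": {"field": "person_id"}}}],
    }
    if after is not None:
        composite["after"] = after
    return {
        "size": 0,
        "query": query,
        "aggs": {"person_page": {"composite": composite, "aggs": sub_aggs}},
    }


def iter_person_buckets(search, query, sub_aggs, page_size=1000):
    """Yield every person bucket, one page at a time.

    `search` is a caller-supplied function (hypothetical) that sends a body
    to md_detail/_search and returns the decoded JSON response.
    """
    after = None
    while True:
        resp = search(composite_body(query, sub_aggs, after, page_size))
        agg = resp["aggregations"]["person_page"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break
```

Each request then holds at most `page_size` person buckets instead of all of them. The trade-off is that composite buckets come back ordered by key, so the `_count` ordering used in the original DSL would have to be applied on the client side.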
