Elasticsearch 7.10.0: scheduled restore throws "Data too large"

Elasticsearch | Author: Hyj_simple1 | Posted: 2023-03-31 | Views: 6508

Our primary and secondary ES clusters are kept in sync by a scheduled snapshot and restore job.

After restore has been running on the secondary cluster for a while, it triggers a circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [……/7.8g], which is larger than the limit of [……/7.5]……, "status":429]

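Monitoring shows that on the secondary-cluster ES node that runs the restore, heap usage (relative to Xmx) stays persistently high, and the node's CPU usage is on the high side (hovering around 40%).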

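The cache memory figures reported by _stats are all very low and can be ignored.
In _nodes/stats/breaker, usage of the request, fielddata, in_flight_requests, model_inference and accounting breakers is very low as well; only parent is over its limit.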

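Question 1: How can I analyze the actual heap (Xmx) usage of an ES node?
Question 2: The cluster holds very little data: a single-digit number of indices, and the total data volume is under one million. Is there anything to watch out for when using restore?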

Looking at the HierarchyCircuitBreakerService class in the source, it still looks like heap memory that has not been released yet...
MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed()

if (this.trackRealMemoryUsage) {
    final long current = currentMemoryUsage();
    return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
} else {
    long parentEstimated = transientUsage + permanentUsage;
    return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
}
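The two branches quoted above would explain why only parent is over its limit while every child breaker is nearly empty. The following is a minimal sketch of that accounting, not the Elasticsearch implementation: the class name, the method names, the plain "total > limit" check and all numbers in main are assumptions made for illustration.

```java
final class ParentBreakerSketch {

    /**
     * Returns the total that the parent breaker would compare against its limit,
     * following the two branches of the quoted fragment.
     */
    static long parentTotal(boolean trackRealMemoryUsage,
                            long realHeapUsed,
                            long newBytesReserved,
                            long transientUsage,
                            long permanentUsage) {
        if (trackRealMemoryUsage) {
            // Real-memory mode: everything currently on the heap (live objects
            // plus garbage that has not been collected yet) plus the new reservation.
            return realHeapUsed + newBytesReserved;
        }
        // Estimate mode: only what the child breakers have accounted for.
        return transientUsage + permanentUsage;
    }

    /** Assumption: a simple "greater than the limit" comparison. */
    static boolean wouldTrip(long total, long limit) {
        return total > limit;
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;

        // Hypothetical figures, chosen only to resemble the situation in the post.
        long realHeapUsed = 7_800 * mib;   // heap is nearly full, mostly uncollected garbage
        long newBytesReserved = 1 * mib;   // size of the incoming HTTP request
        long transientUsage = 2 * mib;     // child breakers report almost nothing
        long permanentUsage = 8 * mib;
        long limit = 7_700 * mib;

        long realTotal = parentTotal(true, realHeapUsed, newBytesReserved, transientUsage, permanentUsage);
        long estimatedTotal = parentTotal(false, realHeapUsed, newBytesReserved, transientUsage, permanentUsage);

        System.out.println("real-memory mode trips: " + wouldTrip(realTotal, limit));
        System.out.println("estimate mode trips:    " + wouldTrip(estimatedTotal, limit));
    }
}
```

In real-memory mode the child breaker figures do not enter the total at all, so low values in _nodes/stats/breaker say nothing about whether parent will trip.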

static long realMemoryUsage() {
    try {
        return MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed();
    } catch (IllegalArgumentException ex) {
        // This exception can happen (rarely) due to a race condition in the JVM when determining usage of memory pools. We do not want
        // to fail requests because of this and thus return zero memory usage in this case. While we could also return the most
        // recently determined memory usage, we would overestimate memory usage immediately after a garbage collection event.
        assert ex.getMessage().matches("committed = \\d+ should be < max = \\d+");
        logger.info("Cannot determine current memory usage due to JDK-8207200.", ex);
        return 0;
    }
}
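On question 1, the figure that realMemoryUsage() reads is the standard JVM heap usage from java.lang.management, which includes garbage that has not been collected yet. The sketch below shows how that number, and the per-pool usage after the last GC, can be read in plain Java. It inspects the JVM it runs in, not a remote ES node, and the class and method names are hypothetical. For a live node, the node stats API (GET _nodes/stats/jvm) reports heap usage and memory pool figures.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class HeapUsageProbe {

    private static final MemoryMXBean MEMORY_MX_BEAN = ManagementFactory.getMemoryMXBean();

    /** Bytes currently occupied on the heap, live objects and uncollected garbage alike. */
    static long usedHeapBytes() {
        return MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the JVM does not define one. */
    static long maxHeapBytes() {
        return MEMORY_MX_BEAN.getHeapMemoryUsage().getMax();
    }

    /** One line per heap pool: current usage and, where supported, usage after the last GC. */
    static String describePools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(String.format("%-28s used=%d", pool.getName(), usage.getUsed()));
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                sb.append(String.format(" usedAfterLastGc=%d", afterGc.getUsed()));
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.printf("heap used=%d max=%d%n", usedHeapBytes(), maxHeapBytes());
        System.out.print(describePools());
    }
}
```

A large gap between "used" and "usedAfterLastGc" in the old-generation pool would suggest that the heap is mostly uncollected garbage and not live data.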

After setting indices.breaker.total.use_real_memory: false, memory usage dropped back to a normal level... I will keep observing it for now.
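Switching this setting also changes the default parent limit. Per the Elasticsearch circuit breaker documentation, indices.breaker.total.limit defaults to 95% of the JVM heap when use_real_memory is true and 70% when it is false, and use_real_memory is a static setting, so it belongs in elasticsearch.yml and needs a node restart. The sketch below only works out those two defaults for a given heap size. The class and method names are hypothetical, and the integer arithmetic is an approximation of however Elasticsearch rounds the percentage.

```java
final class ParentLimitDefaults {

    /**
     * Approximate default parent breaker limit in bytes:
     * 95% of max heap with real-memory tracking, 70% without.
     */
    static long defaultParentLimitBytes(boolean useRealMemory, long maxHeapBytes) {
        int percent = useRealMemory ? 95 : 70;
        return maxHeapBytes * percent / 100;
    }

    public static void main(String[] args) {
        // Uses this JVM's max heap as a stand-in; an ES node would use its own Xmx.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap bytes:                 " + maxHeap);
        System.out.println("limit, use_real_memory=true:    " + defaultParentLimitBytes(true, maxHeap));
        System.out.println("limit, use_real_memory=false:   " + defaultParentLimitBytes(false, maxHeap));
    }
}
```

With the setting off, the parent breaker compares the sum of the child breaker estimates against the lower 70% limit, so the figure it reports no longer reflects how full the heap actually is. The heap itself may still be worth watching through JVM stats.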

0 replies
