Elastic 7.10.0: scheduled restore throws "Data too large"
Elasticsearch | Author: Hyj_simple1 | Posted: 2023-03-31 | Views: 5773
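Our primary and secondary ES clusters are kept in sync by a scheduled snapshot and restore job.
After restore has been running on the secondary cluster for a while, it triggers a circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [……/7.8g], which is larger than the limit of [……/7.5]……, "status":429]
Monitoring shows that on the secondary-cluster nodes that run the restore, heap (Xmx) usage stays persistently high and node CPU usage is elevated (hovering around 40%).
The _stats API shows that memory usage of the related caches is very low, effectively negligible.
_nodes/stats/breaker shows that the request, fielddata, in_flight_requests, model_inference and accounting breakers are all at very low usage; only the parent breaker exceeds its limit.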
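Question 1: Is there a way to analyze the actual heap (Xmx) usage of an ES node?
Question 2: The cluster holds very little data: a single-digit number of indices and a total data volume of no more than one million. Is there anything to watch out for when using restore?
Looking at the HierarchyCircuitBreakerService class in the source, it looks as if heap memory is simply not being released...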
MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed()

if (this.trackRealMemoryUsage) {
    final long current = currentMemoryUsage();
    return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
} else {
    long parentEstimated = transientUsage + permanentUsage;
    return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
}

static long realMemoryUsage() {
    try {
        return MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed();
    } catch (IllegalArgumentException ex) {
        // This exception can happen (rarely) due to a race condition in the JVM when determining usage of memory pools. We do not want
        // to fail requests because of this and thus return zero memory usage in this case. While we could also return the most
        // recently determined memory usage, we would overestimate memory usage immediately after a garbage collection event.
        assert ex.getMessage().matches("committed = \\d+ should be < max = \\d+");
        logger.info("Cannot determine current memory usage due to JDK-8207200.", ex);
        return 0;
    }
}
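On Question 1, here is a minimal sketch of the same JMX call the excerpt above relies on, extended to list the heap pools. It is not Elasticsearch code: the class name HeapSnapshot and its methods are made up for illustration, and it inspects the JVM it runs in, not a remote ES node. The relevant detail is that getUsed() counts every object still on the heap, including garbage that has not been collected yet, so a high reading does not by itself prove a leak. The per-pool "after last GC" figure is usually a better hint of what is really retained.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Elasticsearch.
class HeapSnapshot {

    /** Heap bytes currently in use, read the same way as realMemoryUsage() above. */
    static long heapUsed() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** One line per heap pool: name, bytes used now, bytes used after the last GC (if the pool reports it). */
    static String describeHeapPools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage now = pool.getUsage();
            if (now == null) {
                continue; // pool is no longer valid
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            sb.append(pool.getName())
              .append(" used=").append(now.getUsed())
              .append(" afterLastGc=")
              .append(afterGc == null ? "n/a" : String.valueOf(afterGc.getUsed()))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsed());
        System.out.print(describeHeapPools());
    }
}
```

For a running ES node the same numbers are exposed without custom code, for example through _nodes/stats/jvm or a JDK tool such as jstat attached to the ES process.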
After setting indices.breaker.total.use_real_memory: false
memory usage dropped back to normal... I am keeping an eye on it for now.
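To make the effect of that setting concrete, here is a small sketch that mirrors the two branches of the excerpt above. It is a simplified model, not the Elasticsearch implementation: the class ParentBreakerSketch, its methods, and all the numbers (in MB) are hypothetical. With real-memory tracking on, the parent breaker compares current heap usage plus the new request against the limit; with it off, it only sums what the child breakers have accounted for.

```java
// Hypothetical model of the parent breaker decision, not Elasticsearch code.
class ParentBreakerSketch {

    /** The usage figure the parent breaker compares against its limit. */
    static long parentUsage(boolean trackRealMemoryUsage, long heapUsed,
                            long childEstimates, long newBytesReserved) {
        if (trackRealMemoryUsage) {
            // Real-memory mode: whatever the heap currently holds, plus the new request.
            return heapUsed + newBytesReserved;
        }
        // Estimate mode: only what the child breakers have accounted for.
        return childEstimates;
    }

    /** True if a request would be rejected with "Data too large". */
    static boolean wouldTrip(boolean trackRealMemoryUsage, long heapUsed,
                             long childEstimates, long newBytesReserved, long limit) {
        return parentUsage(trackRealMemoryUsage, heapUsed, childEstimates, newBytesReserved) > limit;
    }

    public static void main(String[] args) {
        long heapUsedMb = 7_700;     // assumed: heap is nearly full, much of it possibly uncollected garbage
        long childEstimatesMb = 50;  // assumed: child breakers report almost nothing
        long requestMb = 100;        // assumed size of the incoming request
        long limitMb = 7_500;        // assumed parent limit

        System.out.println("use_real_memory=true  trips: "
                + wouldTrip(true, heapUsedMb, childEstimatesMb, requestMb, limitMb));
        System.out.println("use_real_memory=false trips: "
                + wouldTrip(false, heapUsedMb, childEstimatesMb, requestMb, limitMb));
    }
}
```

This matches the symptoms described: every child breaker is nearly idle, yet the parent trips because it is looking at the heap itself. Note that turning the setting off changes what the breaker measures, not how much heap the node actually uses. As far as I know it is a static setting, so it goes in elasticsearch.yml and needs a node restart, and the default parent limit drops from 95% to 70% of the heap when it is false.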
0 replies