
Elasticsearch 7.10.0: scheduled restore throws "Data too large"

Elasticsearch | Author: Hyj_simple1 | Published 2023-03-31 | Views: 5773

Our setup uses two ES clusters (primary and secondary) kept in sync by a scheduled snapshot-and-restore job.
 
After the restore job on the secondary cluster has been running for a while, it triggers a circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [……/7.8g], which is larger than the limit of [……/7.5]……, "status":429
 
Investigation shows that on the secondary-cluster node executing the restore, heap usage (relative to Xmx) stays persistently high and CPU usage is elevated (hovering around 40%).
 
 
The _stats API shows that the usage of the various caches is negligibly low;
_nodes/stats/breaker shows that the request, fielddata, in_flight_requests, model_inference, and accounting breakers are all close to idle; only the parent breaker exceeds its limit.
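For context, here is a rough sketch (not Elasticsearch code) of the arithmetic behind that 429. The assumption, from the ES 7.x breaker settings, is that the parent breaker's default limit is 95% of the heap when indices.breaker.total.use_real_memory is true, and 70% when it is false; the concrete byte values below are illustrative, not taken from the cluster in question.

```java
// Sketch of the parent circuit breaker's limit check (assumed defaults: 95%
// of heap with real-memory tracking on, 70% with it off).
public class ParentBreakerMath {

    // Compute the parent breaker limit for a given max heap (-Xmx) size.
    static long limitBytes(long maxHeapBytes, boolean useRealMemory) {
        double ratio = useRealMemory ? 0.95 : 0.70;
        return (long) (maxHeapBytes * ratio);
    }

    // The breaker rejects a request if current usage plus the new reservation
    // would exceed the limit.
    static boolean wouldTrip(long currentUsedBytes, long newBytesReserved, long limitBytes) {
        return currentUsedBytes + newBytesReserved > limitBytes;
    }

    public static void main(String[] args) {
        long heap  = 8L * 1024 * 1024 * 1024;               // e.g. -Xmx8g (illustrative)
        long limit = limitBytes(heap, true);                 // ~7.6 GiB at 95%
        long used  = (long) (7.8 * 1024 * 1024 * 1024);      // "would be 7.8g" from the error
        System.out.println(wouldTrip(used, 0, limit));       // true: request rejected with 429
    }
}
```

This matches the symptom: every child breaker is nearly idle, yet the parent trips, because with real-memory tracking the parent is gated on whole-heap usage rather than on the sum of the children.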
 
Question 1: Is there a way to analyze the actual heap usage of an ES node (relative to Xmx)?
Question 2: The cluster holds very little data: a single-digit number of indices, with a total data size under one million. Is there anything to watch out for when using restore?
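On question 1: besides GET _nodes/stats/jvm and standard JDK tools such as jmap and jstat, the same number the breaker itself reads can be sampled in-process via MemoryMXBean, which is exactly the bean behind MEMORY_MX_BEAN in the source quoted below. A minimal self-contained probe (HeapProbe is my own illustrative name, not an Elasticsearch class):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal probe reading the same heap-usage figure the parent breaker uses.
public class HeapProbe {

    // Used heap in bytes, as reported by the JVM's MemoryMXBean.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());
    }
}
```

Note that getUsed() counts everything on the heap, including garbage that has not been collected yet, so a high reading here does not by itself mean live objects are leaking; a heap dump or jmap -histo is needed to tell the two apart.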
 
 
 

Reading the HierarchyCircuitBreakerService class in the source, it looks like heap memory is simply not being released in time; the parent breaker measures real heap usage via:
MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed()


if (this.trackRealMemoryUsage) {
    final long current = currentMemoryUsage();
    return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
} else {
    long parentEstimated = transientUsage + permanentUsage;
    return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
}
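The two branches above explain the observed behavior. A minimal sketch (my own simplification of the snippet, with illustrative numbers): with real-memory tracking on, the parent estimate is whole-heap usage, which includes dead objects still awaiting GC; with it off, the estimate is only what the child breakers have actually reserved.

```java
// Simplified model of the branch above: parent usage under the two modes.
public class ParentEstimate {

    static long parentUsed(boolean trackRealMemory, long realHeapUsed,
                           long transientUsage, long permanentUsage) {
        // trackRealMemory == true  -> whole-heap figure (incl. uncollected garbage)
        // trackRealMemory == false -> sum of child breaker reservations only
        return trackRealMemory ? realHeapUsed : transientUsage + permanentUsage;
    }

    public static void main(String[] args) {
        long realHeap = 7_800_000_000L; // heap mostly filled with not-yet-collected garbage
        long childSum = 100_000_000L;   // what the child breakers actually reserved
        System.out.println(parentUsed(true, realHeap, 0, childSum));  // 7800000000
        System.out.println(parentUsed(false, realHeap, 0, childSum)); // 100000000
    }
}
```

So if restore churns through short-lived allocations faster than GC reclaims them, the real-memory figure can sit near the limit even though the child breakers report almost nothing, which is exactly the pattern seen in _nodes/stats/breaker.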



static long realMemoryUsage() {
    try {
        return MEMORY_MX_BEAN.getHeapMemoryUsage().getUsed();
    } catch (IllegalArgumentException ex) {
        // This exception can happen (rarely) due to a race condition in the JVM when determining
        // usage of memory pools. We do not want to fail requests because of this and thus return
        // zero memory usage in this case. While we could also return the most recently determined
        // memory usage, we would overestimate memory usage immediately after a garbage collection
        // event.
        assert ex.getMessage().matches("committed = \\d+ should be < max = \\d+");
        logger.info("Cannot determine current memory usage due to JDK-8207200.", ex);
        return 0;
    }
}
 
After setting indices.breaker.total.use_real_memory: false,
reported memory usage dropped back to normal…… I will keep observing it for a while.