
Shards UNASSIGNED: forced allocation fails

Elasticsearch | Author: yaohe | Posted on December 13, 2019 | Views: 3073

Running the forced allocation command:
curl -H "Content-Type:application/json" -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d  '{"commands" : [{"allocate_replica" : {"index" : "wa_pk_wb.log_201906","shard":51,"node":"1553096497000058309"}}]}'
It fails with the following error:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[1553096497000059109][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [wa_pk_wb.log_201906][51] on node {1553096497000058309}{diwcFeFXSmGMTIVL87nesA}{wnmvPC68T-Wzop5LtMgvXQ}{1.18.48.42}{1.18.48.42:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2,} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-10-04T00:23:47.137Z], failed_attempts[5], delayed=false, details[failed shard on node [Avu6FgVqQkeD9w7JRQkIMA]: failed recovery, failure RecoveryFailedException[[wa_pk_wb.log_201906][51]: Recovery failed from {1566460253000005411}{XRrOUyhWQsqjd-TgWjesjA}{qI1jiSlKSpOk7UZrdeCtvQ}{1.18.48.6}{1.18.48.6:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2, ip=1.18.48.6} into {1568959492000072011}{Avu6FgVqQkeD9w7JRQkIMA}{TJFPQYofT42CPbg-Zzxaxw}{1.18.48.31}{1.18.48.31:9302}{rack=rack_1, xpack.installed=true, set=2, ip=1.18.48.31, temperature=hot, region=2}]; nested: RemoteTransportException[[1566460253000005411][1.18.48.6:9302][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5392754809/5gb], which is larger than the limit of [5392485580/5gb], usages [request=0/0b, fielddata=503475099/480.1mb, in_flight_requests=861/861b, accounting=4889278849/4.5gb]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.4.3] since this is equal-or-newer than the primary version [6.4.3])][YES(the shard is not being snapshotted)][YES(ignored as shard is 
not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [2.8tb], shard size: [0b], free after allocating shard: [2.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(node meets all awareness attribute requirements)]"},"status":400}
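The cluster currently has 150 UNASSIGNED shards, and forcing allocation returns the error above for every one of them.
Any advice would be appreciated.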
 

Ombres

Upvoted by: yaohe

1.18.48.6:9302 tripped the circuit breaker. Is it running out of memory?
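
To see how close that node is to its breaker limits, and to retry the failed allocations once memory pressure has eased, something like the following could be used. This is only a sketch: it assumes the HTTP endpoint from the question (127.0.0.1:9200), and the function names and the ES_URL variable are made up for illustration. The retry_failed=true call is the one the error message itself suggests.

```shell
#!/usr/bin/env bash
# Assumption: the HTTP endpoint from the question; override ES_URL for your cluster.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Show per-node circuit breaker usage (limit, estimated size, trip count).
breaker_stats() {
  curl -s -XGET "${ES_URL}/_nodes/stats/breaker?pretty"
}

# Ask the cluster to retry shards that exhausted their allocation retries,
# as suggested by the error message in the question.
retry_failed_allocations() {
  curl -s -XPOST "${ES_URL}/_cluster/reroute?retry_failed=true"
}
```

Run breaker_stats first and look at the parent and accounting entries for the node at 1.18.48.6. Retrying while that node is still at its limit will most likely fail again in the same way.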
 
Anonymous user

Accounting requests circuit breaker

The accounting circuit breaker allows Elasticsearch to limit the memory usage of things held in memory that are not released when a request is completed. This includes things like the Lucene segment memory.

indices.breaker.accounting.limit
Limit for accounting breaker, defaults to 100% of JVM heap. This means that it is bound by the limit configured for the parent circuit breaker.
indices.breaker.accounting.overhead
A constant that all accounting estimations are multiplied with to determine a final estimation. Defaults to 1
 
I think I roughly understand what is going on.
 
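Your ES heap is probably about 8 GB (or 7 GB?), right? In newer ES versions, the accounting breaker puts an upper limit on the memory used by segments.
When you move data (allocate a shard) to this node, the segment memory breaker trips and the allocation fails.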
 
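In this situation the only options are to increase the JVM heap or to raise the circuit breaker limit; either would resolve the problem. I would not recommend that, though. In a case like yours you should add machines and scale out.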
 
This is all my own guess, and I can't guarantee it is correct.
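
For reference, raising the breaker limit mentioned above could look roughly like this. It is a sketch under assumptions: the endpoint is the one from the question, the function name is hypothetical, and "75%" in the usage line is only an example value, not a recommendation. indices.breaker.total.limit is the parent breaker setting, and a transient setting is lost on a full cluster restart.

```shell
#!/usr/bin/env bash
# Assumption: the HTTP endpoint from the question; override ES_URL for your cluster.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Temporarily set the parent breaker limit through the cluster settings API.
# $1 is the new limit as a percentage string chosen by the caller.
set_parent_breaker_limit() {
  local limit="$1"
  curl -s -H "Content-Type: application/json" -XPUT "${ES_URL}/_cluster/settings" \
    -d "{\"transient\": {\"indices.breaker.total.limit\": \"${limit}\"}}"
}

# Usage (example value only):
#   set_parent_breaker_limit "75%"
```

A higher limit leaves less headroom before the JVM itself runs out of memory, which is why scaling out is the safer fix.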
 
 
 
Anonymous user

CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5392754809/5gb], which is larger than the limit of [5392485580/5gb], usages [request=0/0b, fielddata=503475099/480.1mb, in_flight_requests=861/861b, accounting=4889278849/4.5gb]]; ]
 
 
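That doesn't look like an 18 GB heap. The message says the parent limit is 5 GB; if that is 70% of the JVM heap, the heap is only about 7 GB, and if it is 95%, the heap is even smaller.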
 
With 18 GB, the parent limit should be 18 * 0.7 = 12.6 GB.
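
The arithmetic can be checked directly against the numbers in the exception. A small sketch, using only values copied from the error message; the 70% ratio is the assumed default discussed here, and the variable names are just for illustration:

```shell
#!/usr/bin/env bash
# Values copied from the CircuitBreakingException in the question (bytes).
parent_limit=5392485580
request=0
fielddata=503475099
in_flight=861
accounting=4889278849

# The reported "would be" figure equals the sum of the listed usages.
total=$((request + fielddata + in_flight + accounting))
over_by=$((total - parent_limit))

# Heap size implied if the limit is 70% of the heap (integer division).
heap_bytes=$((parent_limit * 100 / 70))
heap_mib=$((heap_bytes / 1024 / 1024))

echo "usage=${total} over_by=${over_by} implied_heap=${heap_mib}MiB"
# prints: usage=5392754809 over_by=269229 implied_heap=7346MiB
```

So the request went over the limit by only about 263 KB, almost all of the usage is the accounting breaker (4.5 GB), and the implied heap is roughly 7.2 GB rather than 18 GB.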

Anonymous user

This is our configuration, with a 64 GB JVM heap:


      "breaker": {
        "request": {
          "limit": "60%",
          "type": "memory",
          "overhead": "1.0"
        },
        "total": {
          "limit": "70%"
        },
        "fielddata": {
          "type": "memory",
          "overhead": "1.03"
        },
        "type": "hierarchy"
      }
      "breaker": {
        "inflight_requests": {
          "limit": "100%",
          "overhead": "1.0"
        }
      }
 
Limits for each circuit breaker:
"breakers": {
  "request": { "limit_size_in_bytes": 40458623385, "limit_size": "37.6gb", "estimated_size_in_bytes": 163840, "estimated_size": "160kb", "overhead": 1, "tripped": 0 },
  "fielddata": { "limit_size_in_bytes": 16857759744, "limit_size": "15.7gb", "estimated_size_in_bytes": 952080280, "estimated_size": "907.9mb", "overhead": 1.03, "tripped": 0 },
  "in_flight_requests": { "limit_size_in_bytes": 67431038976, "limit_size": "62.8gb", "estimated_size_in_bytes": 169181, "estimated_size": "165.2kb", "overhead": 1, "tripped": 0 },
  "parent": { "limit_size_in_bytes": 47201727283, "limit_size": "43.9gb", "estimated_size_in_bytes": 952446069, "estimated_size": "908.3mb", "overhead": 1, "tripped": 0 }
},
 
 
 
If your heaps are 18 GB and 17 GB, then this may not be the cause.
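
To compare your own nodes with the breaker stats above, each limit can be expressed as a percentage of the heap. A rough sketch: it assumes the in_flight_requests limit (configured at 100%) equals the usable heap, and pct_of_heap is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumption: the in_flight_requests limit is 100% of the heap, so it stands in for heap size (bytes).
heap=67431038976

# Print a limit as a percentage of the heap, rounded to the nearest integer.
pct_of_heap() {
  echo $(( ($1 * 100 + heap / 2) / heap ))
}

echo "request:   $(pct_of_heap 40458623385)%"   # request:   60%
echo "fielddata: $(pct_of_heap 16857759744)%"   # fielddata: 25%
echo "parent:    $(pct_of_heap 47201727283)%"   # parent:    70%
```

Doing the same with the limit_size_in_bytes values from your own node stats shows whether the 5 GB parent limit in the error really corresponds to the heap size you expect.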

Anonymous user

Documentation for the circuit breaker settings; the parent limit defaults to 70% of the JVM heap:
 https://www.elastic.co/guide/e ... eaker
