Forced allocation of UNASSIGNED shards fails

Elasticsearch | Author: yaohe | Posted 2019-12-13 | Views: 3319

I ran this command to force-allocate a replica shard:
curl -H "Content-Type:application/json" -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d '{"commands" : [{"allocate_replica" : {"index" : "wa_pk_wb.log_201906","shard":51,"node":"1553096497000058309"}}]}'
It fails with the following error:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[1553096497000059109][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [wa_pk_wb.log_201906][51] on node {1553096497000058309}{diwcFeFXSmGMTIVL87nesA}{wnmvPC68T-Wzop5LtMgvXQ}{1.18.48.42}{1.18.48.42:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2,} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-10-04T00:23:47.137Z], failed_attempts[5], delayed=false, details[failed shard on node [Avu6FgVqQkeD9w7JRQkIMA]: failed recovery, failure RecoveryFailedException[[wa_pk_wb.log_201906][51]: Recovery failed from {1566460253000005411}{XRrOUyhWQsqjd-TgWjesjA}{qI1jiSlKSpOk7UZrdeCtvQ}{1.18.48.6}{1.18.48.6:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2, ip=1.18.48.6} into {1568959492000072011}{Avu6FgVqQkeD9w7JRQkIMA}{TJFPQYofT42CPbg-Zzxaxw}{1.18.48.31}{1.18.48.31:9302}{rack=rack_1, xpack.installed=true, set=2, ip=1.18.48.31, temperature=hot, region=2}]; nested: RemoteTransportException[[1566460253000005411][1.18.48.6:9302][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5392754809/5gb], which is larger than the limit of [5392485580/5gb], usages [request=0/0b, fielddata=503475099/480.1mb, in_flight_requests=861/861b, accounting=4889278849/4.5gb]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.4.3] since this is equal-or-newer than the primary version [6.4.3])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [2.8tb], shard size: [0b], free after allocating shard: [2.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(node meets all awareness attribute requirements)]"},"status":400}
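The cluster currently has 150 UNASSIGNED shards, and force-allocating any of them returns the error above.
Any advice would be appreciated.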

5 replies

Ombres

Upvoted by: yaohe

The circuit breaker tripped on 1.18.48.6:9302. Is that node running out of memory?

Anonymous user

Accounting requests circuit breaker

The accounting circuit breaker allows Elasticsearch to limit the memory usage of things held in memory that are not released when a request is completed. This includes things like the Lucene segment memory.

indices.breaker.accounting.limit
Limit for accounting breaker, defaults to 100% of JVM heap. This means that it is bound by the limit configured for the parent circuit breaker.
indices.breaker.accounting.overhead
A constant that all accounting estimations are multiplied with to determine a final estimation. Defaults to 1

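I think I roughly understand what is going on.

Your ES heap is probably around 8 GB (or 7 GB?), right? With the limits in newer ES versions, the accounting breaker caps how much memory Lucene segments may use.
When you migrate data (allocate a shard) to this node, the segment-memory breaker trips and the allocation fails.

In this situation there is not much choice: either increase the JVM heap or raise the circuit breaker limit, and the problem goes away. I would not recommend doing that, though. At this point you should really add machines and scale out.

This is all my own guesswork; I cannot guarantee it is correct.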
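The two steps implied above (raise the parent breaker limit, then retry the failed allocations) could be sketched as follows. This is only an illustration under assumptions: the host is the one from the curl command in the question, the "80%" value is a made-up example and not a recommendation, and `build_request` is a hypothetical helper. `indices.breaker.total.limit` is the parent breaker setting, and `/_cluster/reroute?retry_failed=true` is the call named in the error message itself. The requests are built but not sent.

```python
import json
import urllib.request

# Host taken from the curl command in the question.
BASE = "http://127.0.0.1:9200"


def build_request(method, path, body=None):
    """Hypothetical helper: build (but do not send) a JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Step 1: raise the parent breaker limit. "80%" is an illustrative value only.
raise_limit = build_request(
    "PUT",
    "/_cluster/settings",
    {"transient": {"indices.breaker.total.limit": "80%"}},
)

# Step 2: reset the failed-allocation counter, as the error message suggests.
retry = build_request("POST", "/_cluster/reroute?retry_failed=true")

# To actually send them against a live cluster:
#   urllib.request.urlopen(raise_limit)
#   urllib.request.urlopen(retry)
print(raise_limit.get_method(), raise_limit.full_url)
print(retry.get_method(), retry.full_url)
```

The retry is likely to fail again for the same reason unless memory pressure on the node that tripped the breaker has actually been relieved.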

Anonymous user

CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5392754809/5gb], which is larger than the limit of [5392485580/5gb], usages [request=0/0b, fielddata=503475099/480.1mb, in_flight_requests=861/861b, accounting=4889278849/4.5gb]]; ]

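This does not look like an 18 GB heap. The message says the parent limit is 5 GB. If that is 70% of the JVM heap, the heap is only about 7 GB, and if it is 95% the heap would be even smaller.

With an 18 GB heap, the parent limit should be 18*0.7 = 12.6 GB.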
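The arithmetic behind this can be checked directly from the numbers in the exception. A small sketch, assuming the parent limit is the default 70% of heap (the thread does not confirm the actual setting on that node):

```python
GIB = 1024 ** 3

# Per-breaker usages reported in the CircuitBreakingException, in bytes.
usages = {
    "request": 0,
    "fielddata": 503475099,
    "in_flight_requests": 861,
    "accounting": 4889278849,
}
parent_limit = 5392485580  # "the limit of [5392485580/5gb]"
would_be = 5392754809      # "would be [5392754809/5gb]"

total = sum(usages.values())
accounting_share = usages["accounting"] / total

# Assumption: the parent limit is 70% of the JVM heap.
implied_heap_gib = parent_limit / 0.70 / GIB

print(total == would_be)           # True: the child usages add up to the reported total
print(total - parent_limit)        # 269229 bytes over the parent limit
print(f"{accounting_share:.1%}")   # share of the total held by the accounting breaker
print(f"{implied_heap_gib:.2f}")   # implied heap size in GiB
print(round(18 * 0.70, 1))         # parent limit in GB for an 18 GB heap
```

The usages sum exactly to the reported total, accounting holds roughly nine tenths of it, and the implied heap is a little over 7 GiB rather than 18.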

Anonymous user

Here is our configuration, with a 64 GB JVM heap:

"breaker": {
  "request": {
    "limit": "60%",
    "type": "memory",
    "overhead": "1.0"
  },
  "total": {
    "limit": "70%"
  },
  "fielddata": {
    "type": "memory",
    "overhead": "1.03"
  },
  "type": "hierarchy"
}

"breaker": {
  "inflight_requests": {
    "limit": "100%",
    "overhead": "1.0"
  }
}

The limits of each circuit breaker:

"breakers": {
  "request": {
    "limit_size_in_bytes": 40458623385,
    "limit_size": "37.6gb",
    "estimated_size_in_bytes": 163840,
    "estimated_size": "160kb",
    "overhead": 1,
    "tripped": 0
  },
  "fielddata": {
    "limit_size_in_bytes": 16857759744,
    "limit_size": "15.7gb",
    "estimated_size_in_bytes": 952080280,
    "estimated_size": "907.9mb",
    "overhead": 1.03,
    "tripped": 0
  },
  "in_flight_requests": {
    "limit_size_in_bytes": 67431038976,
    "limit_size": "62.8gb",
    "estimated_size_in_bytes": 169181,
    "estimated_size": "165.2kb",
    "overhead": 1,
    "tripped": 0
  },
  "parent": {
    "limit_size_in_bytes": 47201727283,
    "limit_size": "43.9gb",
    "estimated_size_in_bytes": 952446069,
    "estimated_size": "908.3mb",
    "overhead": 1,
    "tripped": 0
  }
},

If your heaps are 18 GB and 17 GB, then this may not be the cause.
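For comparison, breaker stats like the ones in this reply can be reduced to "how full is each breaker". A minimal sketch: `breaker_usage` is a hypothetical helper, and the numbers are the `limit_size_in_bytes` and `estimated_size_in_bytes` values quoted in this reply. On a live cluster the same structure is returned per node by `GET _nodes/stats/breaker`.

```python
def breaker_usage(breakers):
    """Return (name, used_fraction) pairs, fullest breaker first."""
    rows = []
    for name, stats in breakers.items():
        limit = stats["limit_size_in_bytes"]
        used = stats["estimated_size_in_bytes"]
        # Guard against a zero or negative limit.
        rows.append((name, used / limit if limit > 0 else 0.0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Values copied from the breaker stats in this reply (64 GB heap cluster).
breakers = {
    "request": {"limit_size_in_bytes": 40458623385, "estimated_size_in_bytes": 163840},
    "fielddata": {"limit_size_in_bytes": 16857759744, "estimated_size_in_bytes": 952080280},
    "in_flight_requests": {"limit_size_in_bytes": 67431038976, "estimated_size_in_bytes": 169181},
    "parent": {"limit_size_in_bytes": 47201727283, "estimated_size_in_bytes": 952446069},
}

for name, fraction in breaker_usage(breakers):
    print(f"{name}: {fraction:.2%} of limit")
```

On this cluster every breaker sits far below its limit, unlike the node in the question, where the reported usage already exceeded the parent limit.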

Anonymous user

The documentation for the circuit breaker settings; the default is 70% of the JVM heap:
https://www.elastic.co/guide/e ... eaker
