Primary shard lost, cluster red, error: nested: CorruptIndexException[checksum failed (hardware problem?)]
Elasticsearch | By Jaret | Published 2021-04-16 | Views: 3093
Today I was repeatedly creating and deleting one index (indexA). After several rounds, the cluster's largest index (indexB) lost a primary shard. To speed up writes, indexB was configured with no replicas, so the lost primary cannot be restored from a copy. The unassigned-shard info for the lost primary reports the following error:
{
  "state": "UNASSIGNED",
  "primary": true,
  "node": null,
  "relocating_node": null,
  "shard": 40,
  "index": "device_search_20201204",
  "recovery_source": {
    "type": "EXISTING_STORE",
    "bootstrap_new_history_uuid": true
  },
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2021-04-16T12:29:01.668Z",
    "failed_attempts": 2,
    "failed_nodes": [
      "BT7MEPbJTu67N7Op6GLXEQ"
    ],
    "delayed": false,
    "details": "failed shard on node [BT7MEPbJTu67N7Op6GLXEQ]: failed recovery, failure RecoveryFailedException[[device_search_20201204][40]: Recovery failed on {reading_10.10.2.75_node1}{BT7MEPbJTu67N7Op6GLXEQ}{fMulPoCDSL-tmFwJFpaUSQ}{10.10.2.75}{10.10.2.75:9401}{dil}{ml.machine_memory=539647844352, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: CorruptIndexException[failed engine (reason: [merge failed]) (resource=preexisting_corruption)]; nested: IOException[failed engine (reason: [merge failed])]; nested: CorruptIndexException[checksum failed (hardware problem?) : expected=2db1254c actual=4bf6cdcc (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/home/wsn/es/es7.5/node_1/data/nodes/0/indices/QGft9wywTOeSNjcsz_UUHA/40/index/_e6k.cfs\") [slice=_e6k_Lucene50_0.tim]))]; ",
    "allocation_status": "no_valid_shard_copy"
  }
}
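For context on the "hardware problem?" hint: the checksum in the error is written by Lucene into the segment file's footer at write time and re-verified on read, so expected=2db1254c vs actual=4bf6cdcc means the bytes on disk no longer match what was originally written. A hedged sketch of hardware-side checks to run on the node holding the bad copy (10.10.2.75 in the error above); the device name and Lucene jar path are placeholders, not taken from the post:

```shell
# 1) Kernel log: look for I/O or filesystem errors around the failure time.
dmesg 2>/dev/null | grep -iE 'i/o error|ata[0-9]|ext4|xfs' | tail -n 20 || true

# 2) SMART health of the data disk (needs root; /dev/sda is a placeholder):
# smartctl -H /dev/sda

# 3) Lucene-level verification of the shard directory (ES 7.5 ships Lucene 8.3;
#    the jar location varies by install and is a placeholder here):
# java -cp /path/to/lucene-core-8.3.0.jar -ea org.apache.lucene.index.CheckIndex \
#   /home/wsn/es/es7.5/node_1/data/nodes/0/indices/QGft9wywTOeSNjcsz_UUHA/40/index
```

These are diagnostic fragments against a live machine, so treat them as a starting point rather than a runbook.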
Is this a disk problem? Cluster disk usage is only about 60%, and memory and CPU both look normal.
I tried rerouting with _cluster/reroute?pretty, but it had no effect.
Has anyone run into this before? What could the cause be?
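For what it's worth, with allocation_status "no_valid_shard_copy" a plain empty reroute retry cannot succeed, because there is no intact copy left to assign. Elasticsearch 7.x does expose two explicit reroute commands for exactly this case. A minimal sketch, with host:port as a placeholder and the index/shard/node values taken from the error above:

```shell
# Build the reroute body: try to assign the possibly-stale on-disk copy.
REROUTE_BODY='{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "device_search_20201204",
        "shard": 40,
        "node": "reading_10.10.2.75_node1",
        "accept_data_loss": true
      }
    }
  ]
}'

# Validate / inspect the body before sending it:
echo "$REROUTE_BODY" | python3 -m json.tool

# Then send it (host:port is a placeholder for your cluster):
# curl -X POST "localhost:9200/_cluster/reroute?pretty" \
#   -H 'Content-Type: application/json' -d "$REROUTE_BODY"

# Last resort, if the stale copy is unreadable: swap in "allocate_empty_primary"
# with the same fields -- this re-creates shard 40 EMPTY, losing its data.
```

Since the corrupt copy here failed recovery twice, allocate_stale_primary may fail the same way, in which case allocate_empty_primary (accepting the data loss) is what gets the cluster back to green.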
0 replies