三人行必有我师

elasticsearch translog恢复到一定百分比卡住(stuck),导致索引无法恢复

Elasticsearch | 作者 KrisOtk | 发布于2020年07月17日 | 阅读数:4950

es版本: 7.4.1
背景:
生产的某个es集群,3master15data,master 16g,data 30g
某天早上,有同事反馈es集群无法写入了,报了较多sockettimeout;
于是直接登上kibana,先看集群状态,green没有异常,然后上监控大盘看了一下这个es集群的最近24h的内存,磁盘,线程池,发现有单个节点较多拒绝掉和排队的任务,但是这个集群是个新集群,都是ssd,数据量不算很大,应该不存在瓶颈,所以初步怀疑某个数据节点出问题了;
之后登陆了问题节点,查看了一下es的日志信息,发现了部分任务队列满了的日志信息,那就证明这个节点处理能力不行,但是考虑到这批机器都是新的物理机,单机多节点,遂找运维看了一下磁盘有没有异常,结果也是没有异常,每个data节点的硬件信息基本一致,不应该存在较大的处理能力差别;
后面登陆了一下kibana,通过_cat/tasks 查看发现部分search,bulk命令执行时间已经持续了1h以上了,而且task的总量已经超过几万了,而且基本都集中在问题节点上,里面还有一个'internal:indices/flush/synced/pre'的操作,看起来像是同步刷盘的操作,显示执行时间已经超过7h了,应用方没有手动flush的地方,怀疑可能刷盘卡住了,导致后续的查询和写入全部被阻塞起来了;
没办法只能想办法重启问题节点了,把delayed_timeout全部设置成24h后,通过运维平台直接重启了问题节点,
GET _cat/recovery?v&h=i,s,t,ty,st,shost,thost,f,fp,b,bp,to,tor,top&active_only
一直查看进度,但是有一个分片恢复进度一直都是0%,关键主分片没问题,副本分配为啥一直没办法恢复呢,想到可能跟上面的flush阻塞有关,只好把'internal:indices/flush/synced/pre'这个任务对应的节点也重启了,然后看到恢复进度逐渐上升,看了一眼translog的大小,1600w,有点多~
恢复的前半段比较顺利,但是到了33.1%的进度之后再也不上升了,只觉人生黯淡~努力思考,给自己找了个理由,前面flush刷盘卡住了,导致之后的translog全部都是无效的,无法recovery...只能想办法把translog丢了,后面让业务部分重新同步最近的数据了;
努力去官网找干掉translog的方法,7.x有点区别了,找了会文档
bin/elasticsearch-shard remove-corrupted-data   --dir /data/es68-data/nodes/0/indices/aqxxxxTiYfUC_w8A/7/translog/ --truncate-clean-translog

执行完提示(生成一个额外的translog,)
POST /_cluster/reroute
{
"commands" : [
{
"allocate_stale_primary" : {
"index" : "xxx_index",
"shard" : 7,
"node" : "
aqxxxxTiYfUC_w8A",
"accept_data_loss" : false
}
}
]
}
# 失败的话,把 accept_data_loss 改为true
执行完,重启节点,不出意料的需要接受数据丢失,后面让业务方去补数据了~
 
几个问题:
1. `internal:indices/flush/synced/pre` 这个action代表的到底是什么操作;
2.  为什么会出现search/bulk操作阻塞一个多少小时的情况,跟flush有没有关联关系;(有没有类似遭遇的小伙伴提供下建议)
3.  有没有遇到过这种问题的大佬提供下处理建议
已邀请:

KrisOtk - kris

赞同来自:

后续:
打印的堆栈信息中(dump时'_cat/tsaks?v'显示任务`internal:indices/flush/synced/pre`的执行时间为11h),在里面找到了一个blocked的地方,根据`lock: 0x0000000305f692c0`发现主要是两个地方,一个是refresh,一个是flush,上面的refresh线程阻塞等待下面的flush操作释放锁,但是一直都是flush线程一直是runnable状态;
仔细看了下堆栈信息,发现以下信息:
cpu=38191486.02ms elapsed=39266.26s


39266.26/3600=10.9072944444;
和11h吻合,那就证明flush这个操作在定期commit的时候一直未执行完成,这个原因没有思考出来 
"elasticsearch[es61-data3][refresh][T#3]" #445 daemon prio=5 os_prio=0 cpu=59410.44ms elapsed= tid=0x00007fbb7c08b800 nid=0x17e7d waiting for monitor entry  [0x00007fb794ffd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:491)
- waiting to lock <0x0000000305f692c0> (a java.lang.Object)
at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262)
at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)
at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165)
at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:66)
at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:40)
at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1596)
at org.elasticsearch.index.engine.InternalEngine.writeIndexingBuffer(InternalEngine.java:1635)
at org.elasticsearch.index.shard.IndexShard.writeIndexingBuffer(IndexShard.java:2019)
at org.elasticsearch.indices.IndexingMemoryController$1.doRun(IndexingMemoryController.java:190)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@13/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@13/Thread.java:830)

...

"elasticsearch[es61-data3][flush][T#8]" #495 daemon prio=5 os_prio=0 cpu=38191486.02ms elapsed=39266.26s tid=0x00007fbb7c086000 nid=0x18569 runnable [0x00007fb884da5000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.index.FrozenBufferedUpdates.applyDocValuesUpdates(FrozenBufferedUpdates.java:569)
at org.apache.lucene.index.FrozenBufferedUpdates.applyDocValuesUpdates(FrozenBufferedUpdates.java:451)
at org.apache.lucene.index.FrozenBufferedUpdates.apply(FrozenBufferedUpdates.java:421)
at org.apache.lucene.index.FrozenBufferedUpdates.forceApply(FrozenBufferedUpdates.java:249)
at org.apache.lucene.index.FrozenBufferedUpdates.tryApply(FrozenBufferedUpdates.java:159)
at org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$3(IndexWriter.java:2591)
at org.apache.lucene.index.IndexWriter$$Lambda$4831/0x000000080190f440.process(Unknown Source)
at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5115)
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3211)
- locked <0x0000000305f692c0> (a java.lang.Object)
- locked <0x0000000305f692a8> (a java.lang.Object)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3444)
- locked <0x0000000305f692a8> (a java.lang.Object)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3409)
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2439)
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1773)
at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1089)
at org.elasticsearch.index.shard.IndexShard$5.doRun(IndexShard.java:3095)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:773)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@13/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@13/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@13/Thread.java:830)





 

hackerwin7 - aggregator

赞同来自:

能排除单个节点的磁盘的问题?

luyuncheng

赞同来自:

你是不是有频繁更新同一条记录?

JiangJibo - 喊我雷锋

赞同来自:

`internal:indices/flush/synced/pre`  这个action是集成在 `indices:admin/synced_flush` 里面的,在执行flush前先向所有shard发送前置请求,确保通信正常,当接收到前置请求响应后才真正发送flush请求
 

要回复问题请先登录注册