
After adding a node to the cluster, an index was deleted. Can anyone help analyze why?

Elasticsearch | Author: hihins | Published 2023-06-28 | Views: 2874

Environment: cluster expanded from 3 nodes to 4.
Software: Elasticsearch 6.5.4; servers with 16 GB of memory. The cluster holds about 400 million documents in an index configured with 32 shards and 2 replicas. After the node was added, the index ended up with 5 shards and 1 replica (the ES defaults) and all the data was gone.
Context of the problem:
From the log it looks like the cluster state changed, then shards were deleted, and then the index itself was deleted. Can anyone help analyze what caused this?
[screenshot: Snipaste_2023-06-28_11-12-55.png]

God_lockin

Upvoted by: charlesfang

ES very rarely deletes indices on its own; about the only automatic case is scheduled retirement via something like ILM. Deletions are almost always triggered explicitly, most likely by some person or client, so I would look in two directions:
1. Search the logs of any package/service whose code issues delete operations for what triggered it
2. An administrator's accidental manual operation

To guard against a repeat:
1. Deny delete privileges to ordinary accounts
2. Forbid manual index deletion
3. Route all commands/traffic through monitoring and keep logs for later tracing
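As one concrete guard along these lines, Elasticsearch has a setting that requires destructive calls to name indices explicitly, so a stray wildcard or `_all` delete is rejected. A minimal sketch, assuming the cluster is reachable on localhost:9200 (adjust host/auth for your setup; on 6.x this setting may need to live in elasticsearch.yml rather than the dynamic settings API):

```shell
# Require explicit index names for destructive operations such as
# DELETE /<index>, so wildcard/_all deletions are refused.
# Host and lack of auth are assumptions for this sketch.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"action.destructive_requires_name": true}}'
```

This does not stop a deliberate `DELETE /moss`, but it blocks the accidental wide-scope variants.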

xiaohei

Upvoted by:

My feeling is this is a master-node issue; only the master can delete an index. Please post the node configuration. Simply adding a data node should not cause data to be deleted.
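To see which node is the elected master and what roles each node carries, the `_cat/nodes` API is handy; the `master` column marks the elected master with `*`. A sketch, again assuming localhost:9200:

```shell
# List node name, roles (m=master-eligible, d=data, i=ingest), elected-master
# marker, and IP. Host is an assumption for this sketch.
curl -s "localhost:9200/_cat/nodes?v&h=name,node.role,master,ip"
```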

Charele - Cisco4321

Upvoted by:

What you posted is just an error triggered by the index deletion; it is hard to infer the root cause from that alone.

See whether there are any other errors or warnings, and post them.

hihins

Upvoted by:

Here is the ES log from the newly added node:
[2023-06-27T09:44:22,837][INFO ][o.e.e.NodeEnvironment    ] [node-0004] using [1] data paths, mounts [[/es-SP0-0 (/dev/sdf)]], net usable_space [1.7tb], net total_space [1.7tb], types [xfs]
[2023-06-27T09:44:22,839][INFO ][o.e.e.NodeEnvironment    ] [node-0004] heap size [15.7gb], compressed ordinary object pointers [true]
[2023-06-27T09:44:22,841][INFO ][o.e.n.Node               ] [node-0004] node name [node-0004], node ID [GFnOjBPORva8nHMiSg_Caw]
[2023-06-27T09:44:22,842][INFO ][o.e.n.Node               ] [node-0004] version[6.5.4], pid[68236], build[default/zip/d2ef93d/2018-12-17T21:17:40.758843Z], OS[Linux/3.10.0-693.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_242/25.242-b08]
[2023-06-27T09:44:22,842][INFO ][o.e.n.Node               ] [node-0004] JVM arguments [-Xms16g, -Xmx16g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.n8mwwzb9, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/cg_sp0/es_dump/, -XX:ErrorFile=logs/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:/var/log/moss/es_gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=20, -XX:GCLogFileSize=20m, -Des.path.home=/usr/local/es/elasticsearch-6.5.4, -Des.path.conf=/usr/local/es/elasticsearch-6.5.4/config, -Des.distribution.flavor=default, -Des.distribution.type=zip]
[2023-06-27T09:44:24,650][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [aggs-matrix-stats]
[2023-06-27T09:44:24,650][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [analysis-common]
[2023-06-27T09:44:24,651][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [ingest-common]
[2023-06-27T09:44:24,651][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [lang-expression]
[2023-06-27T09:44:24,651][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [lang-mustache]
[2023-06-27T09:44:24,652][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [lang-painless]
[2023-06-27T09:44:24,652][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [mapper-extras]
[2023-06-27T09:44:24,652][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [parent-join]
[2023-06-27T09:44:24,652][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [percolator]
[2023-06-27T09:44:24,653][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [rank-eval]
[2023-06-27T09:44:24,653][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [reindex]
[2023-06-27T09:44:24,653][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [repository-url]
[2023-06-27T09:44:24,653][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [transport-netty4]
[2023-06-27T09:44:24,654][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [tribe]
[2023-06-27T09:44:24,654][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-ccr]
[2023-06-27T09:44:24,654][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-core]
[2023-06-27T09:44:24,655][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-deprecation]
[2023-06-27T09:44:24,655][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-graph]
[2023-06-27T09:44:24,655][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-logstash]
[2023-06-27T09:44:24,655][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-ml]
[2023-06-27T09:44:24,656][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-monitoring]
[2023-06-27T09:44:24,656][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-rollup]
[2023-06-27T09:44:24,656][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-security]
[2023-06-27T09:44:24,657][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-sql]
[2023-06-27T09:44:24,657][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-upgrade]
[2023-06-27T09:44:24,657][INFO ][o.e.p.PluginsService     ] [node-0004] loaded module [x-pack-watcher]
[2023-06-27T09:44:24,658][INFO ][o.e.p.PluginsService     ] [node-0004] no plugins loaded
[2023-06-27T09:44:27,946][INFO ][o.e.x.s.a.s.FileRolesStore] [node-0004] parsed [0] roles from file [/usr/local/es/elasticsearch-6.5.4/config/roles.yml]
[2023-06-27T09:44:28,525][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [node-0004] [controller/71867] [Main.cc@109] controller (64 bit): Version 6.5.4 (Build b616085ef32393) Copyright (c) 2018 Elasticsearch BV
[2023-06-27T09:44:29,095][DEBUG][o.e.a.ActionModule       ] [node-0004] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2023-06-27T09:44:29,262][INFO ][o.e.d.DiscoveryModule    ] [node-0004] using discovery type [zen] and host providers [settings]
[2023-06-27T09:44:30,220][INFO ][o.e.n.Node               ] [node-0004] initialized
[2023-06-27T09:44:30,221][INFO ][o.e.n.Node               ] [node-0004] starting ...
[2023-06-27T09:44:30,346][INFO ][o.e.t.TransportService   ] [node-0004] publish_address {172.125.6.13:9300}, bound_addresses {0.0.0.0:9300}
[2023-06-27T09:44:30,359][INFO ][o.e.b.BootstrapChecks    ] [node-0004] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2023-06-27T09:44:35,857][INFO ][o.e.c.s.ClusterApplierService] [node-0004] detected_master {node-0001}{4ZxhZb_eSJ-M8C8C-4Xuzw}{c2JthES1StOv-qrKaYyjwg}{172.125.6.10}{172.125.6.10:9300}{ml.machine_memory=99802255360, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{node-0003}{G9SOEwCMQWW5DPbG5iVJGg}{jFKGdhkiSOuFhkaxPaZwcg}{172.125.6.12}{172.125.6.12:9300}{ml.machine_memory=201270022144, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node-0001}{4ZxhZb_eSJ-M8C8C-4Xuzw}{c2JthES1StOv-qrKaYyjwg}{172.125.6.10}{172.125.6.10:9300}{ml.machine_memory=99802255360, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{node-0002}{bKoOipxJSFOSL1msXcOXmQ}{jRGxVxUsQyaXBmxjwg7OgQ}{172.125.6.11}{172.125.6.11:9300}{ml.machine_memory=201270022144, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node-0001}{4ZxhZb_eSJ-M8C8C-4Xuzw}{c2JthES1StOv-qrKaYyjwg}{172.125.6.10}{172.125.6.10:9300}{ml.machine_memory=99802255360, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [80]])
[2023-06-27T09:44:36,306][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [node-0004] Failed to clear cache for realms [[]]
[2023-06-27T09:44:36,309][INFO ][o.e.x.s.a.TokenService   ] [node-0004] refresh keys
[2023-06-27T09:44:36,816][INFO ][o.e.x.s.a.TokenService   ] [node-0004] refreshed keys
[2023-06-27T09:44:36,872][INFO ][o.e.l.LicenseService     ] [node-0004] license [b7f89cde-e9bd-45b3-8ae1-2b38510a7787] mode [basic] - valid
[2023-06-27T09:44:36,897][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [node-0004] publish_address {172.125.6.13:9200}, bound_addresses {0.0.0.0:9200}
[2023-06-27T09:44:36,903][INFO ][o.e.n.Node               ] [node-0004] started
[2023-06-27T09:46:09,740][WARN ][o.e.i.IndicesService     ] [node-0004] [moss/XiT1RtZ5SZWRKRmLTHIZTg] failed to delete index
org.elasticsearch.env.ShardLockObtainFailedException: [moss][1]: obtaining shard lock timed out after 0ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:730) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:649) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.env.NodeEnvironment.lockAllForIndex(NodeEnvironment.java:595) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:546) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.IndicesService.deleteIndexStoreIfDeletionAllowed(IndicesService.java:746) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:733) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:639) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:284) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:218) ~[elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:481) ~[elasticsearch-6.5.4.jar:6.5.4]
at java.lang.Iterable.forEach(Iterable.java:75) [?:1.8.0_242]
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:478) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:465) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:416) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:160) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244) [elasticsearch-6.5.4.jar:6.5.4]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207) [elasticsearch-6.5.4.jar:6.5.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]

hihins

Upvoted by:

The script writing to ES was inserting data in both the second before and the second after; in between, it suddenly reported that the index was gone, and the index was then recreated with the default 5 shards.
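The silent recreation with 5 shards happens because a write to a nonexistent index auto-creates it with default settings. One way to make such a situation fail loudly instead is to disable automatic index creation; a sketch, with the host an assumption (note that system or monitoring indices may rely on auto-creation, so a pattern allowlist is often safer than a blanket "false"):

```shell
# Disable automatic index creation: indexing into a missing index will then
# return an error instead of recreating it with default shards/replicas.
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"action.auto_create_index": "false"}}'
```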

xiaohei

Upvoted by:

Take a look at the master node's log.

hihins

Upvoted by:

This should be the master node's log:
[2023-06-27T09:44:35,802][INFO ][o.e.c.s.MasterService ] [node-0001] zen-disco-node-join[{node-0004}{GFnOjBPORva8nHMiSg_Caw}{N_t7k_5ERIGJK8_A5wTZOg}{172.125.6.13}{172.125.6.13:9300}{ml.machine_memory=201269968896, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason: added {{node-0004}{GFnOjBPORva8nHMiSg_Caw}{N_t7k_5ERIGJK8_A5wTZOg}{172.125.6.13}{172.125.6.13:9300}{ml.machine_memory=201269968896, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}
[2023-06-27T09:44:36,886][INFO ][o.e.c.s.ClusterApplierService] [node-0001] added {{node-0004}{GFnOjBPORva8nHMiSg_Caw}{N_t7k_5ERIGJK8_A5wTZOg}{172.125.6.13}{172.125.6.13:9300}{ml.machine_memory=201269968896, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {node-0001}{4ZxhZb_eSJ-M8C8C-4Xuzw}{c2JthES1StOv-qrKaYyjwg}{172.125.6.10}{172.125.6.10:9300}{ml.machine_memory=99802255360, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [80] source [zen-disco-node-join[{node-0004}{GFnOjBPORva8nHMiSg_Caw}{N_t7k_5ERIGJK8_A5wTZOg}{172.125.6.13}{172.125.6.13:9300}{ml.machine_memory=201269968896, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]]])
[2023-06-27T09:44:36,894][WARN ][o.e.d.z.ElectMasterService] [node-0001] value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [2], total number of master-eligible nodes used for publishing in this round: [4])
[2023-06-27T09:44:37,114][INFO ][o.e.c.m.MetaDataUpdateSettingsService] [node-0001] updating number_of_replicas to [2] for indices [moss]
[2023-06-27T09:46:09,382][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0001] [moss/XiT1RtZ5SZWRKRmLTHIZTg] deleting index
[2023-06-27T09:46:16,811][DEBUG][o.e.a.s.TransportSearchAction] [node-0001] All shards failed for phase: [query]
 
 
It prints "discovery.zen.minimum_master_nodes" is too low. This can result in data loss! Please set it to at least a quorum of master-eligible nodes (current value: [2], total number of master-eligible nodes used for publishing in this round: [4]), and after that the "deleting index" line appears.
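Index deletions are applied by the elected master and logged by MetaDataDeleteIndexService, so grepping the master's log pins down exactly when each index was removed. A small sketch; the default log path is an assumption, so point LOG at your master node's actual log file:

```shell
# Print every index-deletion event recorded by the elected master.
# The default path below is hypothetical; override LOG with your real path.
LOG="${LOG:-/var/log/elasticsearch/elasticsearch.log}"
grep -E "MetaDataDeleteIndexService.*deleting index" "$LOG"
```

Correlating those timestamps with client/application logs (as happened here at 09:46:09) is usually the fastest way to find the trigger.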

Charele - Cisco4321

Upvoted by:

[2023-06-27T09:44:37,114][INFO ][o.e.c.m.MetaDataUpdateSettingsService] [node-0001] updating number_of_replicas to [2] for indices [moss]
[2023-06-27T09:46:09,382][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node-0001] [moss/XiT1RtZ5SZWRKRmLTHIZTg] deleting index
 
There are two operations here: (1) the replica count of moss is changed, and (2) moss is deleted.
Check whether some program (or command) could have executed these operations.
 
 
 
Also, that "discovery.zen.minimum_master_nodes" message is only a warning:
now that you have 4 nodes, a value of 2 carries a split-brain risk, hence the alert. It has no effect on how ES is currently running.
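For reference, the quorum the warning asks for is floor(master_eligible / 2) + 1, which is 3 for this 4-node, all-master-eligible cluster. In shell arithmetic:

```shell
# Quorum formula for discovery.zen.minimum_master_nodes (pre-7.x Zen discovery):
# floor(master_eligible_nodes / 2) + 1
master_eligible=4
quorum=$(( master_eligible / 2 + 1 ))
echo "$quorum"   # prints 3 for 4 master-eligible nodes
```

(From 7.x onward this setting is gone; the cluster computes quorum itself.)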
 

hihins

Upvoted by:

No need to dig further, everyone: the index had been deleted manually with a command during testing. If Linux had not recorded the user's operations, we would never have untangled this.
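For completeness, the kind of check that caught it: scanning a user's shell history for delete calls. A sketch; the default history path and the curl-style patterns are assumptions, so adapt both to the account and client actually used:

```shell
# Look for manual index deletions in a shell history file.
# The default path is hypothetical; point HIST at the relevant account's history.
HIST="${HIST:-$HOME/.bash_history}"
grep -n -e '-XDELETE' -e '-X DELETE' "$HIST"
```

Where available, auditd or centralized command logging is more reliable than shell history, which users can edit or disable.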
