
ES node leaves and then rejoins the cluster (ES logs added)

Elasticsearch | Author: mmhub | Posted 2017-06-12 | Views: 6804

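Recently I've noticed that one node often drops out of the cluster and then rejoins on its own a little later. While it is gone, the shards on that node become unassigned, and when it comes back the node starts initializing shards again. Has anyone run into this? ES version: 5.1.1. Logs below: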
 
Log from the master node:
[2017-06-13T06:18:27,882][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [node78] failed to execute [indices:monitor/stats] on node [DNXvI5KuSxyduHuTO4LHvQ]
org.elasticsearch.transport.RemoteTransportException: [Failed to deserialize response of type [org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$NodeResponse]]
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize response of type [org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$NodeResponse]
    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1278) [elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1250) [elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.1.1.jar:5.1.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:536) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:490) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.6.Final.jar:4.1.6.Final]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: java.io.EOFException: tried to read: 429393 bytes but only 1523 remaining
    at org.elasticsearch.transport.netty4.ByteBufStreamInput.ensureCanReadBytes(ByteBufStreamInput.java:75) ~[?:?]
    at org.elasticsearch.common.io.stream.FilterStreamInput.ensureCanReadBytes(FilterStreamInput.java:80) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.common.io.stream.StreamInput.readArraySize(StreamInput.java:892) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.common.io.stream.StreamInput.readString(StreamInput.java:334) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.common.io.stream.StreamInput.readOptionalString(StreamInput.java:306) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.cluster.routing.ShardRouting.<init>(ShardRouting.java:251) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.cluster.routing.ShardRouting.<init>(ShardRouting.java:274) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.action.admin.indices.stats.ShardStats.readFrom(ShardStats.java:92) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.action.admin.indices.stats.ShardStats.readShardStats(ShardStats.java:86) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.readShardResult(TransportIndicesStatsAction.java:80) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.readShardResult(TransportIndicesStatsAction.java:47) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$NodeResponse.readFrom(TransportBroadcastByNodeAction.java:572) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1275) ~[elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1250) [elasticsearch-5.1.1.jar:5.1.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) ~[?:?]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:536) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:490) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) ~[?:?]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_111]
 
Log from another node:
[2017-06-13T06:18:22,942][INFO ][o.e.c.s.ClusterService   ][node77] removed {{node79}{DNXVI14aoUFLWUEO}{-JOFLSKYQlangmk9iQ-_lA}{*.*.*.*}{*.*.*.*:9300},}, reason: zen-disco-receive(from master [master {node78}{48Ek9LkpTxiLirHGuYg3TA}{Q3wBoD9tS1KfJ87Skb0Nfg}{*.*.*.*}{*.*.*.*:9300} committed version [265457]])
[2017-06-13T06:18:41,239][INFO ][o.e.c.s.ClusterService   ][node77] added {{node79}{DNXVI14aoUFLWUEO}{-JOFLSKYQlangmk9iQ-_lA}{*.*.*.*}{*.*.*.*:9300},}, reason: zen-disco-receive(from master [master {node78}{48Ek9LkpTxiLirHGuYg3TA}{Q3wBoD9tS1KfJ87Skb0Nfg}{*.*.*.*}{*.*.*.*:9300} committed version [265466]])
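The two ClusterService lines above put node79 outside the cluster for roughly 18 seconds (06:18:22.942 to 06:18:41.239). When a node flaps repeatedly, it can help to pair every removed/added event across a whole log file. Below is a minimal sketch of such a helper; flap_durations is hypothetical (not an Elasticsearch tool), and it assumes the default 5.x log line layout shown above:

```python
import re
from datetime import datetime

# Hypothetical pattern for ES 5.x ClusterService lines such as:
# [2017-06-13T06:18:22,942][INFO ][o.e.c.s.ClusterService   ][node77] removed {{node79}...
EVENT_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\s*(\d{3})\]"  # timestamp and milliseconds
    r".*?ClusterService\s*\]\[[^\]]*\]\s*"                     # logger name, then local node name
    r"(removed|added) \{\{([^}]+)\}"                           # event type and affected node name
)


def flap_durations(lines):
    """Return (node, seconds_out) for each 'removed' followed by 'added', in log order.

    An 'added' with no earlier 'removed' for the same node is ignored.
    """
    removed_at = {}
    result = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(
            microsecond=int(m.group(2)) * 1000)
        event, node = m.group(3), m.group(4)
        if event == "removed":
            removed_at[node] = ts
        elif node in removed_at:
            gap = (ts - removed_at.pop(node)).total_seconds()
            result.append((node, gap))
    return result
```

For example, flap_durations(open(path)) on a node's log (path being a placeholder for your log file) lists each drop-out with its length in seconds, which makes it easier to see whether the drop-outs line up with other events such as the stats-request errors on the master.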

kennywu76 - Wood

Upvoted by: mmhub

This is an ES bug; upgrading to 5.1.2 or later fixes it.
 
The related issue is here:
https://github.com/elastic/ela ... 22551 
 
The pull request that fixes it:
 https://github.com/elastic/ela ... 22317
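Before and after upgrading, it is worth confirming that every node really runs 5.1.2 or later. A minimal sketch, assuming the cluster's HTTP endpoint is reachable without authentication and that GET _cat/nodes?h=name,version&format=json returns a JSON list of objects with name and version keys; the helper names and the default URL are hypothetical:

```python
import json
import urllib.request

# First release that, per the answer above, contains the fix.
MIN_FIXED = (5, 1, 2)


def parse_version(v):
    """'5.1.1' -> (5, 1, 1); drops suffixes such as '-SNAPSHOT'."""
    core = v.split("-", 1)[0]
    return tuple(int(p) for p in core.split("."))


def nodes_needing_upgrade(cat_nodes):
    """cat_nodes: parsed JSON from _cat/nodes?h=name,version&format=json.

    Returns the sorted names of nodes whose version is below MIN_FIXED.
    """
    return sorted(n["name"] for n in cat_nodes
                  if parse_version(n["version"]) < MIN_FIXED)


def fetch_cat_nodes(base_url="http://localhost:9200"):
    # Placeholder URL; adjust host, port and authentication for your cluster.
    url = base_url + "/_cat/nodes?h=name,version&format=json"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling nodes_needing_upgrade(fetch_cat_nodes("http://your-node:9200")) returns the nodes still below 5.1.2. The parsing is kept separate from the HTTP call so it can be checked against a saved response.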
 

lz8086 - ES junior driver


Where are the log screenshots?

lz8086 - ES junior driver


Could your network be unstable? I'd suggest raising the discovery.zen.fd.ping_timeout value, e.g. discovery.zen.fd.ping_timeout: 100s.
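To see what this change trades off, here is a back-of-the-envelope estimate of how long an unresponsive node stays in the cluster before it is removed. It assumes the 5.x zen fault-detection defaults (discovery.zen.fd.ping_interval 1s, discovery.zen.fd.ping_timeout 30s, discovery.zen.fd.ping_retries 3) and that a failed ping is retried immediately, with each attempt waiting the full timeout. Treat it as a rough approximation, not the exact detection logic; the function name is hypothetical:

```python
# Assumed ES 5.x defaults for discovery.zen.fd.* (seconds / count).
DEFAULT_PING_INTERVAL_S = 1
DEFAULT_PING_TIMEOUT_S = 30
DEFAULT_PING_RETRIES = 3


def worst_case_detection_s(ping_timeout_s=DEFAULT_PING_TIMEOUT_S,
                           ping_retries=DEFAULT_PING_RETRIES,
                           ping_interval_s=DEFAULT_PING_INTERVAL_S):
    """Rough upper bound on seconds before an unresponsive node is dropped.

    Approximation: wait one interval for the next ping, then every one of
    the ping_retries attempts times out in full, retried back to back.
    """
    return ping_interval_s + ping_retries * ping_timeout_s
```

With the defaults this gives about 91 s; with ping_timeout at 100s it is about 301 s. A longer timeout tolerates slow pings, but a node that has really died is also noticed much later.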
