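A 12-node cluster on six machines, two nodes per machine. Each machine has 128 GB of RAM and 32 cores. Each ES instance is allocated 31 GB of memory.
Last night the search thread pool on every data node was exhausted and its queue filled up, so ES threw EsRejectedExecutionException and rejected search requests. The search thread pool size is 49 and the queue size is 1000. During the incident there was no unusual volume of search requests, and the load was not high.
I have been investigating this all day and found an exception; I am not sure whether it is related to the problem. The exception is as follows: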
[2019-01-17 15:49:45,833][DEBUG][action.admin.cluster.node.stats] [http_node] failed to execute on node [TcpC3thETgSsGwbMZiM01A]
RemoteTransportException[[Failed to deserialize response of type [org.elasticsearch.action.admin.cluster.node.stats.NodeStats]]]; nested: TransportSerializationException[Failed to deserialize response of type [org.elasticsearch.action.admin.cluster.node.stats.NodeStats]]; nested: EOFException;
Caused by: TransportSerializationException[Failed to deserialize response of type [org.elasticsearch.action.admin.cluster.node.stats.NodeStats]]; nested: EOFException;
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:152)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:124)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.
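2 replies
JackGe
Upvoted by: ESer, laoyang360
The NodeStats statistics are stored as long values. When one of them overflows, StreamOutput's writeVLong method writes 10 bytes, but StreamInput's readVLong reads only 9, so one byte is left unread. ES then performs the following check:
if the byte it reads is less than 0, it throws an EOFException.
You can fix this by modifying readVLong so that it also reads the 10th byte. From your description the search queue was saturated, which may have caused the long rejected counter in ThreadPoolStats to overflow.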
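The byte-count mismatch described above can be illustrated with a small standalone sketch. This is not the Elasticsearch source: VLongDemo, encode and bytesConsumed are hypothetical names, and the encoder only assumes the usual variable-length scheme (7 payload bits per byte, high bit set when more bytes follow). It shows that a long that has overflowed to a negative value needs 10 bytes, so a reader capped at 9 bytes leaves one behind.

```java
import java.io.ByteArrayOutputStream;

public class VLongDemo {
    // Variable-length encoding: 7 payload bits per byte,
    // high bit set means "another byte follows".
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift, so a negative value still terminates
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Hypothetical reader that stops after maxBytes at most;
    // returns how many bytes it actually consumed.
    static int bytesConsumed(byte[] data, int maxBytes) {
        int n = 0;
        while (n < data.length && n < maxBytes) {
            byte b = data[n++];
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: value is complete
            }
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] positive = encode(Long.MAX_VALUE); // largest non-negative long
        byte[] overflowed = encode(-1L);          // stands in for an overflowed counter
        System.out.println("Long.MAX_VALUE -> " + positive.length + " bytes");
        System.out.println("-1L            -> " + overflowed.length + " bytes");
        int read = bytesConsumed(overflowed, 9);
        System.out.println("9-byte reader leaves " + (overflowed.length - read) + " byte(s) unread");
    }
}
```

Any non-negative long fits in 9 bytes (63 bits), which is why a 9-byte reader works until a counter goes negative; the leftover byte then shifts every field read after it in the same response.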
kennywu76 - Wood
Upvoted by: