What does discovery.zen.ping_timeout actually do? A question and some digging

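Every explanation of this parameter I can find online says it is the timeout for the cluster's ping phase.

However, I noticed that the larger I set it, the longer master election takes. With it set to one minute, every restart of the master node left the whole cluster unavailable for a while, and election was very slow. After I lowered it to 10s, election became much faster: exceptions were still thrown, but a master was elected quickly.

So I read the code, and the ping callback really does wait for the time configured by discovery.zen.ping_timeout before it returns. The code is below.

findMaster in the ZenDiscovery class starts with this line, which is the call that drives master election: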

ZenPing.PingResponse[] fullPingResponses = pingService.pingAndWait(pingTimeout);

How long this call takes is effectively how long master election takes. Reading on:

public PingResponse[] pingAndWait(TimeValue timeout) {
    final AtomicReference<PingResponse[]> response = new AtomicReference<>();
    final CountDownLatch latch = new CountDownLatch(1);
    ping(new PingListener() {
        @Override
        public void onPing(PingResponse[] pings) {
            response.set(pings);
            latch.countDown();
        }
    }, timeout);
    try {
        latch.await();
        return response.get();
    } catch (InterruptedException e) {
        logger.trace("pingAndWait interrupted");
        return null;
    }
}

pingAndWait calls ping and then blocks until the callback notifies it before continuing. So what does timeout actually do?
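To make the blocking behaviour concrete, here is a minimal standalone sketch of the same callback-plus-latch pattern. It is not Elasticsearch code: asyncPing, the node names, and the 200 ms delay are made up for illustration. It only shows that a caller using latch.await() with no timeout is blocked for exactly as long as the callback takes to fire.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class PingAndWaitSketch {
    // Hypothetical stand-in for an asynchronous ping: the listener is invoked only after delayMillis.
    static void asyncPing(ScheduledExecutorService scheduler, long delayMillis, Consumer<String[]> listener) {
        scheduler.schedule(() -> listener.accept(new String[] {"node-a", "node-b"}),
                delayMillis, TimeUnit.MILLISECONDS);
    }

    // Same shape as pingAndWait: start the async work, then block until the listener has fired.
    static String[] pingAndWait(ScheduledExecutorService scheduler, long delayMillis) {
        AtomicReference<String[]> response = new AtomicReference<>();
        CountDownLatch latch = new CountDownLatch(1);
        asyncPing(scheduler, delayMillis, pings -> {
            response.set(pings);
            latch.countDown();
        });
        try {
            latch.await(); // no timeout: the caller waits as long as the callback takes
            return response.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Runs one blocking call and returns how many milliseconds the caller was blocked.
    static long measureMillis(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            pingAndWait(scheduler, delayMillis);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller was blocked for about " + measureMillis(200) + " ms");
    }
}
```

The point of the sketch is that the blocking time is decided entirely by whoever invokes the listener, which is why the rest of the investigation follows the onPing call.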

@Override
public void ping(PingListener listener, TimeValue timeout) {
    List<? extends ZenPing> zenPings = this.zenPings;
    CompoundPingListener compoundPingListener = new CompoundPingListener(listener, zenPings);
    for (ZenPing zenPing : zenPings) {
        try {
            zenPing.ping(compoundPingListener, timeout);
        } catch (EsRejectedExecutionException ex) {
            logger.debug("Ping execution rejected", ex);
            compoundPingListener.onPing(null);
        }
    }
}

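This is the implementation of ping. It loops over the registered ZenPing implementations (zenPings) and calls ping on each one with the same timeout. The interesting part is the ping call inside the try-catch; here is the unicast version: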

@Override
public void ping(final PingListener listener, final TimeValue timeout) {
    final SendPingsHandler sendPingsHandler = new SendPingsHandler(pingHandlerIdGenerator.incrementAndGet());
    try {
        receivedResponses.put(sendPingsHandler.id(), sendPingsHandler);
        try {
            sendPings(timeout, null, sendPingsHandler);
        } catch (RejectedExecutionException e) {
            logger.debug("Ping execution rejected", e);
            // The RejectedExecutionException can come from the fact unicastConnectExecutor is at its max down in sendPings
            // But don't bail here, we can retry later on after the send ping has been scheduled.
        }
        threadPool.schedule(TimeValue.timeValueMillis(timeout.millis() / 2), ThreadPool.Names.GENERIC, new AbstractRunnable() {
            @Override
            protected void doRun() {
                sendPings(timeout, null, sendPingsHandler);
                threadPool.schedule(TimeValue.timeValueMillis(timeout.millis() / 2), ThreadPool.Names.GENERIC, new AbstractRunnable() {
                    @Override
                    protected void doRun() throws Exception {
                        sendPings(timeout, TimeValue.timeValueMillis(timeout.millis() / 2), sendPingsHandler);
                        sendPingsHandler.close();
                        listener.onPing(sendPingsHandler.pingCollection().toArray());
                        for (DiscoveryNode node : sendPingsHandler.nodeToDisconnect) {
                            logger.trace("[{}] disconnecting from {}", sendPingsHandler.id(), node);
                            transportService.disconnectFromNode(node);
                        }
                    }

                    @Override
                    public void onFailure(Throwable t) {
                        logger.debug("Ping execution failed", t);
                        sendPingsHandler.close();
                    }
                });
            }

            @Override
            public void onFailure(Throwable t) {
                logger.debug("Ping execution failed", t);
                sendPingsHandler.close();
            }
        });
    } catch (EsRejectedExecutionException ex) { // TODO: remove this once ScheduledExecutor has support for AbstractRunnable
        sendPingsHandler.close();
        // we are shutting down
    } catch (Exception e) {
        sendPingsHandler.close();
        throw new ElasticsearchException("Ping execution failed", e);
    }
}

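As noted above, the election time in findMaster is determined by pingAndWait, and that method blocks until the onPing callback has run. So to know when election can finish, we only need to look at when PingListener.onPing is triggered.
It is clearly triggered inside the scheduled task. threadPool.schedule is itself a wrapper around ScheduledThreadPoolExecutor, and its first argument maps to that executor's delay, i.e. how long to wait before running the task. The value passed is timeout.millis() / 2, half of discovery.zen.ping_timeout. Note that the two schedule calls are nested, each with that same half delay, so the inner task, the one that calls listener.onPing, cannot start until a full ping_timeout has passed.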
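The effect of nesting two half-timeout delays can be checked with a small standalone sketch. It uses a plain ScheduledExecutorService rather than Elasticsearch's ThreadPool, and the class and method names are invented for illustration; the three "rounds" are only comments where the real code calls sendPings.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NestedScheduleSketch {
    // Returns the milliseconds between starting the "ping" and the listener firing.
    static long millisUntilListenerFires(long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch listenerFired = new CountDownLatch(1);
        long half = timeoutMillis / 2;
        long start = System.nanoTime();
        try {
            // Round 1 of pings would be sent here, immediately.
            scheduler.schedule(() -> {
                // Round 2 would be sent here, timeout / 2 after the start.
                scheduler.schedule(() -> {
                    // Round 3 would be sent here, another timeout / 2 later,
                    // and only then is the listener notified.
                    listenerFired.countDown();
                }, half, TimeUnit.MILLISECONDS);
            }, half, TimeUnit.MILLISECONDS);
            listenerFired.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long timeout = 400;
        System.out.println("timeout=" + timeout + " ms, listener fired after about "
                + millisUntilListenerFires(timeout) + " ms");
    }
}
```

Even though no simulated node is slow in this sketch, the listener never fires before the full timeout has elapsed, because the delay is fixed by the schedule calls rather than by how quickly replies arrive.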

Then there is sendPings, which also takes a wait time. In the last of the three calls, the waitTime passed in is again half of ping_timeout:

void sendPings(final TimeValue timeout, @Nullable TimeValue waitTime, final SendPingsHandler sendPingsHandler) {
    final UnicastPingRequest pingRequest = new UnicastPingRequest();
    pingRequest.id = sendPingsHandler.id();
    pingRequest.timeout = timeout;
    DiscoveryNodes discoNodes = contextProvider.nodes();

    pingRequest.pingResponse = createPingResponse(discoNodes);

    HashSet<DiscoveryNode> nodesToPingSet = new HashSet<>();
    for (PingResponse temporalResponse : temporalResponses) {
        // Only send pings to nodes that have the same cluster name.
        if (clusterName.equals(temporalResponse.clusterName())) {
            nodesToPingSet.add(temporalResponse.node());
        }
    }

    for (UnicastHostsProvider provider : hostsProviders) {
        nodesToPingSet.addAll(provider.buildDynamicNodes());
    }

    // add all possible master nodes that were active in the last known cluster configuration
    for (ObjectCursor<DiscoveryNode> masterNode : discoNodes.getMasterNodes().values()) {
        nodesToPingSet.add(masterNode.value);
    }

    // sort the nodes by likelihood of being an active master
    List<DiscoveryNode> sortedNodesToPing = electMasterService.sortByMasterLikelihood(nodesToPingSet);

    // new add the the unicast targets first
    List<DiscoveryNode> nodesToPing = CollectionUtils.arrayAsArrayList(configuredTargetNodes);
    nodesToPing.addAll(sortedNodesToPing);

    final CountDownLatch latch = new CountDownLatch(nodesToPing.size());
    for (final DiscoveryNode node : nodesToPing) {
        // make sure we are connected
        final boolean nodeFoundByAddress;
        DiscoveryNode nodeToSend = discoNodes.findByAddress(node.address());
        if (nodeToSend != null) {
            nodeFoundByAddress = true;
        } else {
            nodeToSend = node;
            nodeFoundByAddress = false;
        }

        if (!transportService.nodeConnected(nodeToSend)) {
            if (sendPingsHandler.isClosed()) {
                return;
            }
            // if we find on the disco nodes a matching node by address, we are going to restore the connection
            // anyhow down the line if its not connected...
            // if we can't resolve the node, we don't know and we have to clean up after pinging. We do have
            // to make sure we don't disconnect a true node which was temporarily removed from the DiscoveryNodes
            // but will be added again during the pinging. We therefore create a new temporary node
            if (!nodeFoundByAddress) {
                if (!nodeToSend.id().startsWith(UNICAST_NODE_PREFIX)) {
                    DiscoveryNode tempNode = new DiscoveryNode("",
                            UNICAST_NODE_PREFIX + unicastNodeIdGenerator.incrementAndGet() + "_" + nodeToSend.id() + "#",
                            nodeToSend.getHostName(), nodeToSend.getHostAddress(), nodeToSend.address(), nodeToSend.attributes(), nodeToSend.version()
                    );
                    logger.trace("replacing {} with temp node {}", nodeToSend, tempNode);
                    nodeToSend = tempNode;
                }
                sendPingsHandler.nodeToDisconnect.add(nodeToSend);
            }
            // fork the connection to another thread
            final DiscoveryNode finalNodeToSend = nodeToSend;
            unicastConnectExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    if (sendPingsHandler.isClosed()) {
                        return;
                    }
                    boolean success = false;
                    try {
                        // connect to the node, see if we manage to do it, if not, bail
                        if (!nodeFoundByAddress) {
                            logger.trace("[{}] connecting (light) to {}", sendPingsHandler.id(), finalNodeToSend);
                            transportService.connectToNodeLight(finalNodeToSend);
                        } else {
                            logger.trace("[{}] connecting to {}", sendPingsHandler.id(), finalNodeToSend);
                            transportService.connectToNode(finalNodeToSend);
                        }
                        logger.trace("[{}] connected to {}", sendPingsHandler.id(), node);
                        if (receivedResponses.containsKey(sendPingsHandler.id())) {
                            // we are connected and still in progress, send the ping request
                            sendPingRequestToNode(sendPingsHandler.id(), timeout, pingRequest, latch, node, finalNodeToSend);
                        } else {
                            // connect took too long, just log it and bail
                            latch.countDown();
                            logger.trace("[{}] connect to {} was too long outside of ping window, bailing", sendPingsHandler.id(), node);
                        }
                        success = true;
                    } catch (ConnectTransportException e) {
                        // can't connect to the node - this is a more common path!
                        logger.trace("[{}] failed to connect to {}", e, sendPingsHandler.id(), finalNodeToSend);
                    } catch (RemoteTransportException e) {
                        // something went wrong on the other side
                        logger.debug("[{}] received a remote error as a response to ping {}", e, sendPingsHandler.id(), finalNodeToSend);
                    } catch (Throwable e) {
                        logger.warn("[{}] failed send ping to {}", e, sendPingsHandler.id(), finalNodeToSend);
                    } finally {
                        if (!success) {
                            latch.countDown();
                        }
                    }
                }
            });
        } else {
            sendPingRequestToNode(sendPingsHandler.id(), timeout, pingRequest, latch, node, nodeToSend);
        }
    }
    if (waitTime != null) {
        try {
            latch.await(waitTime.millis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // ignore
        }
    }
}

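Note the final if: when waitTime != null, the method blocks on latch.await. The waitTime passed in from outside is half of ping_timeout. Since latch.await with a timeout returns as soon as the latch reaches zero, this last round holds up the callback by at most that extra half, and by less if every pinged node answers or fails sooner.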
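The bounded wait at the end of sendPings can be illustrated with a standalone sketch of CountDownLatch.await(long, TimeUnit). The simulated "nodes" and all names here are hypothetical; the sketch only shows the two cases: every reply arrives and the wait ends early, or one reply never arrives and the wait runs for the full waitTime.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BoundedWaitSketch {
    // Waits up to waitMillis for `nodes` replies, of which only `responders` ever arrive
    // (each after replyDelayMillis). Returns how long the wait actually took, in ms.
    static long waitForReplies(int nodes, int responders, long replyDelayMillis, long waitMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        CountDownLatch latch = new CountDownLatch(nodes);
        Runnable reply = latch::countDown; // one simulated ping response
        long start = System.nanoTime();
        try {
            for (int i = 0; i < responders; i++) {
                scheduler.schedule(reply, replyDelayMillis, TimeUnit.MILLISECONDS);
            }
            // Returns early once the count hits zero, otherwise gives up after waitMillis.
            latch.await(waitMillis, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("all 3 nodes reply:  " + waitForReplies(3, 3, 50, 1000) + " ms");
        System.out.println("1 of 3 never replies: " + waitForReplies(3, 2, 50, 300) + " ms");
    }
}
```

In a restart scenario like the one described at the top, an unreachable node is the second case, provided its connect attempt is still pending when the wait expires; a connect that fails sooner counts the latch down and lets the wait end early.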

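So my preliminary conclusion: ping_timeout is the timeout for the ping requests, but at the same time it is also the delay before master election can complete.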
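As a rough worked example of what that means for the two settings tried above (one minute and 10s), the arithmetic can be written down as a tiny helper. This is only a back-of-the-envelope sketch derived from the code quoted in this post: it assumes a single ping round through the unicast implementation, and it ignores thread scheduling overhead and anything findMaster does after pingAndWait returns. The class and method names are invented.

```java
class ElectionDelayEstimate {
    // Two nested schedule calls, each delayed by timeout / 2, run before onPing can fire.
    static long earliestCallbackMillis(long pingTimeoutMillis) {
        long half = pingTimeoutMillis / 2;
        return half + half;
    }

    // The final sendPings call may additionally block for up to timeout / 2.
    static long latestCallbackMillis(long pingTimeoutMillis) {
        long half = pingTimeoutMillis / 2;
        return half + half + half;
    }

    public static void main(String[] args) {
        for (long timeout : new long[] {60_000L, 10_000L}) {
            System.out.println("ping_timeout=" + timeout + " ms: onPing fires roughly between "
                    + earliestCallbackMillis(timeout) + " and " + latestCallbackMillis(timeout) + " ms");
        }
    }
}
```

Under these assumptions, a one-minute setting means each ping round blocks findMaster for somewhere between 60 and 90 seconds, against 10 to 15 seconds for a 10s setting, which is consistent with the difference observed.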

How do others in the community understand this?
