Use netstat -lntp to see which processes are listening on a network port. lsof works as well.

ES 6.3.2: what is the difference between refresh=true and refresh=wait_for?

Elasticsearch | Author: hapjin | Posted on 2019-04-09 | Views: 8236

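In a real project, I need deleted documents to become invisible to search right away, so I used DELETE /user/profile/1?refresh=wait_for (this is only an example; in practice I use bulk delete through the Java API). The response time, however, is far too slow.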
 bulkRequest.setRefreshPolicy(DeleteRequest.RefreshPolicy.WAIT_UNTIL);
 
I then read the description in the official refresh documentation, but some parts are still unclear to me.


refresh=wait_for only affects the request that it is on, but, by forcing a refresh immediately, refresh=true will affect other ongoing requests. 


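For each index/delete operation, wait_for only affects the current request. For example, if a delete request sets refresh=wait_for, the waiting only affects that delete request and not the index/search requests other users issue concurrently. Seen that way, it is easy to understand why refresh=true, which forces a refresh immediately, affects other ongoing requests.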
 

Also, what role does the index.max_refresh_listeners setting play?
 
PS: every refresh produces small segments:


true creates less efficient indexes constructs (tiny segments) that must later be merged into more efficient index constructs (larger segments). 


and small segments hurt search performance:


Meaning that the cost of true is paid at index time to create the tiny segment, at search time to search the tiny segment, and at merge time to make the larger segments.



zqc0512 - andy zhou


Delete first, then refresh manually. Splitting it into two steps should do the job.
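The split suggested above can be sketched against the REST API with the JDK's own java.net.http classes (Java 11+). This is only a sketch under assumptions: the host http://localhost:9200 and the names DeleteThenRefresh and plan are made up for illustration, and the index, type and id come from the example in the question. It builds plain DELETE requests without a refresh parameter, followed by one POST /user/_refresh, so a single explicit refresh covers the whole batch.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

class DeleteThenRefresh {
    // Assumption: a local node listening on the default HTTP port.
    static final String BASE = "http://localhost:9200";

    /** Builds the requests in order: one DELETE per id, then one explicit refresh of the index. */
    static List<HttpRequest> plan(String index, String type, List<String> ids) {
        List<HttpRequest> requests = new ArrayList<>();
        for (String id : ids) {
            // No refresh parameter here: the delete returns without waiting for a refresh.
            URI uri = URI.create(BASE + "/" + index + "/" + type + "/" + id);
            requests.add(HttpRequest.newBuilder(uri).DELETE().build());
        }
        // One refresh for the whole batch makes all deletes invisible to search.
        URI refresh = URI.create(BASE + "/" + index + "/_refresh");
        requests.add(HttpRequest.newBuilder(refresh)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build());
        return requests;
    }

    public static void main(String[] args) {
        // Only prints the plan; sending each request with HttpClient.send is left out.
        for (HttpRequest r : plan("user", "profile", List.of("1", "2"))) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

Until the final refresh request has completed, searches may still return the deleted documents, so the caller has to wait for that response before relying on search results.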
 

hapjin


To add to this, here are the source comments on the refresh parameter from the ES 6.3.2 Java High Level REST Client:
org.elasticsearch.action.support.WriteRequest.RefreshPolicy
 
/**
 * Don't refresh after this request. The default.
 */
NONE("false"),
/**
 * Force a refresh as part of this request. This refresh policy does not scale for high indexing or search throughput but is useful
 * to present a consistent view to for indices with very low traffic. And it is wonderful for tests!
 */
IMMEDIATE("true"),
/**
 * Leave this request open until a refresh has made the contents of this request visible to search. This refresh policy is
 * compatible with high indexing and search throughput but it causes the request to wait to reply until a refresh occurs.
 */
WAIT_UNTIL("wait_for");

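For example, if WAIT_UNTIL is specified on a delete, the delete request blocks until a refresh has happened and the deleted documents are no longer visible to the next search.
Also, the refresh parameter controls search visibility, whereas the translog is what guarantees data durability.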
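The difference between the two policies, and the role of index.max_refresh_listeners asked about in the question, can be illustrated with a toy model. This is not Elasticsearch code: ToyShard and its methods are hypothetical names, and the model only mirrors what the refresh documentation describes, as far as I understand it. A wait_for request is parked as a listener until the next refresh, and once index.max_refresh_listeners (default 1000) requests are already parked on a shard, the next wait_for request forces a refresh as if it had used refresh=true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of one shard; hypothetical, not an Elasticsearch class. */
class ToyShard {
    private final int maxRefreshListeners;
    private final List<CompletableFuture<Boolean>> waiting = new ArrayList<>();
    int refreshCount = 0;

    ToyShard(int maxRefreshListeners) {
        this.maxRefreshListeners = maxRefreshListeners;
    }

    /** Models refresh=true: the request itself triggers a refresh right away. */
    synchronized CompletableFuture<Boolean> writeImmediate() {
        refresh();
        return CompletableFuture.completedFuture(true); // true = this request forced a refresh
    }

    /** Models refresh=wait_for: park until the next refresh, unless all listener slots are taken. */
    synchronized CompletableFuture<Boolean> writeWaitFor() {
        if (waiting.size() >= maxRefreshListeners) {
            refresh(); // out of slots: behave like refresh=true
            return CompletableFuture.completedFuture(true);
        }
        CompletableFuture<Boolean> response = new CompletableFuture<>();
        waiting.add(response);
        return response;
    }

    /** A scheduled or forced refresh: everything becomes searchable, parked requests are answered. */
    synchronized void refresh() {
        refreshCount++;
        for (CompletableFuture<Boolean> response : waiting) {
            response.complete(false); // false = this request did not force the refresh
        }
        waiting.clear();
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard(2);
        CompletableFuture<Boolean> a = shard.writeWaitFor();
        CompletableFuture<Boolean> b = shard.writeWaitFor();
        System.out.println("a answered before refresh? " + a.isDone());
        CompletableFuture<Boolean> c = shard.writeWaitFor(); // no slot left, forces a refresh
        System.out.println("c forced a refresh? " + c.join());
        System.out.println("a and b answered now? " + (a.isDone() && b.isDone()));
    }
}
```

In this model a wait_for request is slow simply because it sits in the list until something calls refresh(), which in a real cluster would normally be the periodic refresh governed by index.refresh_interval.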

hapjin


When several threads try to refresh segments at the same time, Lucene offers two ways to refresh: blocking and non-blocking.
Blocking: org.apache.lucene.search.ReferenceManager#maybeRefreshBlocking


if another thread is currently refreshing, this method blocks until that thread completes. It is useful if you want to guarantee that the next call to {@link #acquire()}  will return a refreshed instance.
 


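Suppose thread A is in the middle of a refresh. If thread B now calls maybeRefreshBlocking, it blocks until thread A's refresh has completed. maybeRefreshBlocking therefore guarantees that thread B's changes are "search visible": the next call to acquire() returns a refreshed instance.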
 
Non-blocking:
org.apache.lucene.search.ReferenceManager#maybeRefresh


If this method returns true it means the calling thread either refreshed or that there were no changes to refresh. If it returns false it means another  thread is currently refreshing.


If thread A calls maybeRefresh and gets true, then either thread A completed a refresh itself or there were no new documents to refresh. If it gets false, another thread (say thread B) is currently refreshing.
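The contrast between the two calls can be sketched with a plain ReentrantLock: tryLock for the non-blocking variant, lock for the blocking one. This is a simplified model rather than Lucene's ReferenceManager itself. ToyRefresher, its refreshWork hook and demo are hypothetical names, and the model always refreshes instead of first checking whether there are changes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified model of the two refresh entry points; not the Lucene class. */
class ToyRefresher {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private final Runnable refreshWork; // hypothetical hook standing in for the real refresh
    int refreshes = 0;

    ToyRefresher(Runnable refreshWork) {
        this.refreshWork = refreshWork;
    }

    /** Non-blocking: returns false at once if another thread is refreshing. */
    boolean maybeRefresh() {
        if (!refreshLock.tryLock()) {
            return false;
        }
        try {
            doRefresh();
            return true;
        } finally {
            refreshLock.unlock();
        }
    }

    /** Blocking: waits for any in-flight refresh to finish, then refreshes. */
    void maybeRefreshBlocking() {
        refreshLock.lock();
        try {
            doRefresh();
        } finally {
            refreshLock.unlock();
        }
    }

    private void doRefresh() {
        refreshWork.run();
        refreshes++; // only modified while holding refreshLock
    }

    /** Returns {result of maybeRefresh while A is refreshing, result after A has finished}. */
    static boolean[] demo() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        ToyRefresher refresher = new ToyRefresher(() -> {
            started.countDown();
            try {
                release.await(); // keeps thread A's refresh "in progress"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread a = new Thread(refresher::maybeRefreshBlocking, "A");
        a.start();
        try {
            started.await();                              // A now holds the refresh lock
            boolean whileBusy = refresher.maybeRefresh(); // B does not wait for A
            release.countDown();                          // let A finish
            a.join();
            boolean afterDone = refresher.maybeRefresh(); // lock is free again
            return new boolean[] {whileBusy, afterDone};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("while A refreshes: " + result[0] + ", after A finished: " + result[1]);
    }
}
```

Had thread B called maybeRefreshBlocking instead of maybeRefresh in demo, it would have waited on the lock until thread A released it.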
