
ES 6.3.2: profiling a match query shows a very long Collector time. How can this be optimized?

Elasticsearch | Author: hapjin | Published: 2019-03-13 | Views: 4212

The index is named user_v1 and has the following settings:
{
"user_v1": {
"settings": {
"index": {
"refresh_interval": "30s",
"number_of_shards": "5",
"provided_name": "user_v1",
//....
"number_of_replicas": "1"
}
}
}
}
The index mapping is shown below (fewer than 10 fields in total):
{
"user_v1": {
"mappings": {
"profile": {
"properties": {
"created": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"details": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "hanlp_standard"
},
"nick": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "hanlp_standard"
},
"signature": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "hanlp_standard"
},
//.....
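
The request that produced the profile output is not shown in the post. As a rough sketch, a profiled search is an ordinary search body with `"profile": true` added. The query text and `size` below are hypothetical, guessed from the terms that appear in the profile description:

```python
import json


def build_profiled_match(field, text, size=10):
    """Return a search body that asks Elasticsearch to profile a match query."""
    return {
        "profile": True,  # enables the Profile API for this request
        "size": size,     # hypothetical value, not taken from the post
        "query": {"match": {field: text}},
    }


# Hypothetical query text; the original request is not included in the post.
body = build_profiled_match("nick", "微信 黄色")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON would be sent as the body of `GET /user_v1/_search`.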

The profile output is shown below.
On shard [user_v1][1] the collector time is as long as 7.9s:
  {
"id": "[wx0dqdubRkiqJJ-juAqH4A][user_v1][1]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "nick:微信 nick: nick:黄色",
"time": "674.2ms",
"time_in_nanos": 674296440,
"breakdown": {
"score": 345336336,
"build_scorer_count": 41,
"match_count": 0,
"create_weight": 44350,
"next_doc": 318735380,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 5003310,
"score_count": 4666082,
"build_scorer": 510940,
"advance": 0,
"advance_count": 0
},
"children": [
{
"type": "TermQuery",
"description": "nick:微信",
"time": "43ms",
"time_in_nanos": 43092821,
"breakdown": {
"score": 649883,
"build_scorer_count": 59,
"match_count": 0,
"create_weight": 17762,
"next_doc": 42284755,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 7030,
"score_count": 5800,
"build_scorer": 127531,
"advance": 0,
"advance_count": 0
}
},
{
"type": "TermQuery",
"description": "nick: ",
"time": "391.2ms",
"time_in_nanos": 391264139,
"breakdown": {
"score": 197751347,
"build_scorer_count": 61,
"match_count": 0,
"create_weight": 4578,
"next_doc": 183813681,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 4996728,
"score_count": 4660658,
"build_scorer": 37085,
"advance": 0,
"advance_count": 0
}
},
{
"type": "TermQuery",
"description": "nick:黄色",
"time": "117.4ms",
"time_in_nanos": 117471823,
"breakdown": {
"score": 149981,
"build_scorer_count": 39,
"match_count": 0,
"create_weight": 3613,
"next_doc": 117297171,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 997,
"score_count": 771,
"build_scorer": 19250,
"advance": 0,
"advance_count": 0
}
}
]
}
],
"rewrite_time": 1363467792,
"collector": [
{
"name": "CancellableCollector",
"reason": "search_cancelled",
"time": "8.1s",
"time_in_nanos": 8115275034,
"children": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time": "7.9s",
"time_in_nanos": 7918401119
}
]
}
]
}
],
"aggregations": []
},

On shard [user_v1][4], however, the collector time is only a few hundred milliseconds:
{
"id": "[yFnfouyXTvONUxhOmMp--A][user_v1][4]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "nick:微信 nick: nick:黄色",
"time": "870.6ms",
"time_in_nanos": 870672781,
"breakdown": {
"score": 571167502,
"build_scorer_count": 50,
"match_count": 0,
"create_weight": 76714,
"next_doc": 289187438,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 5057254,
"score_count": 4667487,
"build_scorer": 516335,
"advance": 0,
"advance_count": 0
},
"children": [
{
"type": "TermQuery",
"description": "nick:微信",
"time": "1.4ms",
"time_in_nanos": 1472493,
"breakdown": {
"score": 942180,
"build_scorer_count": 70,
"match_count": 0,
"create_weight": 30118,
"next_doc": 306063,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 7108,
"score_count": 5647,
"build_scorer": 181306,
"advance": 0,
"advance_count": 0
}
},
{
"type": "TermQuery",
"description": "nick: ",
"time": "590ms",
"time_in_nanos": 590014764,
"breakdown": {
"score": 398076512,
"build_scorer_count": 72,
"match_count": 0,
"create_weight": 9719,
"next_doc": 182139208,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 5050613,
"score_count": 4662107,
"build_scorer": 76532,
"advance": 0,
"advance_count": 0
}
},
{
"type": "TermQuery",
"description": "nick:黄色",
"time": "277.2micros",
"time_in_nanos": 277285,
"breakdown": {
"score": 183306,
"build_scorer_count": 54,
"match_count": 0,
"create_weight": 7504,
"next_doc": 42644,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 893,
"score_count": 774,
"build_scorer": 42109,
"advance": 0,
"advance_count": 0
}
}
]
}
],
"rewrite_time": 572186,
"collector": [
{
"name": "CancellableCollector",
"reason": "search_cancelled",
"time": "862.4ms",
"time_in_nanos": 862446475,
"children": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time": "598.8ms",
"time_in_nanos": 598822927
}
]
}
]
}
],
"aggregations": []
}
]
}

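One thing is certain: [user_v1][4] is on an SSD, while [user_v1][1] is on a spinning disk. But if the disk were the cause, isn't this gap far too large?
My questions: which operations exactly does the Collector time cover? Are there other causes that could make the collector time differ this much? And how can it be fixed?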
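
To compare shards without reading the raw JSON by eye, the profile output can be reduced to one row per shard. The sketch below assumes the usual response wrapper `{"profile": {"shards": [...]}}` (the outer part is not shown in the post) and uses a trimmed copy of the numbers above as sample data. The helper names are made up for illustration.

```python
def leaf_clauses(nodes):
    """Yield every query node without children (the individual term lookups)."""
    for node in nodes:
        children = node.get("children", [])
        if children:
            yield from leaf_clauses(children)
        else:
            yield node


def summarize_profile(response):
    """Return one dict per shard with query time, collector time, busiest clause."""
    rows = []
    for shard in response["profile"]["shards"]:
        search = shard["searches"][0]
        busiest = max(leaf_clauses(search["query"]),
                      key=lambda n: n["breakdown"]["next_doc_count"])
        rows.append({
            "shard": shard["id"],
            "query_ms": sum(q["time_in_nanos"] for q in search["query"]) / 1e6,
            "collector_ms": sum(c["time_in_nanos"] for c in search["collector"]) / 1e6,
            "busiest_clause": busiest["description"],
            "busiest_next_doc_count": busiest["breakdown"]["next_doc_count"],
        })
    return rows


def shard_entry(shard_id, query_ns, collector_ns, next_doc_counts):
    """Build a trimmed shard entry holding only the fields used above."""
    terms = ["nick:微信", "nick: ", "nick:黄色"]
    children = [{"description": t, "breakdown": {"next_doc_count": n}}
                for t, n in zip(terms, next_doc_counts)]
    query = {"description": "nick:微信 nick: nick:黄色", "time_in_nanos": query_ns,
             "breakdown": {"next_doc_count": sum(next_doc_counts)},
             "children": children}
    collector = {"name": "CancellableCollector", "time_in_nanos": collector_ns}
    return {"id": shard_id, "searches": [{"query": [query], "collector": [collector]}]}


# Numbers copied from the two shard profiles in the post.
sample = {"profile": {"shards": [
    shard_entry("[wx0dqdubRkiqJJ-juAqH4A][user_v1][1]", 674296440, 8115275034,
                [7030, 4996728, 997]),
    shard_entry("[yFnfouyXTvONUxhOmMp--A][user_v1][4]", 870672781, 862446475,
                [7108, 5050613, 893]),
]}}

rows = summarize_profile(sample)
for row in rows:
    print(row)
```

On both shards the clause with by far the largest `next_doc_count` is `nick: `, which looks like an empty or whitespace token and advances over roughly 5 million documents. That is visible in the profile output itself. Whether it explains the collector time is only a guess.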
 
I also don't quite understand the documentation's explanation of Collector time:
Collectors are the processes which are responsible for gathering raw results,  combining them, filtering, sorting etc.
Each profile also contains a section about the Lucene Collectors which run the search
 
 
 

rochy - rochy_he


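ES wraps each sub-query in a Collector and passes the Collector to Lucene, which then runs the actual Lucene query.
Merging, filtering, sorting, and fetching the data all happen inside the Collector.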
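
As a rough illustration of that division of labor, the sketch below shows a simplified top-hits collector: the query side produces matching documents and the collector is called once per match to keep the best k. It is a Python stand-in written for this thread, not Lucene's actual implementation, and all names in it are hypothetical.

```python
import heapq


class TopScoreCollector:
    """Simplified stand-in for a top-hits collector: keeps the k best hits."""

    def __init__(self, k):
        self.k = k
        self.heap = []       # min-heap of (score, doc_id); heap[0] is the weakest kept hit
        self.total_hits = 0

    def collect(self, doc_id, score):
        """Called once for every matching document."""
        self.total_hits += 1
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, doc_id))
        elif score > self.heap[0][0]:
            # The new hit beats the weakest kept hit, so it takes its place.
            heapq.heapreplace(self.heap, (score, doc_id))

    def top_docs(self):
        """Return the kept hits, best score first."""
        return sorted(self.heap, reverse=True)


def search(matches, collector):
    """Hypothetical driver: the query yields (doc_id, score), the collector consumes."""
    for doc_id, score in matches:
        collector.collect(doc_id, score)
    return collector.top_docs()


matches = [(1, 0.4), (2, 2.5), (3, 1.1), (4, 0.9), (5, 3.0)]
collector = TopScoreCollector(k=2)
top = search(matches, collector)
print(top, collector.total_hits)
```

Because `collect` runs once per match, a query that matches several million documents calls it several million times even when only a handful of hits are returned. In this simplified model, that is one plausible way for collector time to grow with the number of matches.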
 
"collector": [
{
"name": "CancellableCollector",
"reason": "search_cancelled",
"time": "862.4ms",
"time_in_nanos": 862446475,
"children": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time": "598.8ms",
"time_in_nanos": 598822927
}
]
}
]
The search_top_hits entry above, i.e. the time spent fetching the final results, is as high as 600ms, which may be due to the SSD.
 

zmc - ES PAAS, JuiceFS


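Has this been resolved? I recently noticed a large gap in script execution time between versions as well: the newer version actually performs worse, and the collector time is likewise very long...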
