
Two aggregation queries (terms aggs) that return the same results differ in query time by a factor of several hundred.

Elasticsearch | Author: pqy | Published: 2017-06-09 | Views: 3635

One query uses terms aggs directly; the other uses the painless script form of terms aggs. Could an expert explain the difference? My index is fairly large: no replicas, 5.16T.
 
Test 1
GET /weibo/weibo/_search
{
"query": {
"bool": {
"filter": [{
"query_string": {
"fields": ["content_full"],
"query" : "\"@小米主题\""
}
}, {
"range": {
"published_at": {
"gte": "2016-05-01",
"lte": "2017-05-01"
}
}
}]
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"sdf": {
"terms": {
"field": "weiboer_id"
}
}
}
}
 
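Result 1 (more than 10 seconds every time. A colleague ran over a million requests like this one, and every node in the ES cluster quickly hit a queue size of 1000 and produced a large number of rejections)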
{
"took": 12715,
"timed_out": false,
"_shards": {
"total": 45,
"successful": 45,
"failed": 0
},
"hits": {
"total": 11043,
"max_score": 0,
"hits": []
},
"aggregations": {
"sdf": {
"doc_count_error_upper_bound": 51,
"sum_other_doc_count": 10298,
"buckets": [
{
"key": "1657835230",
"doc_count": 140
},
{
"key": "1710824123",
"doc_count": 106
},
{
"key": "2117757773",
"doc_count": 90
},
{
"key": "1669734057",
"doc_count": 81
},
{
"key": "1787512884",
"doc_count": 75
},
{
"key": "1659333803",
"doc_count": 62
},
{
"key": "1727090064",
"doc_count": 59
},
{
"key": "1768333234",
"doc_count": 45
},
{
"key": "1747268970",
"doc_count": 44
},
{
"key": "1720297231",
"doc_count": 43
}
]
}
}
}

Result 1: search profiler

1111.png


2222.png

 
 
Test 2: terms aggs written with a painless script
GET /weibo/weibo/_search
{
"query": {
"bool": {
"filter": [{
"query_string": {
"fields": ["content_full"],
"query" : "\"@小米主题\""
}
}, {
"range": {
"published_at": {
"gte": "2014-05-01",
"lte": "2017-05-01"
}
}
}]
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs" : {
"clientip_top10" : {
"terms" : {
"script" : {
"lang" : "painless",
"inline" : "doc['weiboer_id'].value"
}
}
}
}
}
Result 2 (about 68 ms on average; the vast majority of runs finish in under 1 second)
 
{
"took": 64,
"timed_out": false,
"_shards": {
"total": 45,
"successful": 45,
"failed": 0
},
"hits": {
"total": 19317,
"max_score": 0,
"hits": []
},
"aggregations": {
"clientip_top10": {
"doc_count_error_upper_bound": 81,
"sum_other_doc_count": 18041,
"buckets": [
{
"key": "1649471152",
"doc_count": 447
},
{
"key": "1657835230",
"doc_count": 179
},
{
"key": "1710824123",
"doc_count": 120
},
{
"key": "1659333803",
"doc_count": 91
},
{
"key": "2117757773",
"doc_count": 88
},
{
"key": "1704468187",
"doc_count": 81
},
{
"key": "1669734057",
"doc_count": 80
},
{
"key": "1787512884",
"doc_count": 69
},
{
"key": "1727090064",
"doc_count": 64
},
{
"key": "1747268970",
"doc_count": 57
}
]
}
}
}
Result 2: search profiler
 

3333.png


4444.png

 

kennywu76 - Wood^Trip.com

Upvoted by: lunatictwo

For your specific scenario, add the option "execution_hint": "map" to the first aggregation and it will be very fast.
The full request:
GET /weibo/weibo/_search
{
"query": {
"bool": {
"filter": [{
"query_string": {
"fields": ["content_full"],
"query" : "\"@小米主题\""
}
}, {
"range": {
"published_at": {
"gte": "2016-05-01",
"lte": "2017-05-01"
}
}
}]
}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"sdf": {
"terms": {
"field": "weiboer_id",
"execution_hint": "map"
}
}
}
}
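
The reply does not say why the hint helps. One plausible explanation, going by how the `terms` aggregation is documented: on a keyword field it defaults to `global_ordinals`, which needs a shard-wide ordinal mapping for the field before any counting starts, so its cost follows the field's cardinality rather than the number of hits. `map` counts only the values of the documents that matched, and a script-based `terms` aggregation has no ordinals to use, which would explain why Test 2 is fast. With roughly 11,000 hits in a 5.16T index, that favors `map`. The toy Python sketch below contrasts the two strategies; the segment layout, document IDs, and function names are invented for illustration and are not Elasticsearch internals.

```python
from collections import Counter

# Toy model (hypothetical data): each segment maps doc_id -> weiboer_id.
segments = [
    {0: "u3", 1: "u1", 2: "u3"},
    {3: "u2", 4: "u3", 5: "u9"},
]
# Doc ids that passed the query filter, listed per segment.
matching = [[0, 2], [4, 5]]


def terms_via_global_ordinals(segments, matching):
    # Step 1 (independent of the query): merge the terms of every segment
    # into one sorted dictionary. Cost grows with the field's cardinality.
    global_terms = sorted({v for seg in segments for v in seg.values()})
    ordinal_of = {term: i for i, term in enumerate(global_terms)}
    # Step 2: count matching docs in a dense array indexed by ordinal.
    counts = [0] * len(global_terms)
    for seg, hits in zip(segments, matching):
        for doc_id in hits:
            counts[ordinal_of[seg[doc_id]]] += 1
    return {global_terms[i]: c for i, c in enumerate(counts) if c}


def terms_via_map(segments, matching):
    # Only values of matching docs are touched; no global dictionary.
    counts = Counter()
    for seg, hits in zip(segments, matching):
        for doc_id in hits:
            counts[seg[doc_id]] += 1
    return dict(counts)


by_ordinals = terms_via_global_ordinals(segments, matching)
by_map = terms_via_map(segments, matching)
print(by_ordinals == by_map)
```

Both functions produce the same buckets; only the query-independent first step differs. The Elasticsearch documentation suggests considering `map` only when very few documents match the query, which fits this case.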

 

pqy


The date ranges in the two tests above are different, but that does not affect the comparison.

lz8086 - ES junior driver


An off-topic newbie question: how many shards is your 5.16T index split into? Won't an index that large make queries slower?

pqy


@lz8086 ES 5.4.0, 45 shards, 15 data nodes.
A simple query_string query with no aggregation takes about 400 ms on its first run:
GET /weibo/weibo/_search
{
"query": {
"bool": {
"filter": [{
"query_string": {
"fields": ["content_full"],
"query" : "\"@炭烤八爪君\""
}
}]
}
}
}
{
  "took": 413,
  "timed_out": false,
  "_shards": {
    "total": 45,
    "successful": 45,
    "failed": 0
  },
  "hits": {
    "total": 866,
    "max_score": 0,
    "hits": [
      {....

555.png
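
As a rough aid for the shard-size part of the question, the figures in this reply (5.16T, 45 shards, 15 data nodes) can be turned into per-shard numbers. A minimal Python sketch, assuming "T" means terabytes of 1024 GB each, no replicas (as stated in the original question), and an even distribution across shards and nodes; the variable names are illustrative:

```python
# Figures taken from the thread.
index_size_tb = 5.16
primary_shards = 45
data_nodes = 15

# Assumption: 1 TB = 1024 GB and data is spread evenly.
gb_per_shard = index_size_tb * 1024 / primary_shards
shards_per_node = primary_shards // data_nodes
gb_per_node = gb_per_shard * shards_per_node

print(f"{gb_per_shard:.1f} GB per shard")
print(f"{shards_per_node} shards per node")
print(f"{gb_per_node:.1f} GB per node")
```

Under these assumptions each shard holds a little over 117 GB and each data node carries 3 shards.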

 

fhyes123 - ES newbie


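A newbie question: which tool do you use to write your Elasticsearch queries? I use the head plugin, which has no code completion and none of the monitoring pages you show.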
