
sum_other_doc_count > 0 in a terms aggregation

Elasticsearch | Author: zhaohao | Posted 2018-03-27 | Views: 2594

Insert the data
curl -XPOST '192.168.2.58:9200/test_index1/test_type1/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "index": {}}
{"chargetime":302,"filetype":0,"totalsize":0}
{ "index": {}}
{"chargetime":302,"filetype":1,"totalsize":1}
{ "index": {}}
{"chargetime":302,"filetype":2,"totalsize":2}
{ "index": {}}
{"chargetime":302,"filetype":3,"totalsize":3}
{ "index": {}}
{"chargetime":302,"filetype":4,"totalsize":4}
{ "index": {}}
{"chargetime":302,"filetype":5,"totalsize":5}
{ "index": {}}
{"chargetime":302,"filetype":6,"totalsize":6}
{ "index": {}}
{"chargetime":302,"filetype":7,"totalsize":7}
{ "index": {}}
{"chargetime":302,"filetype":8,"totalsize":8}
{ "index": {}}
{"chargetime":302,"filetype":9,"totalsize":9}
{ "index": {}}
{"chargetime":302,"filetype":10,"totalsize":10}
'
Aggregation
curl -XGET "http://192.168.2.58:9200/test_index1/_search" -H 'Content-Type: application/json' -d'
{
  "aggregations": {
    "range_data": {
      "aggregations": {
        "fileType": {
          "aggregations": {
            "totalsize": {
              "sum": {
                "field": "totalsize"
              }
            }
          },
          "terms": {
            "field": "filetype"
          }
        }
      },
      "histogram": {
        "field": "chargetime",
        "interval": 300,
        "min_doc_count": 1
      }
    }
  },
  "size": 0
}'
Result
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"range_data": {
"buckets": [
{
"key": 300,
"doc_count": 11,
"fileType": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1,
"buckets": [
{
"key": 0,
"doc_count": 1,
"totalsize": {
"value": 0
}
},
{
"key": 1,
"doc_count": 1,
"totalsize": {
"value": 1
}
},
{
"key": 2,
"doc_count": 1,
"totalsize": {
"value": 2
}
},
{
"key": 3,
"doc_count": 1,
"totalsize": {
"value": 3
}
},
{
"key": 4,
"doc_count": 1,
"totalsize": {
"value": 4
}
},
{
"key": 5,
"doc_count": 1,
"totalsize": {
"value": 5
}
},
{
"key": 6,
"doc_count": 1,
"totalsize": {
"value": 6
}
},
{
"key": 7,
"doc_count": 1,
"totalsize": {
"value": 7
}
},
{
"key": 8,
"doc_count": 1,
"totalsize": {
"value": 8
}
},
{
"key": 9,
"doc_count": 1,
"totalsize": {
"value": 9
}
}
]
}
}
]
}
}
}
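I looked into it: by default ES returns only 10 buckets for a terms aggregation. Specifying size fixes that, but what should I do when I don't know how many entries there will be?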
 
Specifying size
curl -XGET "http://192.168.2.58:9200/test_index1/_search" -H 'Content-Type: application/json' -d'
{
  "aggregations": {
    "range_data": {
      "aggregations": {
        "fileType": {
          "aggregations": {
            "totalsize": {
              "sum": {
                "field": "totalsize"
              }
            }
          },
          "terms": {
            "field": "filetype",
            "size": 11
          }
        }
      },
      "histogram": {
        "field": "chargetime",
        "interval": 300,
        "min_doc_count": 1
      }
    }
  },
  "size": 0
}'
I need all of the aggregation results, and there will be a lot of them.

laoyang360 - Elastic Certified Engineer; [死磕Elasitcsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

Upvoted by: zhaohao

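Same as my reply to the previous question: set size to 2147483647.
For ES 5.x/6.x, set it to 2147483647, which equals 2^31-1,
the largest value a 32-bit signed integer can hold; for ES 1.x/2.x, set it to 0.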
 
That said, aggregating over the full data set inevitably raises performance concerns.
Returning everything is not recommended.
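
As a quick check of the figure quoted above, the value 2^31-1 can be computed in the shell and dropped into the `size` field of a `terms` aggregation. This is only a minimal sketch: the index and field names come from the question, while the variable `max_int` and the helper `terms_body` are made up for illustration, and the outer histogram from the question is left out for brevity.

```shell
# 2^31 - 1, the largest 32-bit signed integer.
max_int=$(( (1 << 31) - 1 ))
echo "$max_int"   # prints 2147483647

# Illustrative helper: print a search body whose terms aggregation
# asks for up to $1 buckets on the "filetype" field.
terms_body() {
  printf '{"size":0,"aggregations":{"fileType":{"terms":{"field":"filetype","size":%d}}}}\n' "$1"
}

terms_body "$max_int"

# To send it (host and index taken from the question, adjust as needed):
# curl -XGET "http://192.168.2.58:9200/test_index1/_search" \
#   -H 'Content-Type: application/json' -d "$(terms_body "$max_int")"
```

The performance caveat above still applies: asking for every bucket in one response can be expensive on a high-cardinality field.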

xlp

Upvoted by: zhaohao

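If you are doing a full aggregation through the Java API, you can use partition to emulate paging and return the result set in batches. The screenshot shows how; because it is test code, the key parts are commented out, apologies. For example, with 100,000 entries you can set size to 10000 and use 10 partitions, loop ten times, and collect all the data. I also tested the case where the actual volume is unknown: with size set to 10000 and 10 partitions, ES seems to adjust accordingly, and a batch does not necessarily contain exactly 10000 entries. For example, when the actual volume reached 140,000, each batch returned more than 10000, and the 10 batches still added up to 140,000, so no data appeared to be lost. That is what my test showed, but I could not find an official statement on it. If you use curl, each execution returns only a single batch.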
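
The batch-by-batch loop described above can also be sketched with curl in a shell script, one request per partition. This is an illustration under assumptions, not the Java code from the post: the helper names `partition_body` and `fetch_all_partitions` are hypothetical, the outer histogram from the question is omitted for brevity, and only the `include.partition` / `include.num_partitions` fields are taken from the requests in this thread.

```shell
# Hypothetical helper: print the search body for one partition.
# $1 = partition index (0-based), $2 = num_partitions, $3 = size per request.
partition_body() {
  printf '{"size":0,"aggregations":{"fileType":{"terms":{"field":"filetype","include":{"partition":%d,"num_partitions":%d},"size":%d}}}}\n' "$1" "$2" "$3"
}

# Hypothetical helper: request every partition in turn.
# $1 = _search URL, $2 = num_partitions, $3 = size per request.
fetch_all_partitions() {
  url=$1; total=$2; size=$3; p=0
  while [ "$p" -lt "$total" ]; do
    curl -s -XGET "$url" -H 'Content-Type: application/json' \
      -d "$(partition_body "$p" "$total" "$size")"
    echo
    p=$((p + 1))
  done
}

# Dry run without a cluster: print the bodies for 2 partitions, size 3.
for p in 0 1; do
  partition_body "$p" 2 3
done

# Against a real cluster (host and index from the question):
# fetch_all_partitions "http://192.168.2.58:9200/test_index1/_search" 10 10000
```

Each response should be checked separately, since a single call only ever covers one partition.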

zhaohao

Upvoted by:

@xlp:
I get the same result from Kibana Dev Tools and from curl.
First request
curl -XGET "http://192.168.2.58:9200/test_index1/_search" -H 'Content-Type: application/json' -d'
{
  "aggregations": {
    "range_data": {
      "aggregations": {
        "fileType": {
          "aggregations": {
            "totalsize": {
              "sum": {
                "field": "totalsize"
              }
            }
          },
          "terms": {
            "field": "filetype",
            "include": {
              "partition": 0,
              "num_partitions": 2
            },
            "size": 3
          }
        }
      },
      "histogram": {
        "field": "chargetime",
        "interval": 300,
        "min_doc_count": 1
      }
    }
  },
  "size": 0
}'
Result
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"range_data": {
"buckets": [
{
"key": 300,
"doc_count": 11,
"fileType": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 5,
"buckets": [
{
"key": 0,
"doc_count": 1,
"totalsize": {
"value": 0
}
},
{
"key": 3,
"doc_count": 1,
"totalsize": {
"value": 3
}
},
{
"key": 4,
"doc_count": 1,
"totalsize": {
"value": 4
}
}
]
}
}
]
}
}
}
Second request
curl -XGET "http://192.168.2.58:9200/test_index1/_search" -H 'Content-Type: application/json' -d'
{
  "aggregations": {
    "range_data": {
      "aggregations": {
        "fileType": {
          "aggregations": {
            "totalsize": {
              "sum": {
                "field": "totalsize"
              }
            }
          },
          "terms": {
            "field": "filetype",
            "include": {
              "partition": 1,
              "num_partitions": 2
            },
            "size": 3
          }
        }
      },
      "histogram": {
        "field": "chargetime",
        "interval": 300,
        "min_doc_count": 1
      }
    }
  },
  "size": 0
}'
Result
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"range_data": {
"buckets": [
{
"key": 300,
"doc_count": 11,
"fileType": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1,
"totalsize": {
"value": 1
}
},
{
"key": 2,
"doc_count": 1,
"totalsize": {
"value": 2
}
},
{
"key": 8,
"doc_count": 1,
"totalsize": {
"value": 8
}
}
]
}
}
]
}
}
}

So the number of buckets returned is never larger than size, is it?
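
The two responses above seem to bear this out. Partition 0 returns 3 buckets and reports sum_other_doc_count 5, partition 1 returns 3 buckets and reports 0, so 3 + 5 + 3 + 0 accounts for all 11 documents, but 5 terms in partition 0 were never returned. One way to size the loop, sketched below under assumptions, is to estimate the number of distinct terms first (for example with a `cardinality` aggregation) and derive the partition count by ceiling division. The aggregation name `distinct_filetypes` and the helper `num_partitions_for` are invented for this example.

```shell
# Body for a cardinality aggregation estimating the number of distinct
# "filetype" values; "distinct_filetypes" is an illustrative name.
cardinality_body='{"size":0,"aggregations":{"distinct_filetypes":{"cardinality":{"field":"filetype"}}}}'

# Ceiling division: the smallest partition count for which each partition
# holds at most $2 terms on average. $1 = estimated distinct terms.
num_partitions_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

num_partitions_for 11 3           # prints 4
num_partitions_for 140000 10000   # prints 14
```

This is an average only. In the test above the 11 terms were split 8 and 3 across two partitions, so terms are evidently not spread evenly; it is safer to add headroom to `size` or `num_partitions` and to confirm that every partition's response reports sum_other_doc_count 0.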
