
Question: getting the total number of buckets after grouping

Elasticsearch | Author: YangLingQiang | Posted 2021-09-27 | Views: 1278

This is my source data:
[
{
"buyer_id_std" : 35444337,
"seller_id_std" : 3587575,
"date" : "2021-02-24T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 75848633,
"date" : "2021-01-20T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 25308769,
"date" : "2021-01-19T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 25308769,
"date" : "2021-01-19T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 74256954,
"date" : "2021-01-19T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 65945090,
"date" : "2021-01-20T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 74256954,
"date" : "2021-01-19T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 2374066,
"date" : "2021-01-20T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 64505236,
"date" : "2021-01-19T00:00:00"
},
{
"buyer_id_std" : 35444337,
"seller_id_std" : 78976932,
"date" : "2021-01-19T00:00:00"
}
]
This is my query:

GET trade/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"buyer_id_std": 35444337
}
},
{
"range": {
"date": {
"gte": "2020-09-06T00:00:00Z",
"lte": "2021-09-06T00:00:00Z"
}
}
}
]
}
},
"aggs": {
"group_by_partner": {
"terms": {
"field": "seller_id_std",
"size": 200000
},
"aggs": {
"min_date": {
"min": {
"field": "date"
}
},
"max_date": {
"max": {
"field": "date"
}
},
"mdate": {
"bucket_selector": {
"buckets_path": {"md": "max_date"},
"script": "params.md > 1601164800000L"
}
},
"bucket_field": {
"bucket_sort": {
"from": 0,
"size": 2
}
}
}
},
"sum_partner": {
"stats_bucket": {
"buckets_path": "group_by_partner>_count"
}
}
},
"size": 0,
"track_total_hits": true
}
The query returns:
{
"took" : 46,
"timed_out" : false,
"_shards" : {
"total" : 61,
"successful" : 61,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_partner" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 74256954,
"doc_count" : 155,
"max_date" : {
"value" : 1.6308864E12,
"value_as_string" : "2021-09-06T00:00:00.000Z"
},
"min_date" : {
"value" : 1.6020288E12,
"value_as_string" : "2020-10-07T00:00:00.000Z"
},
"count" : {
"value" : 1
}
},
{
"key" : 2374066,
"doc_count" : 108,
"max_date" : {
"value" : 1.6308E12,
"value_as_string" : "2021-09-05T00:00:00.000Z"
},
"min_date" : {
"value" : 1.6073856E12,
"value_as_string" : "2020-12-08T00:00:00.000Z"
}
}
]
},
"sum_partner" : {
"count" : 2,
"min" : 108.0,
"max" : 155.0,
"avg" : 131.5,
"sum" : 263.0
}
}
}
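What I want is the total number of buckets. For the result above, the bucket total I need is 8, not 2.
I can't use
"count": {"cardinality": {"field": "seller_id_std"}},
here, because that counts all buckets, whereas I need the number of buckets left after filtering, i.e. after the following filter has been applied: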
"mdate": {
"bucket_selector": {
"buckets_path": {"md": "max_date"},
"script": "params.md > 1601164800000L"
}
}
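
To make the gap concrete, here is a small Python sketch (not part of my query, just a simulation) that replays the pipeline on the ten sample documents above. It assumes the dates are UTC and ignores bucket ordering, since only the counts matter here; the helper names `to_epoch_ms` and `simulate` are made up for illustration.

```python
from datetime import datetime, timezone

# Sample documents from the post: (seller_id_std, date).
DOCS = [
    (3587575, "2021-02-24T00:00:00"),
    (75848633, "2021-01-20T00:00:00"),
    (25308769, "2021-01-19T00:00:00"),
    (25308769, "2021-01-19T00:00:00"),
    (74256954, "2021-01-19T00:00:00"),
    (65945090, "2021-01-20T00:00:00"),
    (74256954, "2021-01-19T00:00:00"),
    (2374066, "2021-01-20T00:00:00"),
    (64505236, "2021-01-19T00:00:00"),
    (78976932, "2021-01-19T00:00:00"),
]

# Threshold used in the bucket_selector script (2020-09-27T00:00:00Z).
THRESHOLD_MS = 1601164800000


def to_epoch_ms(iso):
    """Treat a zone-less ISO timestamp as UTC and return epoch milliseconds."""
    dt = datetime.fromisoformat(iso).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def simulate(docs, page_from, page_size):
    """Return (buckets after bucket_selector, buckets after bucket_sort)."""
    # terms + max: one bucket per seller, holding that seller's latest date.
    max_date = {}
    for seller, date in docs:
        ms = to_epoch_ms(date)
        max_date[seller] = max(ms, max_date.get(seller, ms))
    # bucket_selector: keep buckets whose max_date exceeds the threshold.
    kept = [seller for seller, md in max_date.items() if md > THRESHOLD_MS]
    # bucket_sort: truncate to the requested page.
    page = kept[page_from:page_from + page_size]
    return len(kept), len(page)


filtered_total, page_count = simulate(DOCS, 0, 2)
print(filtered_total, page_count)  # 8 2
```

The first number is the one I am after; the second is what `sum_partner.count` reports today, because it only sees the buckets that are left after `bucket_sort`.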

Any suggestions would be much appreciated.

BetterLevi


Set the bucket size to a larger value:
"bucket_field": {
"bucket_sort": {
"from": 0,
"size": 20
}
}

laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net


First, my working premise is that aggregation results are approximate rather than exact.
https://mp.weixin.qq.com/s/V4cGqvkQ7-DgeSvPSketgQ

Raise the bucket size according to your business needs. The bucket size is an input parameter, so you have to set it in advance.
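
If raising the page size is not practical, one possible alternative is to count on a second copy of the terms aggregation that keeps the `bucket_selector` but drops the `bucket_sort`, and point `stats_bucket` at that copy. The Python sketch below only builds and prints such a request body; it has not been run against a cluster. The aggregation name `all_filtered_partners` and the helper `partner_terms` are hypothetical, and the approach relies on the behaviour visible in the response above, where `stats_bucket` counted the buckets left after the pipeline steps inside the terms aggregation.

```python
import json

# Script copied from the bucket_selector in the question.
THRESHOLD_SCRIPT = "params.md > 1601164800000L"


def partner_terms(with_paging, page_from=0, page_size=2):
    """Terms aggregation on seller_id_std, filtered by max_date.

    with_paging=True reproduces the aggregation from the question;
    with_paging=False omits bucket_sort so every surviving bucket remains.
    """
    sub_aggs = {
        "max_date": {"max": {"field": "date"}},
        "mdate": {
            "bucket_selector": {
                "buckets_path": {"md": "max_date"},
                "script": THRESHOLD_SCRIPT,
            }
        },
    }
    if with_paging:
        sub_aggs["min_date"] = {"min": {"field": "date"}}
        sub_aggs["bucket_field"] = {
            "bucket_sort": {"from": page_from, "size": page_size}
        }
    return {
        "terms": {"field": "seller_id_std", "size": 200000},
        "aggs": sub_aggs,
    }


body = {
    "size": 0,
    "track_total_hits": True,
    "query": {
        "bool": {
            "filter": [
                {"term": {"buyer_id_std": 35444337}},
                {
                    "range": {
                        "date": {
                            "gte": "2020-09-06T00:00:00Z",
                            "lte": "2021-09-06T00:00:00Z",
                        }
                    }
                },
            ]
        }
    },
    "aggs": {
        # The page of buckets to display, as in the question.
        "group_by_partner": partner_terms(with_paging=True),
        # Hypothetical second aggregation: same filter, no truncation.
        "all_filtered_partners": partner_terms(with_paging=False),
        # Count buckets on the untruncated copy instead of the paged one.
        "sum_partner": {
            "stats_bucket": {"buckets_path": "all_filtered_partners>_count"}
        },
    },
}

print(json.dumps(body, indent=2))
```

The cluster still has to build every bucket twice, so this trades extra work for the count. If the extra bucket list makes the response too large, the `filter_path` request parameter (for example `filter_path=aggregations.group_by_partner,aggregations.sum_partner`) may help trim what is returned.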
