How is the start date of the buckets determined when a date_histogram aggregation uses a multi-day interval?

Elasticsearch | Author: guoxiaoguo | Posted 2018-06-01 | Views: 5552

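The query is below. The data is restricted to a two-week range, and I want each seven days of data to go into one bucket. However, the result contains three buckets.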
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "time_zone": "+08:00",
        "gte": "now-13d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "secondAggs": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "7d",
        "time_zone": "+08:00"
      }
    }
  }
}
 
result:
 "aggregations" : {
    "secondAggs" : {
      "buckets" : [
        {
          "key_as_string" : "2018-05-17T00:00:00.000+08:00",
          "key" : 1526486400000,
          "doc_count" : 1035155
        },
        {
          "key_as_string" : "2018-05-24T00:00:00.000+08:00",
          "key" : 1527091200000,
          "doc_count" : 1370881
        },
        {
          "key_as_string" : "2018-05-31T00:00:00.000+08:00",
          "key" : 1527696000000,
          "doc_count" : 198188
        }
      ]
    }
  }
The data actually covers 2018-05-19 00:00:00 to 2018-06-01 00:00:00. Shouldn't that split into exactly two buckets? Why are there three?

strglee

Upvoted by: guoxiaoguo

Here is how it works.
 
https://www.elastic.co/guide/e ... nse_3  
 
A multi-bucket aggregation similar to the histogram except it can only be applied on date values. Since dates are represented in Elasticsearch internally as long values, it is possible to use the normal histogram on dates as well, though accuracy will be compromised. The reason for this is in the fact that time based intervals are not fixed (think of leap years and on the number of days in a month). For this reason, we need special support for time based data. From a functionality perspective, this histogram supports the same features as the normal histogram. The main difference is that the interval can be specified by date/time expressions.

The documentation explains at the outset that ES stores dates internally as long integers, i.e. timestamps.
 
So how is the bucket_key computed?
https://www.elastic.co/guide/e ... .html  
 
A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values.
 
The formula ES uses to compute the bucket_key of a histogram bucket is
bucket_key = Math.floor((value - offset) / interval) * interval + offset

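The timestamp of 2018-05-19 00:00:00 (+08:00) is 1526659200 in seconds (ES itself works in milliseconds). Substituting it into the formula with offset = 0 and interval = 7 days:
Math.floor(1526659200 / (60*60*24*7)) * (60*60*24*7) = 1526515200
1526515200 corresponds to 2018-05-17 00:00:00 UTC. Because the aggregation sets time_zone to +08:00, the key in the response is 1526486400000, i.e. 2018-05-17T00:00:00.000+08:00, which is consistent with the rounding being applied to local time. Either way, 7d buckets are aligned to multiples of 7 days counted from the epoch (1970-01-01, a Thursday), not to the start of the query range. The data from 05-19 through 05-31 therefore falls into three buckets starting on 05-17, 05-24 and 05-31.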
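
The arithmetic above can be checked with a short Python sketch. It applies the bucket_key formula with offset = 0 and then, as an assumption about how time_zone is handled (shift to local time, round, shift back), reproduces the three keys from the response. The helper names are illustrative and are not part of Elasticsearch.

```python
from datetime import datetime, timedelta, timezone

DAY = 24 * 60 * 60
WEEK = 7 * DAY                       # the "7d" interval, in seconds
TZ = timezone(timedelta(hours=8))    # "time_zone": "+08:00"
TZ_SHIFT = 8 * 60 * 60


def bucket_key(value, interval, offset=0):
    """bucket_key = floor((value - offset) / interval) * interval + offset"""
    return (value - offset) // interval * interval + offset


def bucket_key_in_zone(utc_seconds, interval, tz_shift):
    """Assumed time_zone behaviour: round in local time, then convert back."""
    return bucket_key(utc_seconds + tz_shift, interval) - tz_shift


# First instant of the queried range: 2018-05-19 00:00:00 +08:00.
start = int(datetime(2018, 5, 19, tzinfo=TZ).timestamp())
plain_key = bucket_key(start, WEEK)  # the value computed in the answer

# One sample per local day, 05-19 through 05-31 (13 days).
samples = [start + n * DAY for n in range(13)]
keys = sorted({bucket_key_in_zone(t, WEEK, TZ_SHIFT) for t in samples})

print(start, plain_key)
for k in keys:
    print(k * 1000, datetime.fromtimestamp(k, TZ).isoformat())
```

The three distinct keys match the `key` values in the response once multiplied by 1000, which is why a 13-day range starting on a Saturday produces three Thursday-aligned buckets.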
 

zhongkouwei

Could you explain how to solve this situation?
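
One possible approach, following from the offset term in the bucket_key formula, is to pass the `offset` parameter of date_histogram so that bucket boundaries line up with the first day of the query range. The Python sketch below only computes that offset and builds the aggregation body. `offset_days` and `build_aggs` are hypothetical helpers, and the result has not been verified against a running cluster, so treat it as an assumption to test rather than a confirmed fix.

```python
from datetime import datetime, timedelta, timezone

DAY = 24 * 60 * 60
TZ = timezone(timedelta(hours=8))  # matches "time_zone": "+08:00" in the query


def offset_days(range_start, interval_days=7):
    """Whole days between the default bucket boundary and range_start (local time)."""
    local_seconds = int(range_start.timestamp()) + int(
        range_start.utcoffset().total_seconds()
    )
    return (local_seconds // DAY) % interval_days


def build_aggs(range_start):
    """Hypothetical helper: the original aggregation plus a computed offset."""
    histogram = {"field": "timestamp", "interval": "7d", "time_zone": "+08:00"}
    days = offset_days(range_start)
    if days:
        histogram["offset"] = f"+{days}d"
    return {"secondAggs": {"date_histogram": histogram}}


# now-13d/d evaluated on 2018-06-01 in +08:00
range_start = datetime(2018, 5, 19, tzinfo=TZ)
aggs = build_aggs(range_start)
print(aggs["secondAggs"]["date_histogram"]["offset"])
```

Because the range start moves every day, the offset would need to be recomputed for each request (for example from the current local midnight minus 13 days) rather than hard-coded.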
