Aggregation metadata: tagging aggregation results

Elasticsearch • Posted by ziyou • 2019-09-23 19:51

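Background

Our project needs to run a second terms aggregation over results that have already been aggregated. Concretely, there are n models; we need to count how many times each model is called in each time window, and then query the total number of calls for a specified set of m models. Our approach is to create one index per model and, for each model, run one query per time window to get its call count for that window.

Note: the values we record can only be taken from the query response.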

Implementation

First design

In the first design, each model's record has two fields, call count and error count, produced by the following query:

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" // marks the document as a model call
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 215,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1053,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

After parsing, the following record is saved:

{"@timestamp":"2019-09-23T12:23:23.333","count":0,"error":0}
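The parsing step can be sketched roughly as below. This is only an illustration: the function name `to_record` is hypothetical, and it assumes that `count` is taken from `hits.total` and `error` from the `doc_count` of the `error` bucket, which matches the zeros in the response above but is not stated explicitly.

```python
from datetime import datetime, timezone


def to_record(response, now=None):
    """Flatten one search response into the small record that gets saved.

    Assumption: "count" comes from hits.total and "error" from the
    doc_count of the "error" filters bucket.
    """
    now = now or datetime.now(timezone.utc)
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns hits.total as {"value": N, "relation": ...};
    # the response shown above uses the older plain-integer form.
    if isinstance(total, dict):
        total = total["value"]
    errors = response["aggregations"]["error"]["buckets"]["error"]["doc_count"]
    return {
        # Millisecond precision, same shape as the saved record above.
        "@timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3],
        "count": total,
        "error": errors,
    }
```

Nothing in this record identifies the model; the index it is written to is the only place that information lives.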

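This design can count the calls of each model within a time window, with the index name telling the models apart. It breaks down when counting a specified set of m models: because the m models are an arbitrary selection, their indices cannot be matched with a * wildcard on the index-name prefix, and listing all m index names in the query is not workable either, so the totals cannot be computed. Because the model ID is not recorded in the saved result, the first design cannot count a specified set of m models, and it failed.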

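Second design

Since the model ID has to be recorded in the statistics, use a terms aggregation: first filter the data by model ID, then aggregate on the unique model ID so that the ID appears in the result. The query is as follows: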

{
  "aggs": {
    "modelId": {
      "terms": {
        "field": "modelId.keyword",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h",
              "lte": "now/h",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

Result:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 140,
    "successful" : 140,
    "skipped" : 135,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "modelId" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

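This works as long as the model was called. When the model has no calls, the response looks like the one above and carries no information at all: both the zero error count and the model ID are missing. Because the model ID cannot be recorded when there are no calls, the second design also cannot count calls for a specified set of m models, and it failed too.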
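A small sketch of why the empty response is a problem for the parsing step. The helper `records_from_terms` is hypothetical, and the non-empty sample (a bucket with `doc_count` 12) is made-up data used only to contrast with the empty case from the response above.

```python
def records_from_terms(response):
    """Build one record per bucket of the "modelId" terms aggregation."""
    buckets = response["aggregations"]["modelId"]["buckets"]
    return [{"modelId": b["key"], "count": b["doc_count"]} for b in buckets]


# Made-up sample: the model was called during the window.
busy = {"aggregations": {"modelId": {"buckets": [
    {"key": "modelId01", "doc_count": 12},
]}}}

# Same shape as the response above: no calls, so no buckets.
idle = {"aggregations": {"modelId": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [],
}}}

print(records_from_terms(busy))  # one record that carries the model ID
print(records_from_terms(idle))  # empty list: nothing to save, not even a zero
```

A terms aggregation only creates buckets for values that occur in the matched documents, so an idle model produces no record instead of a record with a zero count.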

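Third design

After two failed attempts, it was clear that I needed a way to add a field to the query result. The official documentation has exactly that: aggregation metadata, which tags an aggregation result. Taking the first design and adding a tag (the meta object) to the aggregation records the model ID with every statistic. The query is as follows: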

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      },
      "meta": {
        "modelId": "modelId01"
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" // marks the document as a model call
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
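Since the same query has to be issued once per model, the request body could be generated from the model ID, for example as in the sketch below. The function name `build_query` is hypothetical; the field names, the `D003` marker and the time range are copied from the query above, and `modelId.keyword` is assumed to be the intended field name.

```python
import json


def build_query(model_id):
    """Request body for one model; the model ID is echoed back through "meta"."""
    return {
        "size": 0,
        "query": {"bool": {"must": [
            # "li": "D003" marks the document as a model call.
            {"bool": {
                "should": [{"match_phrase": {"li": "D003"}}],
                "minimum_should_match": 1,
            }},
            {"match_phrase": {"modelId.keyword": {"query": model_id}}},
            {"range": {"x_st": {
                "gte": "now/h-1h-10s",
                "lte": "now/h-10s",
                "format": "epoch_millis",
            }}},
        ]}},
        "aggs": {"error": {
            "filters": {"filters": {"error": {"query_string": {
                "query": "state:-1",
                "analyze_wildcard": True,
                "default_field": "*",
            }}}},
            # Returned unchanged with the aggregation result,
            # even when no document matches.
            "meta": {"modelId": model_id},
        }},
    }


print(json.dumps(build_query("modelId01")["aggs"], indent=2))
```

Keeping the ID in one parameter makes sure the filter and the tag cannot drift apart.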

Result:

{
  "took" : 88,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1056,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "meta" : {
        "modelId" : "modelId01"
      },
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

This completes the statistics requirement.
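With the tag in the response, the saved record can carry the model ID even when the counts are zero. A minimal sketch, assuming as before that `count` is read from `hits.total` and `error` from the bucket's `doc_count`; the function name `to_tagged_record` and the field name `modelId` in the saved record are hypothetical.

```python
def to_tagged_record(response, timestamp):
    """Flatten a response whose "error" aggregation carries a meta tag."""
    agg = response["aggregations"]["error"]
    total = response["hits"]["total"]
    # Elasticsearch 7+ wraps the total as {"value": N, "relation": ...}.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "@timestamp": timestamp,
        # The tag is present regardless of how many documents matched.
        "modelId": agg["meta"]["modelId"],
        "count": total,
        "error": agg["buckets"]["error"]["doc_count"],
    }


# Trimmed copy of the response shown above.
response = {
    "hits": {"total": 0, "max_score": 0.0, "hits": []},
    "aggregations": {"error": {
        "meta": {"modelId": "modelId01"},
        "buckets": {"error": {"doc_count": 0}},
    }},
}
print(to_tagged_record(response, "2019-09-23T12:23:23.333"))
```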

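Summary

Aggregation metadata lets you tag aggregation results. The tag can then be used when running a terms aggregation over the saved results, or in any other query that filters by the tag.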
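The follow-up query over the saved records might look like the sketch below. Everything about the saved index is an assumption here: the field names `modelId`, `count`, `error` and `@timestamp` follow the saved record format, `modelId` is assumed to be mapped as a keyword field (with default dynamic mapping it would be `modelId.keyword`), and the function and aggregation names are hypothetical.

```python
def build_usage_query(model_ids, gte, lte):
    """Sum saved call and error counts for a chosen set of models."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            # Keep only the records tagged with one of the m model IDs.
            {"terms": {"modelId": list(model_ids)}},
            {"range": {"@timestamp": {"gte": gte, "lte": lte}}},
        ]}},
        "aggs": {"per_model": {
            # One bucket per requested model; size must be at least 1.
            "terms": {"field": "modelId", "size": max(len(model_ids), 1)},
            "aggs": {
                "calls": {"sum": {"field": "count"}},
                "errors": {"sum": {"field": "error"}},
            },
        }},
    }


body = build_usage_query(["modelId01", "modelId07"], "now-7d/d", "now/d")
```

Because every saved record carries its model ID, the m models can be selected by value instead of by index name.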
