Parent-child documents: how to compare a field across two child documents

Elasticsearch | Author: zx3271234 | Posted November 30, 2017 | Views: 3742

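Suppose students and exams are modeled as a parent-child relationship.
Each student has several exams. How can I find all students whose score on the 2017-12-31 exam is higher than their score on the 2017-09-30 exam?
In the example below, student AAA goes from 50 to 80 and satisfies the condition, while student BBB goes from 90 to 40 and does not.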
PUT /idx
{
  "mappings": {
    "student": {},
    "exam": {
      "_parent": {
        "type": "student"
      }
    }
  }
}

POST /idx/student/_bulk
{ "index": { "_id": "1" }}
{ "name": "AAA"}
{ "index": { "_id": "2" }}
{ "name": "BBB"}

POST /idx/exam/_bulk
{ "index": { "_id": 1, "parent": "1" }}
{ "date":"2017-09-30", "score":50}
{ "index": { "_id": 2, "parent": "1" }}
{ "date":"2017-12-31", "score":80}
{ "index": { "_id": 3, "parent": "2" }}
{ "date":"2017-09-30", "score":90}
{ "index": { "_id": 4, "parent": "2" }}
{ "date":"2017-12-31", "score":40}

2 replies

kennywu76 - Wood

Upvoted by: venyowang, zx3271234

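This kind of filtering logic is not easy to implement in ES. After thinking about it for a while, I came up with the following, which comes fairly close to the requirement, for reference: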

POST idx/_search?size=0
{
  "aggs": {
    "student": {
      "terms": {
        "field": "name.keyword",
        "size": 99999
      },
      "aggs": {
        "scores": {
          "children": {
            "type": "exam"
          },
          "aggs": {
            "is_score_higher": {
              "scripted_metric": {
                "init_script": "params._agg.scores= new HashMap(); params._agg.df=new SimpleDateFormat('yyyy-MM-dd')",
                "map_script": "params._agg.scores[params._agg.df.format(doc.date.value)]=doc.score.value",
                "combine_script": "return params._agg.scores",
                "reduce_script": "def a,b; for (scores in params._aggs) {a=scores['2017-12-31']; b=scores['2017-09-30']} return a>b"
              }
            }
          }
        }
      }
    }
  }
}

Query result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "student": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "AAA",
          "doc_count": 1,
          "scores": {
            "doc_count": 2,
            "is_score_higher": {
              "value": true
            }
          }
        },
        {
          "key": "BBB",
          "doc_count": 1,
          "scores": {
            "doc_count": 2,
            "is_score_higher": {
              "value": false
            }
          }
        }
      ]
    }
  }
}

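An explanation of what this query does:

The outermost level runs a terms aggregation on the parent document's name.keyword field, so that documents are bucketed by name.
Inside each bucket, the child documents' scores are aggregated with a scripted_metric aggregation. It is computed as follows:
- init_script initializes some global variables: params._agg.scores is set to a hash map that holds the date -> score mapping, and params._agg.df is used later to convert the date field to its string representation.
- map_script runs once for each document. Here it converts the date field to a string, uses that string as the key in the scores map, and stores the corresponding score as the value.
- combine_script produces the aggregation result of each shard. Here it returns the scores map built on that shard.
- reduce_script processes the data returned by all shards. Everything returned by the combine phase is stored in the array variable params._aggs, so the script iterates over that array to obtain the global date -> score mapping. Finally it checks whether the score on one date is greater than the score on the other and returns true or false accordingly.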
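
The four phases above can be mimicked outside Elasticsearch to make the data flow easier to follow. The following is only a rough Python sketch, not the Painless scripts themselves: the function names (init_state, map_doc, combine, reduce_states, run_bucket) are hypothetical, each inner list stands in for the exam documents of one shard, and the reduce step is a variation that merges all shard maps before comparing and returns False when either date is missing, which is an assumption made for this illustration.

```python
from datetime import date


def init_state():
    # Mirrors init_script: one fresh state object per shard.
    return {"scores": {}}


def map_doc(state, doc):
    # Mirrors map_script: key the score by the formatted exam date.
    state["scores"][doc["date"].strftime("%Y-%m-%d")] = doc["score"]


def combine(state):
    # Mirrors combine_script: each shard hands back its scores map.
    return state["scores"]


def reduce_states(shard_results, later="2017-12-31", earlier="2017-09-30"):
    # Merge every shard's map first, then compare the two dates.
    # Returning False for a missing date is a choice made for this sketch.
    merged = {}
    for scores in shard_results:
        merged.update(scores)
    a, b = merged.get(later), merged.get(earlier)
    return a is not None and b is not None and a > b


def run_bucket(shards):
    # Drive the phases for one student bucket; `shards` is a list of
    # per-shard document lists (hypothetical stand-in for real shards).
    results = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc)
        results.append(combine(state))
    return reduce_states(results)


aaa = [[{"date": date(2017, 9, 30), "score": 50},
        {"date": date(2017, 12, 31), "score": 80}], []]
print(run_bucket(aaa))  # prints True
```

With the data from the question, student BBB (90 then 40) yields False under the same sketch.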

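In the final aggregation result, a bucket whose is_score_higher value is true has a key that satisfies the query condition. The application just needs to filter on this value. I originally wanted to use a bucket selector aggregation to filter the aggregation result directly, but an experiment showed that it does not support scripted_metric, so this is as far as I could get.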
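
The application-side filtering mentioned above could look roughly like this. It is a minimal sketch that assumes the search response has already been parsed into a Python dict with the same shape as the query result shown earlier; the helper name students_with_higher_score is hypothetical, and the response below is trimmed to the fields the helper reads.

```python
def students_with_higher_score(response):
    # Keep the keys of the buckets whose scripted metric returned true.
    buckets = response["aggregations"]["student"]["buckets"]
    return [
        bucket["key"]
        for bucket in buckets
        if bucket["scores"]["is_score_higher"]["value"] is True
    ]


# Trimmed copy of the aggregation result from the answer above.
response = {
    "aggregations": {
        "student": {
            "buckets": [
                {"key": "AAA", "doc_count": 1,
                 "scores": {"doc_count": 2, "is_score_higher": {"value": True}}},
                {"key": "BBB", "doc_count": 1,
                 "scores": {"doc_count": 2, "is_score_higher": {"value": False}}},
            ]
        }
    }
}

print(students_with_higher_score(response))  # prints ['AAA']
```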

Also, the reduce script is fairly rough: it does not check whether a key exists before using it. You can refine that yourself.

zx3271234

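Thanks to kennywu76 for the patient answer. I'm not yet familiar with aggregations, though, so I plan to solve this with a query first.
My current approach is:
Convert each exam score into the document's relevance score: positive for 2017-12-31, negative for 2017-09-30, and filter out all other exams. Then sum the child document scores to get the parent document's score. A parent document with a score > 0 satisfies the requirement.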
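
The arithmetic behind this idea can be checked with a small sketch. It only mimics the signed-sum logic in plain Python and does not reproduce how Elasticsearch actually scores documents; the names child_score and parent_score are hypothetical.

```python
LATER, EARLIER = "2017-12-31", "2017-09-30"


def child_score(exam):
    # Positive score for the later exam, negative for the earlier one,
    # None for any other date (those exams are filtered out).
    if exam["date"] == LATER:
        return float(exam["score"])
    if exam["date"] == EARLIER:
        return -float(exam["score"])
    return None


def parent_score(exams):
    # Sum the signed scores of the matching children of one student.
    scores = [child_score(exam) for exam in exams]
    return sum(s for s in scores if s is not None)


aaa = [{"date": "2017-09-30", "score": 50}, {"date": "2017-12-31", "score": 80}]
bbb = [{"date": "2017-09-30", "score": 90}, {"date": "2017-12-31", "score": 40}]
print(parent_score(aaa))  # prints 30.0
print(parent_score(bbb))  # prints -50.0
```

One consequence of the signed sum is that a student who has only the 2017-12-31 exam also ends up with a positive total, so that case may need separate handling.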

One more question: how can a script in a query on the parent document access fields of the child documents?

GET idx/student/_search
{
  "min_score": 0,
  "query": {
    "has_child": {
      "type": "exam",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "terms": {
              "date": [
                "2017-09-30",
                "2017-12-31"
              ]
            }
          },
          "functions": [
            {
              "filter": {
                "term": {
                  "date": "2017-12-31"
                }
              },
              "script_score": {
                "script": "doc['score'].value"
              }
            },
            {
              "filter": {
                "term": {
                  "date": "2017-09-30"
                }
              },
              "script_score": {
                "script": "-doc['score'].value"
              }
            }
          ]
        }
      }
    }
  }
}
