Documents with the same field value land on different shards and get different query scores

Elasticsearch | Author: code4j | Posted 2018-06-12 | Views: 2446

The query is as follows:

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "bool" : {
      "filter" : [ {
        "term" : {
          "positionState" : 2
        }
      }, {
        "term" : {
          "cityId" : 5
        }
      }, {
        "term" : {
          "positionType" : 1
        }
      } ],
      "should" : [ {
        "term" : {
          "positionClassify" : {
            "value" : 0,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 1,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 2,
            "boost" : 100.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 3,
            "boost" : 100.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 4,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 5,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 6,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 7,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 8,
            "boost" : 1.0
          }
        }
      }, {
        "term" : {
          "positionClassify" : {
            "value" : 9,
            "boost" : 1.0
          }
        }
      } ]
    }
  },
  "sort" : [ {
    "_score" : {
      "order" : "desc",
      "missing" : "_last",
      "mode" : "min"
    }
  }, {
    "releaseTime" : {
      "order" : "desc",
      "missing" : "_last",
      "mode" : "min"
    }
  }, {
    "salary" : {
      "order" : "desc",
      "missing" : "_last",
      "mode" : "min"
    }
  }, {
    "_geo_distance" : {
      "workerPlacePoint" : [ {
        "lat" : 31.2974,
        "lon" : 120.585728
      } ],
      "unit" : "km",
      "distance_type" : "plane",
      "mode" : "MIN"
    }
  }, {
    "expectServiceTime" : {
      "order" : "asc",
      "missing" : "_last",
      "mode" : "min"
    }
  } ]
}

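I want documents whose positionClassify is 3 or 4 to rank first. positionClassify is single-valued, and the other conditions are filters, which do not affect the score, so in theory documents with value 3 or 4 should all get the same score. In the results, however, the first 30 documents share one score and the last 20 have a different score, even though positionClassify is 3 for all of them.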

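Using explain and preference, I found where the scores split: the first 30 documents come from shard 0 and the last 20 from shard 1. Each shard computes the score from its own local statistics, and the shards hold different numbers of documents, so the TF/IDF scores differ. But we expected documents with the same field value to score the same, since this is the only field that affects scoring.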
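To illustrate why the boundary falls along shard lines, here is a minimal sketch of the inverse document frequency computed from shard-local statistics. It uses the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies are invented for illustration, and the similarity actually configured on the index (for example BM25) may use a different formula, although it also depends on per-shard statistics.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene TF/IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for the term positionClassify = 3:
# (documents in the shard, documents in the shard containing the term).
shard_stats = {
    "shard0": (1000, 30),
    "shard1": (400, 20),
}

# Each shard scores with its own local numbers, so the same term
# gets a different idf depending on where the document lives.
local_idf = {
    name: classic_idf(num_docs, doc_freq)
    for name, (num_docs, doc_freq) in shard_stats.items()
}

for name, idf in local_idf.items():
    print(f"{name}: idf = {idf:.4f}")
```

With these made-up numbers the two shards produce different idf values for the identical term, which matches the pattern of one score for the hits from shard 0 and another for the hits from shard 1.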

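One workaround so far is to use a single primary shard and prefer querying the primary. Are there other approaches? Any help is appreciated.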
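The single-shard workaround could look roughly like the sketch below, which only builds the request path and body and sends nothing. The index name `positions` and the helper names are hypothetical. `number_of_shards` is a standard index setting; the `_primary` preference value existed in Elasticsearch releases current at the time of this post but was removed in 7.0, so treat that part as version-dependent.

```python
import json
from urllib.parse import urlencode

INDEX = "positions"  # hypothetical index name


def create_index_body(replicas: int = 1) -> str:
    """Index settings with one primary shard, so all documents share statistics."""
    settings = {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(settings)


def search_path(index: str) -> str:
    """Search path that asks Elasticsearch to run the query on the primary shard."""
    return f"/{index}/_search?" + urlencode({"preference": "_primary"})


print("PUT /" + INDEX, create_index_body())
print("GET " + search_path(INDEX))
```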

1 reply

code4j - coder github: https://github.com/rpgmakervx

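Answering my own question. I came across an official article that a friend had posted earlier (it was so far down the list that I had not noticed it):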
https://www.elastic.co/guide/e ... .html

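The article proposes two solutions. The first is what I am doing now: keep a small dataset in a single shard. The other is to set the search type to dfs_query_then_fetch. This type first performs an initial scatter phase: before the real query runs, it collects the term frequencies and document frequencies from every shard, and then each shard searches and ranks using the global term and document frequencies.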
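The effect of the initial scatter phase can be sketched as follows: the per-shard statistics are summed first, and every shard then scores with the same global numbers. The shard statistics, the index name `positions`, and the helper names are assumptions made for illustration, and the idf formula is the classic Lucene one rather than whatever similarity the index actually uses. The `search_type=dfs_query_then_fetch` query-string parameter is the real switch on the search endpoint. No request is sent here.

```python
import json
import math
from urllib.parse import urlencode


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene TF/IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics: (documents in shard, documents with the term).
shard_stats = {"shard0": (1000, 30), "shard1": (400, 20)}

# Pre-query phase: gather statistics from all shards and add them up.
total_docs = sum(num_docs for num_docs, _ in shard_stats.values())
total_doc_freq = sum(doc_freq for _, doc_freq in shard_stats.values())

# Query phase: every shard uses the same global statistics.
global_idf = {name: classic_idf(total_docs, total_doc_freq) for name in shard_stats}


def build_search_request(index: str, body: dict) -> tuple:
    """Return (path, JSON body) for a search that uses global term statistics."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    return f"/{index}/_search?{params}", json.dumps(body)


# Reduced version of the query from the question: one boosted should clause.
query = {
    "query": {
        "bool": {
            "filter": [{"term": {"cityId": 5}}],
            "should": [{"term": {"positionClassify": {"value": 3, "boost": 100.0}}}],
        }
    }
}

path, payload = build_search_request("positions", query)
print("GET " + path)
print(global_idf)
```

The extra round trip adds some latency to each search, which is the trade-off against keeping everything in one shard.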
