A question about ES scoring

Author: zev | Posted: 2018-08-23 | Views: 424

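Search keyword: 关于 ("regarding")

Both the first and the fourth result match on the 单位 field, but the fourth result also matches on the title (文件标题), so I would expect it to rank higher.
What is lowering its score?
Is it a problem with the query, or a setting I have not configured correctly?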

[Screenshot of the search results: 2.png]

GET document/_search?explain=true
{
"query": {
"dis_max" : {
"tie_breaker" : 0.4,
"queries" : [
{
"match" : {
"文件标题" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 5.0
}
}
},
{
"match" : {
"单位" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"类型" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"意见" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"来文号" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
{
"match" : {
"来文编号" : {
"query" : "关于",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"boost" : 1.0
}
}
}
 
"mappings": {

"doc": {
"properties": {
"单位": {
"analyzer": "ik_smart",
"type": "text"
},
"公文类型": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"来文号": {
"analyzer": "ik_smart",
"type": "text"
},
"办结日期": {
"type": "long"
},
"登记日期": {
"type": "long"
},
"文件标题": {
"analyzer": "ik_smart",
"type": "text"
},
"意见": {
"analyzer": "ik_smart",
"type": "text"
},
"来文编号": {
"analyzer": "ik_smart",
"type": "text"
},
"类型": {
"analyzer": "ik_smart",
"type": "text"
}
}
}

},

 
medcl - Elastic 🇨🇳 !

Upvoted by: derobukal

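The explain output already tells you the answer: the problem is not the query clauses but the keyword itself. In the title field, 关于 is a high-frequency term. Roughly half of all titles contain it, so even when it matches it contributes very little to the score, because the term is so common. In the 单位 field it is rare, so a match there contributes a large score.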
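
To make the frequency argument concrete, here is a minimal Python sketch that recomputes the idf factor from the docFreq and docCount values in the explain output posted in this thread. The formula is the one printed in that output; the function and variable names are only illustrative. By those numbers 关于 occurs in 87490 of 106492 titles (about 82%), but in only 3 of 65846 单位 values.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """idf as printed in the explain output:
    log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# docFreq / docCount copied from the explain output in this thread.
idf_title = bm25_idf(doc_freq=87490, doc_count=106492)  # field 文件标题
idf_unit = bm25_idf(doc_freq=3, doc_count=65846)        # field 单位

print(f"文件标题 idf: {idf_title:.4f}")
print(f"单位 idf: {idf_unit:.4f}")
# Even after the boost of 5, the title idf stays far below the 单位 idf.
print(f"文件标题 idf x boost 5: {5.0 * idf_title:.4f}")
```

The results should agree, up to float rounding, with the idf values 0.19654904 and 9.842326 in the explain output: the boost of 5 on 文件标题 lifts the title factor to just under 1, still roughly a tenth of what a 单位 match contributes.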

zev

Here are the scores for the two results:
      {
"_shard": "[document_v2][0]",
"_node": "DmPg1E3mR3KTfzUpqVzPDQ",
"_index": "document_v2",
"_type": "doc",
"_id": "g0102_bc491024-fb6b-469d-bb3c-68b144fc988f",
"_score": 6.2364345,
"_source": {
"文件标题": "区党委",
"单位": "自治区党委转发中央组织部关于姜兴和同志退休的通知",
},
"_explanation": {
"value": 6.2364345,
"description": "max plus 0.4 times others of:",
"details": [
{
"value": 6.2364345,
"description": "weight(单位:关于 in 12443) [PerFieldSimilarity], result of:",
"details": [
{
"value": 6.2364345,
"description": "score(doc=12443,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 9.842326,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 3,
"description": "docFreq",
"details": []
},
{
"value": 65846,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.6336342,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 4.557938,
"description": "avgFieldLength",
"details": []
},
{
"value": 11,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
}
},
{
"_shard": "[document_v2][0]",
"_node": "DmPg1E3mR3KTfzUpqVzPDQ",
"_index": "document_v2",
"_type": "doc",
"_id": "g0102_038731a7-1ce2-430c-a522-4baba857ac2c",
"_score": 5.0754642,
"_source": {
"文件标题": "关于在南宁建设周氏兄弟国际艺术谷的设想",
"单位": "周红波市长在《关于南宁建设周氏兄弟国际艺术谷的设想》的批示件",
"类型": "收文",
},
"_explanation": {
"value": 5.0754642,
"description": "max plus 0.4 times others of:",
"details": [
{
"value": 1.0641516,
"description": "weight(文件标题:关于 in 16401) [PerFieldSimilarity], result of:",
"details": [
{
"value": 1.0641516,
"description": "score(doc=16401,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 5,
"description": "boost",
"details": []
},
{
"value": 0.19654904,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 87490,
"description": "docFreq",
"details": []
},
{
"value": 106492,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.0828357,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 14.760095,
"description": "avgFieldLength",
"details": []
},
{
"value": 12,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 4.6498036,
"description": "weight(单位:关于 in 16401) [PerFieldSimilarity], result of:",
"details": [
{
"value": 4.6498036,
"description": "score(doc=16401,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 9.842326,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 3,
"description": "docFreq",
"details": []
},
{
"value": 65846,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.47242934,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 4.557938,
"description": "avgFieldLength",
"details": []
},
{
"value": 17,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
}
}
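
As a rough check of how the two totals come about, the sketch below applies the combination rule stated in the explain description ("max plus 0.4 times others of") to the per-clause scores copied from the output above. The helper name `dis_max_score` is hypothetical, not an Elasticsearch API.

```python
def dis_max_score(clause_scores, tie_breaker):
    """Best clause score plus tie_breaker times the sum of the other clauses."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


TIE_BREAKER = 0.4  # from the dis_max query

# Per-clause scores copied from the two _explanation blocks.
first_doc = [6.2364345]              # 单位 only
second_doc = [1.0641516, 4.6498036]  # 文件标题 (boost 5 included), 单位

score_first = dis_max_score(first_doc, TIE_BREAKER)
score_second = dis_max_score(second_doc, TIE_BREAKER)

print(f"first document:  {score_first:.7f}")
print(f"second document: {score_second:.7f}")
```

The title match adds only 0.4 x 1.0641516 to the second document. Its 单位 clause is also smaller than the first document's (4.6498036 versus 6.2364345) because its 单位 value is longer (fieldLength 17 versus 11), which lowers tfNorm. The extra title match does not make up that difference.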

yayg2008

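Work through the explain output carefully and you can see which scoring component carries the most weight; then adjust the relevant BM25 parameters and verify the effect.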
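
One way to try this kind of adjustment on paper before touching the index is sketched below. It is only a model of the two documents from the explain output in this thread: it varies the BM25 length-normalization parameter b for the 单位 field and keeps every other number fixed, which assumes the idf, average field length and title clause stay unchanged. In Elasticsearch such a change would normally be made through a custom BM25 similarity on the field; the constants and function names here are illustrative.

```python
K1 = 1.2                  # parameter k1 from the explain output
IDF_UNIT = 9.842326       # idf of 关于 in 单位
AVG_LEN_UNIT = 4.557938   # avgFieldLength of 单位
TITLE_CLAUSE = 1.0641516  # 文件标题 clause of the second document, boost included
TIE_BREAKER = 0.4         # from the dis_max query


def tf_norm(freq, field_len, avg_len, k1, b):
    """tfNorm as written in the explain output."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_len / avg_len))


def scores_for(b):
    """Return (first_doc_score, second_doc_score) for a given b on 单位."""
    # fieldLength of 单位 is 11 for the first document and 17 for the second.
    unit_first = IDF_UNIT * tf_norm(1.0, 11, AVG_LEN_UNIT, K1, b)
    unit_second = IDF_UNIT * tf_norm(1.0, 17, AVG_LEN_UNIT, K1, b)
    clauses = [unit_second, TITLE_CLAUSE]
    best = max(clauses)
    second = best + TIE_BREAKER * (sum(clauses) - best)
    return unit_first, second


for b in (0.75, 0.5, 0.25, 0.0):
    first, second = scores_for(b)
    higher = "first" if first > second else "second"
    print(f"b={b:<4} first={first:.4f} second={second:.4f} higher: {higher}")
```

With the default b = 0.75 the sketch reproduces the two scores from the explain output. With length normalization switched off (b = 0) both 单位 clauses become equal, so the title match decides the order and the second document comes out ahead. Whether that trade-off is acceptable for the rest of the index still has to be verified against real queries.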