How are document length and average document length computed in the similarity algorithm?

Elasticsearch | Author: stephen_qu | Posted January 7, 2020 | Views: 2056

(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
fieldLength: document length
avgFieldLength: average document length
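
To make the formula concrete, here is a minimal Python sketch that plugs the numbers from the explain output in this post into the two expressions it prints (idf and tfNorm) and multiplies them. The function names `idf`, `tf_norm`, and `bm25_term_score` are my own and are not part of any Elasticsearch API; the defaults k1 = 1.2 and b = 0.75 are taken from the explain output.

```python
import math


def idf(doc_freq: float, doc_count: float) -> float:
    """idf as printed by explain: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq: float, field_length: float, avg_field_length: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """tfNorm as printed by explain; the length ratio scales the denominator."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


def bm25_term_score(freq: float, doc_freq: float, doc_count: float,
                    field_length: float, avg_field_length: float) -> float:
    """Score of one term in one document: the product of idf and tfNorm."""
    return idf(doc_freq, doc_count) * tf_norm(freq, field_length, avg_field_length)


if __name__ == "__main__":
    # Values copied from the two hits in the explain output.
    print(bm25_term_score(1.0, 1.0, 3.0, 40.96, 19.666666))   # hit with _id 2
    print(bm25_term_score(1.0, 1.0, 2.0, 28.444445, 25.5))    # hit with _id 1
```

Both results agree with the reported `_score` values to about six decimal places, so the open question is only where fieldLength and avgFieldLength themselves come from.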
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.6797494,
"hits": [
{
"_shard": "[books][0]",
"_node": "NzAqveA6T_2lk23cHnAgdA",
"_index": "books",
"_type": "IT",
"_id": "2",
"_score": 0.6797494,
"_source": {
"id": "2",
"description": "让你的Java程序更快、更稳定。深入剖析软件设计层面、代码层面、JVM虚拟机层面的优化方法"
},
"_explanation": {
"value": 0.6797494,
"description": "weight(description:java in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.6797494,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.98082924,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1.0,
"description": "docFreq", // number of docs in this hit's shard that match the search term
"details": []
},
{
"value": 3.0,
"description": "docCount", // total number of docs in this hit's shard
"details": []
}
]
},
{
"value": 0.6930354,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1.0,
"description": "termFreq=1.0", // number of times the search term occurs in this field; for a non-analyzed field, only an exact match counts as one occurrence
"details": []
},
{
"value": 1.2,
"description": "parameter k1", // tuning parameter
"details": []
},
{
"value": 0.75,
"description": "parameter b", // tuning parameter
"details": []
},
{
"value": 19.666666,
"description": "avgFieldLength",
"details": []
},
{
"value": 40.96,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
},
{
"_shard": "[books][2]",
"_node": "NzAqveA6T_2lk23cHnAgdA",
"_index": "books",
"_type": "IT",
"_id": "1",
"_score": 0.6618818,
"_source": {
"id": "1",
"description": "Java学习必读经典,殿堂级著作!赢得了全球程序员的广泛赞誉。"
},
"_explanation": {
"value": 0.6618818,
"description": "weight(description:java in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.6618818,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.6931472,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1.0,
"description": "docFreq",
"details": []
},
{
"value": 2.0,
"description": "docCount",
"details": []
}
]
},
{
"value": 0.9548936,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1.0,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 25.5,
"description": "avgFieldLength",
"details": []
},
{
"value": 28.444445,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
}
]
}
}
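
The fractional fieldLength values above (40.96 and 28.444445) suggest the length is not stored exactly. The sketch below is one possible explanation, under assumptions this post does not confirm: that the index runs on a Lucene 6.x-era BM25Similarity which, as far as I recall, stores 1/sqrt(token count) in a single byte (3 mantissa bits, 5 exponent bits) and decodes it back as 1/f². The Python functions are my own re-implementation modeled on that scheme, not a real Lucene or Elasticsearch API, and the token counts 36 and 25 are hypothetical inputs chosen for illustration.

```python
import math
import struct

ZERO_EXP_OFFSET = (63 - 15) << 3  # bias of the assumed 8-bit float format


def float_to_byte315(f: float) -> int:
    """Truncate a float32 to an 8-bit float (assumed layout: 3 mantissa bits)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21  # keep sign, exponent, and the top 3 mantissa bits
    if small <= ZERO_EXP_OFFSET:
        return 0 if bits <= 0 else 1      # underflow
    if small >= ZERO_EXP_OFFSET + 0x100:
        return 255                        # overflow
    return small - ZERO_EXP_OFFSET


def byte315_to_float(b: int) -> float:
    """Expand the 8-bit float back to a float32 value."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def decoded_field_length(token_count: int) -> float:
    """Length seen at scoring time after the lossy encode/decode round trip."""
    norm = byte315_to_float(float_to_byte315(1.0 / math.sqrt(token_count)))
    return 1.0 / (norm * norm)


if __name__ == "__main__":
    for tokens in (25, 36):  # hypothetical token counts
        print(tokens, decoded_field_length(tokens))
```

Under these assumptions, several nearby token counts collapse onto the same decoded length, which would explain why fieldLength is not an integer. avgFieldLength, by contrast, is (again as far as I recall for that Lucene version) the shard-level sum of term frequencies for the field divided by the shard's document count, so it is computed from exact counts and can differ from shard to shard, as it does in the two hits above.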