When running a query_string query against an array field in ES, how can highlight return the best-matching value?

Elasticsearch | Author: osmondy | Posted: 2024-01-15 | Views: 3111

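ES version: 7.6.
The index holds about 300 million documents overall, and the detail records may be even more numerous, so I store the trademark details, trademarkName, in an array; I was worried that using nested would hurt query performance. The analyzer is ik_smart.

My DSL query is as follows: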
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "query_string": {
                "query": "(十三易)",
                "fields": [
                  "trademarkName^8.5"
                ],
                "type": "phrase",
                "default_operator": "and",
                "max_determinized_states": 10000,
                "enable_position_increments": true,
                "fuzziness": "AUTO",
                "fuzzy_prefix_length": 0,
                "fuzzy_max_expansions": 50,
                "phrase_slop": 10,
                "escape": false,
                "auto_generate_synonyms_phrase_query": true,
                "fuzzy_transpositions": true,
                "boost": 1
              }
            },
            "functions": [
              {
                "filter": {
                  "query_string": {
                    "query": "(十三易)",
                    "fields": [
                      "trademarkName^8.5"
                    ],
                    "type": "phrase",
                    "default_operator": "and",
                    "max_determinized_states": 10000,
                    "enable_position_increments": true,
                    "fuzziness": "AUTO",
                    "fuzzy_prefix_length": 0,
                    "fuzzy_max_expansions": 50,
                    "phrase_slop": 0,
                    "escape": false,
                    "auto_generate_synonyms_phrase_query": true,
                    "fuzzy_transpositions": true,
                    "boost": 1
                  }
                },
                "weight": 280
              },
              {
                "filter": {
                  "match_all": {
                    "boost": 1
                  }
                },
                "weight": 0.01,
                "field_value_factor": {
                  "field": "entTypeScore",
                  "factor": 1,
                  "missing": 1,
                  "modifier": "square"
                }
              },
              {
                "filter": {
                  "match_all": {
                    "boost": 1
                  }
                },
                "weight": 15,
                "field_value_factor": {
                  "field": "regCapUnify",
                  "factor": 1,
                  "missing": 0,
                  "modifier": "ln2p"
                }
              }
            ],
            "score_mode": "sum",
            "boost_mode": "multiply",
            "max_boost": 100000,
            "boost": 1
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": ["trademarkName"],
  "highlight": {
    "order": "score",
    "fields": {
      "trademarkName": {
        "fragment_size": 80,
        "number_of_fragments": 3
      }
    }
  }
}
The matching result is as follows:
{
  "max_score" : 79977.836,
  "hits" : [
    {
      "_index" : "company_v6",
      "_type" : "_doc",
      "_id" : "ab2d254d9ca99ab4a13170d7016d7a85",
      "_score" : 79977.836,
      "_source" : {
        "trademarkName" : [
          "易十三",
          "十三属相",
          "图形",
          "名洋数字",
          "智会智展",
          "十三易"
        ]
      },
      "highlight" : {
        "trademarkName" : [
          "<em>易</em><em>十三</em>",
          "<em>十三</em><em>易</em>",
          "<em>十三</em>属相"
        ]
      }
    }
  ]
}
With an array field, how can I get the best-matching value in trademarkName, 十三易, returned first in the highlight? Do I need a custom analyzer?
osmondy - just tinkering

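After I removed function_score, the highlight actually returned the expected match. I am not sure why that is; I hope someone who knows can explain. Many thanks 🌹
[Attached screenshot: 微信图片_20240116171828.png]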

 

Ombres

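Both query_string clauses are phrase searches, but they differ slightly; take a look at the "phrase_slop": 10 parameter. In the first one, slop=10 is presumably there to widen the set of hits, which is why that query was given the lower weight. Highlighting, however, has no notion of boosting: fragments are marked according to the original match only.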
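One way to experiment with this is to give the highlighter its own, stricter query through the `highlight_query` option of the highlight section, so that highlighting is driven by a phrase with slop 0 rather than by the slop-10 search query. The sketch below only builds the request fragment as a Python dict; the helper name `build_highlight` is hypothetical, and whether this actually narrows the fragments to the exact value depends on the highlighter type and the analyzer, so it should be verified against the real index.

```python
import json


def build_highlight(field: str, text: str) -> dict:
    """Build a highlight section that highlights with a strict phrase query.

    Hypothetical helper for illustration. `text` is placed into a
    query_string query as-is, so reserved characters are NOT escaped here.
    """
    return {
        "order": "score",
        "fields": {
            field: {
                "fragment_size": 80,
                "number_of_fragments": 3,
                # highlight_query replaces the search query for highlighting only
                "highlight_query": {
                    "query_string": {
                        "query": f"({text})",
                        "fields": [field],
                        "type": "phrase",
                        "phrase_slop": 0,
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    section = build_highlight("trademarkName", "十三易")
    print(json.dumps(section, ensure_ascii=False, indent=2))
```

The returned dict would take the place of the "highlight" object in the request body; the scoring part of the request stays unchanged.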
 
You could apply the weighting afterwards in your application code, or implement your own highlighting logic.
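A minimal sketch of the application-side approach, assuming the default `<em>` highlight tags and assuming each fragment corresponds to one array value (which holds for the short values in the response above, but is not guaranteed for long ones). The function names `strip_tags` and `rank_fragments` are hypothetical. Fragments whose plain text equals the search text are moved to the front, fragments that merely contain it come next, and everything else keeps the order Elasticsearch returned.

```python
import re

# Matches the default highlight tags; adjust if pre_tags/post_tags are customised.
TAG_RE = re.compile(r"</?em>")


def strip_tags(fragment: str) -> str:
    """Remove <em> and </em> from a highlight fragment."""
    return TAG_RE.sub("", fragment)


def rank_fragments(fragments: list[str], query_text: str) -> list[str]:
    """Reorder fragments: exact match, then substring match, then the rest."""

    def sort_key(item: tuple[int, str]) -> tuple[int, int]:
        index, fragment = item
        plain = strip_tags(fragment)
        if plain == query_text:
            tier = 0  # the whole value is exactly the search text
        elif query_text in plain:
            tier = 1  # the value contains the search text in order
        else:
            tier = 2  # any other hit, e.g. reordered terms
        # The original index keeps Elasticsearch's order within a tier.
        return (tier, index)

    return [fragment for _, fragment in sorted(enumerate(fragments), key=sort_key)]


if __name__ == "__main__":
    fragments = [
        "<em>易</em><em>十三</em>",
        "<em>十三</em><em>易</em>",
        "<em>十三</em>属相",
    ]
    for fragment in rank_fragments(fragments, "十三易"):
        print(fragment)
```

With the fragments from the response above, the value 十三易 is listed first, followed by 易十三 and 十三属相 in their original order.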
 
 
