If a search term should match on pinyin, synonyms, and tokenized terms, does the index have to be built for all three cases at index time?

Elasticsearch | Author: wengxuejie | Posted: October 16, 2018 | Views: 3618

Our product needs a search term to find matching product information through synonyms, pinyin, and tokenized terms.

rochy - rochy_he

Upvoted by: wengxuejie zz_hello

"name": {
  "type": "text",
  "analyzer": "ik_smart",
  "fields": {
    "py": {
      "type": "text",
      "analyzer": "pinyin"
    },
    "synonym": {
      "type": "text",
      "analyzer": "by_smart"
    }
  }
}

wengxuejie

Upvoted by: zz_hello

In the end I defined it like this:
"name": {
  "type": "text",
  "analyzer": "ik_smart",
  "search_analyzer": "ik_smart",
  "fields": {
    "py": {
      "type": "text",
      "analyzer": "pinyin",
      "search_analyzer": "ik_smart"
    },
    "synonym": {
      "type": "text",
      "analyzer": "by_smart",
      "search_analyzer": "by_smart"
    }
  }
},

rochy - rochy_he

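Yes.
I recommend creating additional sub-fields. For example, if the field is title,
you can set the analyzer of title to a regular tokenizing analyzer (for example ik),
set the analyzer of title.pinyin to a pinyin analyzer,
and handle synonyms by configuring a synonym token filter.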
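
As a rough illustration of the advice above, the Python sketch below builds an index-creation body and prints it as JSON. The `title` field and its `pinyin` sub-field come from the answer, and the `synonym` sub-field mirrors the mapping posted earlier in the thread. The analyzer name `ik_syn`, the filter name `brand_synonyms`, and the sample synonym rule are hypothetical. The sketch assumes the IK and pinyin analysis plugins are installed and uses the typeless mapping layout of Elasticsearch 7 and later; 6.x would need a mapping type level under `mappings`.

```python
import json

# Hypothetical synonym rule; a real deployment would more likely load rules
# from a file through the filter's "synonyms_path" option.
SYNONYM_RULES = ["酷奇, 古奇, gucci"]


def build_index_body(field="title"):
    """Return settings and mappings for one text field with pinyin and synonym sub-fields."""
    settings = {
        "analysis": {
            "filter": {
                # Built-in synonym token filter; it runs after the tokenizer.
                "brand_synonyms": {"type": "synonym", "synonyms": SYNONYM_RULES}
            },
            "analyzer": {
                # Hypothetical custom analyzer: IK tokenizer, then lowercase, then synonyms.
                "ik_syn": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["lowercase", "brand_synonyms"],
                }
            },
        }
    }
    mappings = {
        "properties": {
            field: {
                "type": "text",
                "analyzer": "ik_smart",
                "fields": {
                    "pinyin": {"type": "text", "analyzer": "pinyin"},
                    "synonym": {"type": "text", "analyzer": "ik_syn"},
                },
            }
        }
    }
    return {"settings": settings, "mappings": mappings}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), ensure_ascii=False, indent=2))
```

The printed JSON could be sent as the body of a `PUT <index>` request. Keeping each analysis style in its own sub-field lets a query choose which ones to search and how to weight them.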

wengxuejie


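My current approach is to add sub-fields like this:
"fields": {
  "py": {
    "type": "text",
    "analyzer": "pinyin"
  },
  "synonym": {
    "type": "text",
    "analyzer": "by_smart"
  }
},
That gives me one multi-field. But there is a problem at the moment: after tokenization, the synonyms can no longer be found. Do I also have to maintain a stop-word dictionary so that terms which should not be split are excluded?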
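
One possible reading of this problem is that a synonym filter only sees the tokens the tokenizer emits, so a term that gets split can no longer match a whole-word rule. The toy Python model below illustrates that idea only. It is not how IK or Elasticsearch is implemented, it assumes rules are matched against whole tokens, and the dictionary contents, the rule, and the function names are made up for the illustration.

```python
def tokenize(text, dictionary):
    """Greedy longest-match segmentation; unknown characters become one-character tokens."""
    tokens, i = [], 0
    max_len = max((len(word) for word in dictionary), default=1)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def expand_synonyms(tokens, rules):
    """Replace each token that has a rule with its synonym list; other tokens pass through."""
    expanded = []
    for token in tokens:
        expanded.extend(rules.get(token, [token]))
    return expanded


# Hypothetical whole-word synonym rule.
RULES = {"酷奇": ["酷奇", "古奇", "gucci"]}

# The tokenizer dictionary does not know the brand name: it is split, so no rule fires.
without_entry = expand_synonyms(tokenize("酷奇手袋", {"手袋"}), RULES)
# The brand name is in the dictionary: it survives as one token and is expanded.
with_entry = expand_synonyms(tokenize("酷奇手袋", {"手袋", "酷奇"}), RULES)

print(without_entry)  # ['酷', '奇', '手袋']
print(with_entry)     # ['酷奇', '古奇', 'gucci', '手袋']
```

If this is what is happening, the usual remedy would be to add the term to the tokenizer's custom word dictionary so that it stays in one piece, rather than to a stop-word list, which removes tokens instead of protecting them.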

wengxuejie


I have a question. When I run

GET aplum_2/_analyze
{
  "analyzer": "by_smart",
  "text": "酷奇"
}

I get this result:

{
  "tokens": [
    {
      "token": "GUCCI",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "古奇",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "cucci",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "guicc",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "酷奇",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "哭泣",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "古琦",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

But when I search with

"query": {
  "bool": {
    "must": [
      {
        "multi_match": {
          "query": "酷奇",
          "fields": [
            "name",
            "name.synonym",
            "name.py",
            "brand_name",
            "brand_name.synonym",
            "brand_name_cn"
          ]
        }
      }
    ]
  }
},

nothing is returned, whereas searching for GUCCI does return results. This has puzzled me for a long time.
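
One way to narrow this down is to test each field on its own and to check how the query text is analyzed for that field. The Python sketch below only builds the request bodies and prints them as Console-style commands; it does not contact a cluster. The index name `aplum_2` and the field names come from the post, while the helper names are hypothetical. The `field` parameter of `_analyze` (which applies the analyzer mapped to that field) and the `explain` flag of `_validate/query` (which shows the rewritten query) are standard Elasticsearch features.

```python
import json

INDEX = "aplum_2"
FIELDS = [
    "name", "name.synonym", "name.py",
    "brand_name", "brand_name.synonym", "brand_name_cn",
]


def analyze_request(field, text):
    """Body for GET <index>/_analyze using the analyzer mapped to the given field."""
    return {"field": field, "text": text}


def single_field_query(field, text):
    """Body for a match query on one field, so each field is tested in isolation."""
    return {"query": {"match": {field: text}}}


def console_lines(text):
    """Yield one _analyze command and one _validate/query command per field."""
    for field in FIELDS:
        analyze_body = json.dumps(analyze_request(field, text), ensure_ascii=False)
        query_body = json.dumps(single_field_query(field, text), ensure_ascii=False)
        yield f"GET {INDEX}/_analyze\n{analyze_body}"
        yield f"GET {INDEX}/_validate/query?explain=true\n{query_body}"


if __name__ == "__main__":
    for command in console_lines("酷奇"):
        print(command, end="\n\n")
```

Comparing the terms in the explained query with the terms `_analyze` reports for the same field should show whether the query-time terms and the indexed terms actually line up, for example in letter case.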
