How can the pinyin tokenizer split ambiguous pinyin strings such as xian and changan?

Elasticsearch | Author: caojunjie | Posted on October 11, 2019 | Views: 1527

When I run a pinyin search for xian, how can I get results for both xian and xi+an at the same time? Note that I am using match_phrase for the matching.

trycatchfinal

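Are you using the pinyin tokenizer?
You can set the tokenizer parameter keep_joined_full_pinyin to true. Then, when the data is indexed, the tokens produced for "西安" (Xi'an) include the joined pinyin "xian" in addition to the separate syllables "xi" and "an":
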
POST _analyze
{
  "tokenizer": {
    "type": "pinyin",
    "keep_separate_first_letter": false,
    "keep_full_pinyin": true,
    "keep_original": true,
    "limit_first_letter_length": 16,
    "lowercase": true,
    "keep_joined_full_pinyin": true,
    "remove_duplicated_term": true
  },
  "text": "西安"
}

Response:

{
  "tokens": [
    { "token": "xi",   "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "西安", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "xian", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "xa",   "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "an",   "start_offset": 0, "end_offset": 0, "type": "word", "position": 1 }
  ]
}
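
To see why match_phrase can then hit both spellings, here is a minimal Python sketch. It is a simplified model of a slop-0 phrase match over the (term, position) pairs from the _analyze output above, not Elasticsearch's actual implementation. It assumes the query has already been analyzed into a list of terms; how the pinyin tokenizer really splits a query string such as "xian" depends on its settings. The names build_positions and phrase_matches are hypothetical.

```python
from collections import defaultdict

# (term, position) pairs copied from the _analyze response for "西安" above.
XIAN_TOKENS = [
    ("xi", 0),
    ("西安", 0),
    ("xian", 0),
    ("xa", 0),
    ("an", 1),
]


def build_positions(tokens):
    """Map each indexed term to the set of positions where it occurs."""
    positions = defaultdict(set)
    for term, pos in tokens:
        positions[term].add(pos)
    return positions


def phrase_matches(tokens, query_terms):
    """Simplified slop-0 phrase match (hypothetical helper, not the ES code path).

    The phrase matches if there is a start position p such that the i-th query
    term is indexed at position p + i for every i.
    """
    if not query_terms:
        return False
    positions = build_positions(tokens)
    # Try every position where the first query term occurs as a candidate start.
    for start in positions.get(query_terms[0], set()):
        if all(start + i in positions.get(term, set())
               for i, term in enumerate(query_terms)):
            return True
    return False


if __name__ == "__main__":
    for query in (["xian"], ["xi", "an"], ["an", "xi"]):
        print(query, phrase_matches(XIAN_TOKENS, query))
```

Because "xian" is indexed at position 0 while "xi" and "an" occupy positions 0 and 1, both the single-term query ["xian"] and the two-term query ["xi", "an"] line up with the document, whereas the reversed order ["an", "xi"] does not.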
