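For example:
Searching "fuqiang" returns "付强", "腹腔", and "富强". This is acceptable.
Searching "付强" also returns "腹腔" and "富强". This is not acceptable.
How should the settings and mapping be written so that searching "付强" excludes "腹腔" and "富强"?
Note: I am using ES 7.3.2, with the ik and pinyin plugins installed.
The settings and mapping definitions are included below.
The data volume is fairly large, and only full pinyin is used.
My current understanding: when the index is built, the tt_ik analyzer creates a pinyin inverted index for the pname field. At search time, the query "付强" goes through the search analyzer tt_ik_search, which also includes the pinyin filter, so "付强" is converted to "fuqiang". The index holds about 1 million documents.
Any guidance from the experts here would be much appreciated.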
Partial mapping:
"pname" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword"
    }
  },
  "analyzer" : "tt_ik",
  "search_analyzer" : "tt_ik_search"
},
"productLayout" : {
  "type" : "keyword"
}
Partial settings:
"settings" : {
  "index" : {
    "number_of_shards" : "1",
    "blocks" : {
      "read_only_allow_delete" : "false"
    },
    "max_result_window" : "1000000",
    "analysis" : {
      "analyzer" : {
        "tt_ik" : {
          "filter" : [
            "lowercase", "full_pinyin"
          ],
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "ik_max_word"
        },
        "tt_ik_search" : {
          "filter" : [
            "lowercase", "full_pinyin"
          ],
          "char_filter" : [
            "html_strip"
          ],
          "tokenizer" : "ik_smart"
        }
      },
      "filter" : {
        "full_pinyin" : {
          "keep_joined_full_pinyin" : "true",
          "keep_none_chinese_in_first_letter" : "false",
          "lowercase" : "true",
          "none_chinese_pinyin_tokenize" : "false",
          "keep_none_chinese_in_joined_full_pinyin" : "true",
          "keep_original" : "true",
          "keep_first_letter" : "false",
          "keep_separate_first_letter" : "false",
          "type" : "pinyin",
          "keep_none_chinese" : "true",
          "limit_first_letter_length" : "16",
          "keep_full_pinyin" : "false"
        }
      }
    }
  }
}
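The behaviour described in the question can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the plugin's real logic: the `PINYIN` table is a hand-written stand-in for the pinyin filter, and each name is assumed to come out of the tokenizer as a single token (in practice ik may split "付强" into single characters, which changes what the joined pinyin looks like; the `_analyze` API can be used to check the real tokens). The `pinyin_at_search` flag is a hypothetical switch that contrasts a search analyzer with and without the pinyin filter.

```python
# Hand-written stand-in for the pinyin filter (assumption, for illustration only).
PINYIN = {"付强": "fuqiang", "腹腔": "fuqiang", "富强": "fuqiang"}


def analyze(text, with_pinyin):
    """Return the set of terms produced for a single-token text.

    Mimics keep_original=true plus keep_joined_full_pinyin=true: the original
    token is always kept, and the joined full pinyin is added when enabled.
    """
    terms = {text.lower()}
    if with_pinyin and text in PINYIN:
        terms.add(PINYIN[text])
    return terms


def search(query, docs, pinyin_at_search):
    """Return the docs that share at least one term with the analyzed query."""
    # Index time: pinyin is always applied, as in the tt_ik analyzer.
    index = {doc: analyze(doc, with_pinyin=True) for doc in docs}
    query_terms = analyze(query, with_pinyin=pinyin_at_search)
    return [doc for doc in docs if index[doc] & query_terms]


docs = ["付强", "腹腔", "富强"]

# Pinyin on both sides: the query also becomes "fuqiang", so all homophones match.
print(search("付强", docs, pinyin_at_search=True))

# Pinyin at index time only: the Chinese query matches only its original token,
# while a typed "fuqiang" still matches every homophone through the index terms.
print(search("付强", docs, pinyin_at_search=False))
print(search("fuqiang", docs, pinyin_at_search=False))
```

In this toy model, the homophones collide only when the query itself is converted to pinyin. Whether dropping the pinyin filter from the search analyzer gives the same result on a real index depends on how ik tokenizes the names, so it should be verified before relying on it.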
2 replies
tongchuan1992 - Never stop learning; apply what you learn
yuechen323 - 晨儿哥
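For homophone searches like this, eliminating the ambiguity requires bringing in more information.
Right now the only information you have is the text the user typed, which is sent straight to ES, so the homophones cannot be told apart. Think about what additional information you could bring in to remove the ambiguity.
For example:
Predict the category from the search term. This is an essential technique in e-commerce search; it effectively raises the likelihood of matching documents in a particular category.
Look at which kind of document users click most when they search "fuqiang", and rank that kind higher.
In short, you have to bring in more information.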
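The click-based idea above could be sketched roughly as follows. Everything here is hypothetical: the click log, the category labels, the `weight` parameter, and the scoring formula are made up for illustration, and in a real system the base scores would come from ES rather than being hard-coded.

```python
from collections import Counter, defaultdict


def build_click_stats(click_log):
    """Count clicks per category for each query.

    click_log is an iterable of (query, clicked_category) pairs.
    """
    stats = defaultdict(Counter)
    for query, category in click_log:
        stats[query][category] += 1
    return stats


def rerank(query, hits, stats, weight=1.0):
    """Re-order hits so categories clicked more often for this query rank higher.

    hits is a list of (doc_name, category, base_score) tuples.
    """
    clicks = stats.get(query, Counter())
    total = sum(clicks.values())

    def boosted(hit):
        _, category, base_score = hit
        # Share of this query's clicks that went to the hit's category.
        share = clicks[category] / total if total else 0.0
        return base_score * (1.0 + weight * share)

    return sorted(hits, key=boosted, reverse=True)


# Hypothetical click log: users searching "fuqiang" mostly click person records.
click_log = [("fuqiang", "person"), ("fuqiang", "person"), ("fuqiang", "medical")]
stats = build_click_stats(click_log)

# Hypothetical hits with identical base scores, as homophones would have.
hits = [("腹腔", "medical", 1.0), ("付强", "person", 1.0), ("富强", "slogan", 1.0)]
ranked_names = [name for name, _, _ in rerank("fuqiang", hits, stats)]
print(ranked_names)
```

With this sample log, "付强" moves to the top because the person category received most of the clicks for "fuqiang". The same structure could take a predicted category as input instead of click counts.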