Question: implementing keyword auto-completion on a suggest field with the elasticsearch-analysis-pinyin analyzer
Elasticsearch | Author: yang009ww | Posted 2016-12-01 | Views: 10911
1. Versions
- Elasticsearch 2.3.4
- elasticsearch-analysis-pinyin 1.8.2
- elasticsearch-analysis-ik 1.9.4
- JDK 1.7
2. Steps
- Create the index and configure the pinyin analyzers on it:
curl -XPUT http://172.16.0.29:9401/medcl/ -d'
{
  "index": {
    "analysis": {
      "analyzer": {
        "index_pinyin_analyzer": { "tokenizer": "index_pinyin_tokenizer" },
        "search_pinyin_analyzer": { "tokenizer": "search_pinyin_tokenizer" }
      },
      "tokenizer": {
        "index_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "keep_joined_full_pinyin": true,
          "limit_first_letter_length": 16,
          "lowercase": true
        },
        "search_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_joined_full_pinyin": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true
        }
      }
    }
  }
}'
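- Create three fields: name, suggest, and suggestName. Both suggest and suggestName are of type completion. The suggest field uses index_pinyin_analyzer as its analyzer and search_pinyin_analyzer as its search analyzer; the suggestName field uses ik_max_word and ik_smart. The code is as follows: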
curl -XPOST http://172.16.0.29:9401/medcl/folks/_mapping -d'
{
  "folks": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "index_pinyin_analyzer",
        "search_analyzer": "search_pinyin_analyzer",
        "payloads": true
      },
      "suggestName": {
        "type": "completion",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "payloads": true
      },
      "name": {
        "type": "string",
        "search_analyzer": "ik_smart",
        "analyzer": "ik_max_word"
      }
    }
  }
}'
- Index one test document for "刘德华":
curl -XPOST http://172.16.0.29:9401/medcl/folks/ -d'{"name":"刘德华", "suggest" : "刘德华", "suggestName" : "刘德华"}'
3. Implementing suggest in Java
- For the suggest field (analyzed with pinyin), the code is as follows:
// Entry and Option are the nested Suggest.Suggestion.Entry and Entry.Option types
// from the Elasticsearch client; `client` is initialized elsewhere.
public static List<String> suggest(String keyword) {
    CompletionSuggestionBuilder suggest = new CompletionSuggestionBuilder("suggest")
            .field("suggest").text(keyword).size(10);
    SearchRequestBuilder request = client.prepareSearch("medcl").setTypes("folks").addSuggestion(suggest);
    SearchResponse response = request.get();
    List<String> words = Lists.newArrayList();
    for (Entry<? extends Option> entry : response.getSuggest().getSuggestion("suggest").getEntries()) {
        for (Option option : entry.getOptions()) {
            words.add(String.valueOf(option.getText()));
        }
    }
    return words;
}

public static void main(String[] args) {
    String[] keywords = {"刘德华", "liudehua"};
    for (String keyword : keywords) {
        List<String> words = suggest(keyword);
        System.out.print("Keyword [" + keyword + "] ");
        if (words.isEmpty()) {
            System.out.println("no matching results");
        } else {
            System.out.println("suggestions: " + words.toString());
        }
    }
}
- With keyword "刘德华": no results
- With keyword "liudehua": response.getSuggest().getSuggestion("suggest").getEntries() throws a NullPointerException
[Screenshot]
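One way to tell whether these two symptoms come from the Java client code or from the index itself is to send the same completion suggestion over REST and look at the raw response. The following is a minimal sketch, assuming the `_suggest` endpoint as exposed by Elasticsearch 2.x; the class name `SuggestRestSketch` and its helper methods are hypothetical and not part of the original code, while the index and field names are taken from the steps above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper, not part of the original post.
class SuggestRestSketch {

    // Escapes backslashes and double quotes so a value can sit inside a JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a completion-suggest body in the shape assumed for the 2.x _suggest endpoint.
    static String buildBody(String suggestionName, String field, String keyword) {
        return "{\"" + jsonEscape(suggestionName) + "\":{"
                + "\"text\":\"" + jsonEscape(keyword) + "\","
                + "\"completion\":{\"field\":\"" + jsonEscape(field) + "\"}}}";
    }

    // POSTs the body to the given endpoint and returns the raw response text.
    static String post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // "\u5218\u5fb7\u534e" is the Chinese test name from the post, written as escapes.
        String[] keywords = {"\u5218\u5fb7\u534e", "liudehua"};
        for (String keyword : keywords) {
            String body = buildBody("suggest", "suggest", keyword);
            System.out.println(body);
            // Only contact the cluster when an endpoint is passed as the first argument.
            if (args.length > 0) {
                System.out.println(post(args[0], body));
            }
        }
    }
}
```

Run without arguments, it only prints the request bodies. Passing an endpoint such as http://172.16.0.29:9401/medcl/_suggest?pretty as the first argument would send them to the cluster. If the REST response also has no options for "liudehua", the issue is more likely in the analyzer or mapping than in the Java code.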
- For the suggestName field (analyzed with ik), the code is as follows. The only difference from the code above is that the queried field changes from suggest to suggestName:
public static List<String> suggest(String keyword) {
    CompletionSuggestionBuilder suggest = new CompletionSuggestionBuilder("suggest")
            .field("suggestName").text(keyword).size(10);
    SearchRequestBuilder request = client.prepareSearch("medcl").setTypes("folks").addSuggestion(suggest);
    SearchResponse response = request.get();
    List<String> words = Lists.newArrayList();
    for (Entry<? extends Option> entry : response.getSuggest().getSuggestion("suggest").getEntries()) {
        for (Option option : entry.getOptions()) {
            words.add(String.valueOf(option.getText()));
        }
    }
    return words;
}

public static void main(String[] args) {
    String[] keywords = {"刘德华", "liudehua"};
    for (String keyword : keywords) {
        List<String> words = suggest(keyword);
        System.out.print("Keyword [" + keyword + "] ");
        if (words.isEmpty()) {
            System.out.println("no matching results");
        } else {
            System.out.println("suggestions: " + words.toString());
        }
    }
}
- With keyword "刘德华": the result "刘德华" is matched
- With keyword "liudehua": no results
[Screenshot]
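Regarding the two cases above:
- The first is unexpected: normally, entering either "刘德华" or "liudehua" should return a match
- The second is expected, since ik only does Chinese word segmentation and does not support pinyin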
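Here is the NullPointerException error as well: [screenshot]
I have now found some further problems, and I am not sure whether they come from the elasticsearch-analysis-pinyin plugin or from my configuration and usage.
- Analyzing the Chinese character "刘" with elasticsearch-analysis-pinyin, request: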
http://172.16.0.29:9401/medcl/ ... Dtrue
The result contains two "liu" tokens [screenshot].
- Analyzing the pinyin string "liu" with elasticsearch-analysis-pinyin, request:
http://172.16.0.29:9401/medcl/_analyze?text=liu&analyzer=index_pinyin_analyzer&pretty=true
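The result contains three "liu" tokens and one empty "" token [screenshot].
For the index_pinyin_analyzer configuration, see the code in the first item under "Steps".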
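When repeating the two `_analyze` requests, the Chinese input has to be URL-encoded as UTF-8, otherwise the analyzer may receive different text than intended. The following is a small sketch that builds both request URLs, assuming the query-string form of `_analyze` (`text`, `analyzer`, `pretty`) used in the request above; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original post.
class AnalyzeUrlSketch {

    // Builds an _analyze URL with the text and analyzer name percent-encoded as UTF-8.
    static String analyzeUrl(String base, String index, String analyzer, String text)
            throws UnsupportedEncodingException {
        return base + "/" + index + "/_analyze"
                + "?text=" + URLEncoder.encode(text, "UTF-8")
                + "&analyzer=" + URLEncoder.encode(analyzer, "UTF-8")
                + "&pretty=true";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String base = "http://172.16.0.29:9401";
        // "\u5218" is the single Chinese character analyzed in the post, written as an escape.
        System.out.println(analyzeUrl(base, "medcl", "index_pinyin_analyzer", "\u5218"));
        System.out.println(analyzeUrl(base, "medcl", "index_pinyin_analyzer", "liu"));
    }
}
```

Comparing the token lists returned for both URLs, including each token's position and offsets, should show which tokenizer options emit the repeated "liu" and the empty token.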
Could anyone help analyze the causes of the problems above and suggest solutions? Many thanks!
2 replies
ansj - hi i am i
medcl - Hunting tigers tonight.