Question: using elasticsearch-analysis-pinyin tokenization for keyword autocompletion on a suggest field

Elasticsearch | Author: yang009ww | Posted: 2016-12-01 | Views: 10876

1. Versions
  • elasticsearch 2.3.4
  • elasticsearch-analysis-pinyin-1.8.2
  • elasticsearch-analysis-ik-1.9.4
  • jdk 1.7

2. Steps
  • Create the index and configure the pinyin analyzers on it:
    curl -XPUT http://172.16.0.29:9401/medcl/ -d'
    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "index_pinyin_analyzer" : {
                        "tokenizer" : "index_pinyin_tokenizer"
                    },
                    "search_pinyin_analyzer" : {
                        "tokenizer" : "search_pinyin_tokenizer"
                    }
                },
                "tokenizer" : {
                    "index_pinyin_tokenizer" : {
                        "type" : "pinyin",
                        "keep_separate_first_letter" : true,
                        "keep_full_pinyin" : true,
                        "keep_original" : true,
                        "keep_joined_full_pinyin" : true,
                        "limit_first_letter_length" : 16,
                        "lowercase" : true
                    },
                    "search_pinyin_tokenizer" : {
                        "type" : "pinyin",
                        "keep_separate_first_letter" : false,
                        "keep_joined_full_pinyin" : true,
                        "keep_full_pinyin" : true,
                        "keep_original" : true,
                        "limit_first_letter_length" : 16,
                        "lowercase" : true
                    }
                }
            }
        }
    }'
    [Screenshot: index-mapping.jpg]

 
  • Create three fields: name, suggest, and suggestName. Both suggest and suggestName are of type completion. suggest uses index_pinyin_analyzer as its index analyzer and search_pinyin_analyzer as its search analyzer; suggestName uses ik_max_word and ik_smart respectively. The mapping is:
    curl -XPOST http://172.16.0.29:9401/medcl/folks/_mapping -d'
    {
        "folks": {
            "properties": {
                "suggest": {
                    "type": "completion",
                    "analyzer": "index_pinyin_analyzer",
                    "search_analyzer": "search_pinyin_analyzer",
                    "payloads": true
                },
                "suggestName": {
                    "type": "completion",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart",
                    "payloads": true
                },
                "name": {
                    "type": "string",
                    "search_analyzer": "ik_smart",
                    "analyzer": "ik_max_word"
                }
            }
        }
    }'
    [Screenshot: field-mapping.jpg]

  • Index one test document for "刘德华":
    curl -XPOST http://172.16.0.29:9401/medcl/folks/ -d'{"name":"刘德华", "suggest" : "刘德华", "suggestName" : "刘德华"}'

   
    [Screenshot: value.jpg]

3. Implementing suggest in Java
  • For the suggest field (pinyin analysis), the code is:
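    import java.util.List;
    import com.google.common.collect.Lists;
    import org.elasticsearch.action.search.SearchRequestBuilder;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.search.suggest.Suggest.Suggestion.Entry;
    import org.elasticsearch.search.suggest.Suggest.Suggestion.Entry.Option;
    import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

    // `client` is assumed to be an already-initialized Elasticsearch Client (a static field of the enclosing class).
    public static List<String> suggest(String keyword) {
        // Completion suggestion named "suggest", run against the pinyin-analyzed "suggest" field
        CompletionSuggestionBuilder suggest = new CompletionSuggestionBuilder("suggest").field("suggest").text(keyword).size(10);
        SearchRequestBuilder request = client.prepareSearch("medcl").setTypes("folks").addSuggestion(suggest);
        SearchResponse response = request.get();
        List<String> words = Lists.newArrayList();
        // Collect the text of every returned option
        for (Entry<? extends Option> entry : response.getSuggest().getSuggestion("suggest").getEntries()) {
            for (Option option : entry.getOptions()) {
                words.add(String.valueOf(option.getText()));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] keywords = {"刘德华", "liudehua"};
        for (String keyword : keywords) {
            List<String> words = suggest(keyword);
            System.out.print("Keyword [" + keyword + "] ");
            if (words.isEmpty()) {
                System.out.println("no matches");
            } else {
                System.out.println("suggestions: " + words.toString());
            }
        }
    }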
  • With keyword "刘德华": no results.
  • With keyword "liudehua": response.getSuggest().getSuggestion("suggest").getEntries() throws a NullPointerException.

    As shown:
    [Screenshot: exp.jpg]

  • For the suggestName field (ik analysis), the code is as follows. The only difference from the code above is that the target field changes from suggest to suggestName:
    public static List<String> suggest(String keyword) {
        // Same suggestion name, but run against the ik-analyzed "suggestName" field
        CompletionSuggestionBuilder suggest = new CompletionSuggestionBuilder("suggest").field("suggestName").text(keyword).size(10);
        SearchRequestBuilder request = client.prepareSearch("medcl").setTypes("folks").addSuggestion(suggest);
        SearchResponse response = request.get();
        List<String> words = Lists.newArrayList();
        for (Entry<? extends Option> entry : response.getSuggest().getSuggestion("suggest").getEntries()) {
            for (Option option : entry.getOptions()) {
                words.add(String.valueOf(option.getText()));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] keywords = {"刘德华", "liudehua"};
        for (String keyword : keywords) {
            List<String> words = suggest(keyword);
            System.out.print("Keyword [" + keyword + "] ");
            if (words.isEmpty()) {
                System.out.println("no matches");
            } else {
                System.out.println("suggestions: " + words.toString());
            }
        }
    }
  • With keyword "刘德华": "刘德华" is matched.
  • With keyword "liudehua": no results.

  As shown:
  [Screenshot: results.jpg]

 
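  Regarding the two cases above:
  • The first is unexpected: with the pinyin analyzers, both "刘德华" and "liudehua" should return a match.
  • The second is expected, because ik only segments Chinese text and does not support pinyin.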
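
  To make the expectation in the first case concrete, here is a minimal sketch of the kinds of terms the tokenizer options from step 2 are meant to produce for "刘德华". It is not the plugin's implementation: the class name, the three-character lookup table, and the output order are all assumptions made for illustration, and the real plugin may emit terms in a different order or at different positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT the elasticsearch-analysis-pinyin code.
class PinyinTokenSketch {
    // Toy lookup table covering just the three characters of the test name.
    private static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('\u5218', "liu"); // first character of the test name
        PINYIN.put('\u5fb7', "de");  // second character
        PINYIN.put('\u534e', "hua"); // third character
    }

    static List<String> tokens(String text, boolean keepSeparateFirstLetter) {
        List<String> out = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        StringBuilder firstLetters = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // characters outside the toy table are skipped
            }
            out.add(py);                         // keep_full_pinyin
            if (keepSeparateFirstLetter) {
                out.add(py.substring(0, 1));     // keep_separate_first_letter
            }
            joined.append(py);
            firstLetters.append(py.charAt(0));
        }
        out.add(text);                           // keep_original
        out.add(joined.toString());              // keep_joined_full_pinyin
        out.add(firstLetters.toString());        // joined first letters
        return out;
    }

    public static void main(String[] args) {
        // Index-side settings (keep_separate_first_letter = true)
        System.out.println(tokens("\u5218\u5fb7\u534e", true));
        // Search-side settings (keep_separate_first_letter = false)
        System.out.println(tokens("\u5218\u5fb7\u534e", false));
    }
}
```

  Under these assumptions both the original text and the joined form "liudehua" appear among the index-side terms, which is why a match is expected for either keyword.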

 
  To add detail on the NullPointerException, see the screenshot below:

  [Screenshot: response-exp.jpg]
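
  Judging only from the call chain in the code above, the exception means that one link of response.getSuggest().getSuggestion("suggest") was null for this request (an assumption, since the screenshot is not reproduced here). The sketch below shows the guard pattern with plain collections standing in for the Elasticsearch response types; the class name and the Map-based stand-in are hypothetical, and this only avoids the crash, it does not explain why no suggestion came back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: suggestion name -> entries -> option texts.
class NullSafeSuggestSketch {
    static List<String> collect(Map<String, List<List<String>>> suggestions, String name) {
        List<String> words = new ArrayList<>();
        if (suggestions == null) {
            return words; // analogous to getSuggest() returning null
        }
        List<List<String>> entries = suggestions.get(name);
        if (entries == null) {
            return words; // analogous to getSuggestion(name) returning null
        }
        for (List<String> options : entries) {
            words.addAll(options);
        }
        return words;
    }

    public static void main(String[] args) {
        // No suggestions at all: an empty list instead of an exception
        System.out.println(collect(null, "suggest"));
    }
}
```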
 
 
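  I have since found further problems, and I cannot tell whether they come from the elasticsearch-analysis-pinyin plugin or from my configuration and usage.

   The result contains two "liu" tokens, as shown:

   [Screenshot: liu-fenci.jpg]


   The result contains three "liu" tokens and one empty token (""), as shown:

   [Screenshot: liu-fenci2.jpg]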
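
   When inspecting analyzer output like this, a small helper can flag repeated and empty terms. The sketch below is a hypothetical diagnostic: the class name and the sample token list are assumptions, since the text that was analyzed is not given in the post. Newer releases of the plugin document a remove_duplicated_term tokenizer option; whether it is available in 1.8.2 is not confirmed here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper for analyzer output; tokens are assumed non-null.
class TokenReportSketch {
    // Returns each term that is empty or occurs more than once, with its count.
    static Map<String, Integer> repeatedOrEmpty(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : tokens) {
            Integer n = counts.get(t);
            counts.put(t, n == null ? 1 : n + 1);
        }
        Map<String, Integer> suspicious = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().isEmpty() || e.getValue() > 1) {
                suspicious.put(e.getKey(), e.getValue());
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Made-up token list with three "liu" terms and one empty term
        List<String> sample = Arrays.asList("liu", "liu", "de", "liu", "");
        System.out.println(repeatedOrEmpty(sample));
    }
}
```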

 
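   For the index_pinyin_analyzer configuration, see the first item under "2. Steps".

   Could anyone help analyze the cause of the problems above and suggest a fix? Many thanks!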
ansj - hi i am i

It is already 2019. It looks like the analysis plugin itself is at fault.

medcl - Hunting tigers tonight.

An old thread has come back to life. It would be better to upgrade to a newer version.
