The pinyin analysis settings are as follows (only the relevant part is excerpted):
"filter": {
"pinyin_simple_filter":{
"type" : "pinyin",
"keep_first_letter":true,
"keep_separate_first_letter" : false,
"keep_full_pinyin" : false,
"keep_original" : false,
"limit_first_letter_length" : 50,
"lowercase" : true
},
"pinyin_full_filter":{
"type" : "pinyin",
"keep_first_letter":false,
"keep_separate_first_letter" : false,
"keep_full_pinyin" : true,
"none_chinese_pinyin_tokenize":true,
"keep_original" : false,
"limit_first_letter_length" : 50,
"lowercase" : true
}
},
"analyzer": {
"pinyiSimpleSearchAnalyzer":{
"tokenizer" : "ik_max_word",
"filter": ["pinyin_simple_filter", "lowercase"]
},
"pinyiFullSearchAnalyzer":{
"tokenizer" : "ik_max_word",
"filter": ["pinyin_full_filter", "lowercase"]
}
}
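To make the token list from the `_analyze` call easier to read, here is a toy Python model of what `pinyin_simple_filter` (`keep_first_letter` on, `keep_full_pinyin` off) does to each token that `ik_max_word` emits. It is a sketch under stated assumptions, not the plugin's implementation. The five-entry `TOY_PINYIN` table is a hypothetical stand-in for the plugin's real pinyin dictionary. Keeping non-Chinese characters such as digits unchanged assumes the plugin's default `keep_none_chinese_in_first_letter: true`.

```python
# Toy model of "pinyin_simple_filter": keep_first_letter=true, keep_full_pinyin=false.
# ASSUMPTION: TOY_PINYIN is a hypothetical five-character stand-in for the
# plugin's real pinyin dictionary; this is not the plugin's implementation.
TOY_PINYIN = {"大": "da", "黑": "hei", "牛": "niu", "套": "tao", "餐": "can"}


def first_letters(token: str) -> str:
    """Replace each known Chinese character with the first letter of its pinyin.

    Characters missing from the table (digits, Latin letters) are kept as-is,
    mirroring the assumed default keep_none_chinese_in_first_letter=true.
    The result is lowercased, like "lowercase": true in the filter.
    """
    letters = []
    for ch in token:
        pinyin = TOY_PINYIN.get(ch)
        letters.append(pinyin[0] if pinyin else ch)
    return "".join(letters).lower()


# Tokens that ik_max_word produced for "大黑牛2018套餐", read off the
# start/end offsets in the _analyze output from the post.
ik_tokens = ["大", "黑", "牛", "2018", "套餐", "套", "餐"]

# Prints ['d', 'h', 'n', '2018', 'tc', 't', 'c'], the same tokens as the post's output.
print([first_letters(t) for t in ik_tokens])
```

The model shows why "tc" exists as a single token at all: `ik_max_word` kept "套餐" as one word, and each of its characters was also emitted separately, which is where the extra "t" and "c" tokens come from.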
The docName field is defined as follows:
"docName" : {
"type": "text",
"analyzer": "k_analyzer",
"search_analyzer": "k2_analyzer",
"fields": {
"f_pinyin":{
"type": "text",
"analyzer": "pinyiFullSearchAnalyzer",
"search_analyzer": "pinyiFullSearchAnalyzer"
},
"s_pinyin":{
"type": "text",
"analyzer": "pinyiSimpleSearchAnalyzer",
"search_analyzer": "pinyiSimpleSearchAnalyzer"
},
"std":{
"type": "text",
"analyzer": "std_analyzer",
"search_analyzer": "std2_analyzer"
}
}
},
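Because `docName` is a multi-field, each sub-field can analyze text differently at index time and at search time. The following sketch lists that pair for every path, which makes it easy to confirm that `docName.s_pinyin`, the field used in the query, runs `pinyiSimpleSearchAnalyzer` on both sides. The helper name `analyzer_pairs` is hypothetical. Falling back to `analyzer` when `search_analyzer` is absent reflects Elasticsearch's default behavior. The `"standard"` fallback assumes the index has no other default analyzer configured.

```python
# The docName mapping from the post, written as a Python dict.
DOC_NAME_MAPPING = {
    "type": "text",
    "analyzer": "k_analyzer",
    "search_analyzer": "k2_analyzer",
    "fields": {
        "f_pinyin": {"type": "text", "analyzer": "pinyiFullSearchAnalyzer",
                     "search_analyzer": "pinyiFullSearchAnalyzer"},
        "s_pinyin": {"type": "text", "analyzer": "pinyiSimpleSearchAnalyzer",
                     "search_analyzer": "pinyiSimpleSearchAnalyzer"},
        "std": {"type": "text", "analyzer": "std_analyzer",
                "search_analyzer": "std2_analyzer"},
    },
}


def analyzer_pairs(name: str, mapping: dict) -> dict:
    """Return {field path: (index analyzer, search analyzer)} for a field and its sub-fields.

    search_analyzer falls back to analyzer when it is not set; "standard" is an
    assumed index-level default when neither is set.
    """
    def pair(m: dict) -> tuple:
        index_side = m.get("analyzer", "standard")
        return index_side, m.get("search_analyzer", index_side)

    pairs = {name: pair(mapping)}
    for sub, sub_mapping in mapping.get("fields", {}).items():
        pairs[f"{name}.{sub}"] = pair(sub_mapping)
    return pairs


for path, (index_side, search_side) in analyzer_pairs("docName", DOC_NAME_MAPPING).items():
    note = "same" if index_side == search_side else "differs"
    print(f"{path:18} index={index_side:26} search={search_side:26} ({note})")
```

Since `docName.s_pinyin` uses the same analyzer on both sides, the `_analyze` output for that analyzer describes both the indexed tokens and the way the query text is tokenized.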
The tokenization result is as follows:
curl -H "Content-Type: application/json" -XGET 'http://localhost:9200/pinyin_index/_analyze?pretty=true' -d '{
"analyzer":"pinyiSimpleSearchAnalyzer",
"text":"大黑牛2018套餐"
}'
{
"tokens" : [
{
"token" : "d",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "h",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "n",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "2018",
"start_offset" : 3,
"end_offset" : 7,
"type" : "ARABIC",
"position" : 3
},
{
"token" : "tc",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "t",
"start_offset" : 7,
"end_offset" : 8,
"type" : "COUNT",
"position" : 5
},
{
"token" : "c",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 6
}
]
}
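A natural next diagnostic is to analyze the query text "tc" itself with the same analyzer, then compare its tokens and positions with the index-time tokens above. The Python sketch handles only the bookkeeping. It assumes you have saved the JSON responses of both `_analyze` calls. The helper names `token_positions` and `missing_tokens`, and the file name in the commented-out line, are hypothetical. The sketch does not assume what the query-time tokens actually are.

```python
import json

# Body for analyzing the query text "tc" with the analyzer that docName.s_pinyin
# uses at search time. Send it to pinyin_index/_analyze the same way as the
# curl call in the post, and save the JSON response.
QUERY_ANALYZE_BODY = {"analyzer": "pinyiSimpleSearchAnalyzer", "text": "tc"}

# Index-time tokens for "大黑牛2018套餐", copied from the _analyze output above
# (offsets and types omitted).
INDEX_RESPONSE = {"tokens": [
    {"token": "d", "position": 0}, {"token": "h", "position": 1},
    {"token": "n", "position": 2}, {"token": "2018", "position": 3},
    {"token": "tc", "position": 4}, {"token": "t", "position": 5},
    {"token": "c", "position": 6},
]}


def token_positions(analyze_response: dict) -> dict:
    """Map each token of an _analyze response to the list of positions it occupies."""
    positions: dict = {}
    for tok in analyze_response["tokens"]:
        positions.setdefault(tok["token"], []).append(tok["position"])
    return positions


def missing_tokens(index_response: dict, query_response: dict) -> list:
    """Query-time tokens that never occur among the index-time tokens."""
    indexed = token_positions(index_response)
    return [tok["token"] for tok in query_response["tokens"] if tok["token"] not in indexed]


print(json.dumps(QUERY_ANALYZE_BODY))
print(token_positions(INDEX_RESPONSE))
# with open("tc_analyze.json") as f:  # hypothetical file holding the saved response
#     print(missing_tokens(INDEX_RESPONSE, json.load(f)))
```

If every query-time token is already present in the index, the tokens themselves are not the problem. In that case, the next place to look is the Lucene query that `query_string` builds from them, including the positions of those tokens.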
The query is built as follows:
curl -H "Content-Type: application/json" -XGET 'localhost:9200/pinyin_index/_search?pretty' -d '
{
"query": {
"query_string": {
"query": "tc",
"default_field": "docName.s_pinyin",
"default_operator":"AND"
}
},
"size": 100,
"highlight": {
"pre_tags": ["<h1>"],
"post_tags": ["</h1>"],
"fields": {
"docName.s_pinyin": {}
}
}
}'
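"dhn" matches and "tc" does not under the same query. One way to see what differs is to ask Elasticsearch which Lucene query each string is rewritten into, using the `_validate/query` API with `explain=true`. Its response carries an `explanations` list without running the search. The Python sketch only prints the two curl commands. `query_string_body` is a hypothetical helper name, and the host and index are taken from the post.

```python
import json

BASE_URL = "http://localhost:9200/pinyin_index"  # host and index from the post


def query_string_body(text: str, field: str = "docName.s_pinyin", operator: str = "AND") -> dict:
    """Rebuild the post's query_string query for an arbitrary query text."""
    return {"query": {"query_string": {
        "query": text, "default_field": field, "default_operator": operator}}}


# _validate/query?explain=true reports, per index, the Lucene query that
# Elasticsearch built from the request, without executing the search.
validate_url = f"{BASE_URL}/_validate/query?explain=true"

# "dhn" matches and "tc" does not, so comparing the two rewritten queries is informative.
for text in ("dhn", "tc"):
    body = json.dumps(query_string_body(text))
    print(f"curl -H 'Content-Type: application/json' -XGET '{validate_url}' -d '{body}'")
```

Comparing the two `explanation` strings shows whether "tc" becomes a single term query, a boolean query over several terms, or something position-sensitive.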
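The results do not include the document tokenized above, "大黑牛2018套餐", even though it can be found by searching for "taocan" or "套餐".
Searching for "dhn" or "2018" also finds it, but searching for "tc" does not. I have not configured any custom word lists.
Why does searching for "tc" return no results? Could anyone take a look and help me figure out where the problem is?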
1 reply
medcl - 今晚打老虎 ("Fighting tigers tonight.")