
Is there a fix for the IK analyzer not emitting the number when the text mixes Chinese and digits?

Elasticsearch | Author: druke | Posted 2018-04-21 | Views: 7446

For example, the following request does not produce a token for 2018. Is there a way to fix this?
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "恒大积分2018"
}

{
"tokens": [
{
"token": "恒",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "大",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "积分",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
}
]
}
 
 

laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasitcsearch] Knowledge Planet (知识星球): http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net


I installed the stock elasticsearch_ik plugin on ES 6.2.2, with no extension dictionary.
Running the same analysis as yours gives the following:
{
"tokens": [
{
"token": "恒",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "大",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "积分",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
},
{
"token": "2018",
"start_offset": 4,
"end_offset": 8,
"type": "ARABIC",
"position": 3
}
]
}
As shown above, 2018 is emitted as a token, and this holds for both coarse-grained and fine-grained analysis.
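
To compare outputs like the two in this thread without reading the JSON by eye, a small helper can list the tokens in an `_analyze` response and check for a specific one. This is only a sketch: the helper names `token_texts` and `has_token` are made up for illustration, and the two sample responses are abbreviated copies of the ones posted above (offsets and positions omitted).

```python
def token_texts(analyze_response: dict) -> list:
    """Return the token strings from an _analyze response, in order."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def has_token(analyze_response: dict, expected: str) -> bool:
    """Return True if `expected` appears among the emitted tokens."""
    return expected in token_texts(analyze_response)


# Abbreviated copy of the response posted in the question.
asker_response = {
    "tokens": [
        {"token": "恒", "type": "CN_CHAR"},
        {"token": "大", "type": "CN_CHAR"},
        {"token": "积分", "type": "CN_WORD"},
    ]
}

# Abbreviated copy of the response posted in this answer.
answer_response = {
    "tokens": [
        {"token": "恒", "type": "CN_CHAR"},
        {"token": "大", "type": "CN_CHAR"},
        {"token": "积分", "type": "CN_WORD"},
        {"token": "2018", "type": "ARABIC"},
    ]
}

print(has_token(asker_response, "2018"))   # False
print(has_token(answer_response, "2018"))  # True
```

Since the same text yields different tokens on the two setups, the difference probably lies in the installation (plugin version or dictionary configuration) rather than in the request itself.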
 
Going further, if you want an exact match, you can use a "match_phrase" query.
More discussion: https://www.zhihu.com/question/26424192
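
A minimal sketch of the `match_phrase` suggestion, building the search body as a Python dict. The field name `title` and the helper name `build_match_phrase_query` are assumptions for illustration; use whichever field in your index is mapped with an IK analyzer.

```python
import json


def build_match_phrase_query(field: str, text: str) -> dict:
    """Build a search body that matches `text` as a phrase in `field`."""
    return {"query": {"match_phrase": {field: text}}}


# "title" is a hypothetical field name, not taken from the original question.
body = build_match_phrase_query("title", "恒大积分2018")

# Print the JSON body; it can be sent to the _search endpoint of your index.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

A phrase query requires the analyzed terms to appear in the same order and adjacent to one another, so it only helps once the analyzer actually emits 2018 as a token.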

dotNetDR_ - elasticsearch 6.x


On ES 5.5.3 I don't see this problem:
{
"tokens": [
{
"token": "恒",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "大",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "积分",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
},
{
"token": "2018",
"start_offset": 4,
"end_offset": 8,
"type": "ARABIC",
"position": 3
}
]
}

yayg2008


Try the fine-grained analyzer, ik_max_word.
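
One way to try this is to run the same `_analyze` request with both analyzers and compare the results. The sketch below only builds and prints the two request bodies in the console format used in the question; it does not contact a cluster, and the helper name `analyze_request` is made up for illustration.

```python
import json


def analyze_request(analyzer: str, text: str) -> dict:
    """Build the body of an _analyze request for the given analyzer."""
    return {"analyzer": analyzer, "text": text}


# Print one console-style request per analyzer so the outputs can be compared.
for analyzer in ("ik_smart", "ik_max_word"):
    print("GET _analyze?pretty")
    print(json.dumps(analyze_request(analyzer, "恒大积分2018"),
                     ensure_ascii=False, indent=2))
```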
