Is there a way to fix the ik analyzer failing to split out the digits when the text mixes Chinese and digits?

Elasticsearch | Author: druke | Posted April 21, 2018 | Views: 8679

For example, the request below fails to match 2018: no token for 2018 appears in the result. Is there a way to fix this?

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "恒大积分2018"
}

{
  "tokens": [
    {
      "token": "恒",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "大",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "积分",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}

3 replies

laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer; [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

I installed the stock elasticsearch_ik on ES 6.2.2, with no extension dictionary.
The same analysis request as yours returns the following:

{
  "tokens": [
    {
      "token": "恒",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "大",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "积分",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "2018",
      "start_offset": 4,
      "end_offset": 8,
      "type": "ARABIC",
      "position": 3
    }
  ]
}

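As shown above, the 2018 token is included, and this holds for both coarse-grained (ik_smart) and fine-grained (ik_max_word) analysis.

Going a step further, if you want an exact match you can use a "match_phrase" query.
More discussion: https://www.zhihu.com/question/26424192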
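
As a minimal sketch of the match_phrase suggestion, the snippet below only builds the request body and does not contact a cluster. The field name `title` is a hypothetical placeholder and not something taken from this thread; it is assumed to be a text field analyzed with ik_smart.

```python
import json


def build_match_phrase_query(field: str, text: str) -> dict:
    """Build a match_phrase body: the analyzed terms must occur in the same order, adjacent to each other."""
    return {"query": {"match_phrase": {field: text}}}


if __name__ == "__main__":
    # "title" is a hypothetical field name used only for illustration.
    body = build_match_phrase_query("title", "恒大积分2018")
    # Print the JSON that would be sent as the body of a _search request.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because match_phrase runs the query text through the field's analyzer, it can only match on 2018 if that analyzer actually emits a 2018 token, so it is worth checking the _analyze output first.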

dotNetDR_ - elasticsearch 6.x

On ES 5.5.3 I can't reproduce your problem:

{
  "tokens": [
    {
      "token": "恒",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "大",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "积分",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "2018",
      "start_offset": 4,
      "end_offset": 8,
      "type": "ARABIC",
      "position": 3
    }
  ]
}

yayg2008

Try the fine-grained analyzer, ik_max_word.
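
To compare the two analyzers, something like the following could build the _analyze request bodies and check a response for the expected token. It is only a sketch: the helper names are made up for illustration, and the two sample responses are trimmed copies of the outputs posted in this thread rather than live results.

```python
import json


def build_analyze_body(analyzer: str, text: str) -> dict:
    """Request body for GET _analyze with the given analyzer name."""
    return {"analyzer": analyzer, "text": text}


def has_token(response: dict, term: str) -> bool:
    """Return True if the _analyze response contains a token equal to `term`."""
    return any(tok.get("token") == term for tok in response.get("tokens", []))


# Trimmed copy of the output in the question (no 2018 token).
question_output = {"tokens": [
    {"token": "恒", "type": "CN_CHAR"},
    {"token": "大", "type": "CN_CHAR"},
    {"token": "积分", "type": "CN_WORD"},
]}

# Trimmed copy of the output posted in the replies (2018 present as ARABIC).
reply_output = {"tokens": question_output["tokens"] + [
    {"token": "2018", "type": "ARABIC"},
]}

if __name__ == "__main__":
    # Bodies to send to _analyze, one per analyzer to compare.
    for analyzer in ("ik_smart", "ik_max_word"):
        print(json.dumps(build_analyze_body(analyzer, "恒大积分2018"), ensure_ascii=False))
    print(has_token(question_output, "2018"))
    print(has_token(reply_output, "2018"))
```

Running the same check against both analyzers on the affected cluster would show whether the missing number is specific to ik_smart or to that installation as a whole.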
