How can the pinyin analyzer handle ambiguous segmentations such as xian and changan?

Elasticsearch | Author: caojunjie | Posted October 11, 2019 | Views: 1788

How can a pinyin search for xian return results for both xian and xi+an? Note that I am using match_phrase for matching.

1 reply

Are you using the pinyin analyzer?
You can set the tokenizer parameter keep_joined_full_pinyin to true. Then, at index time, the tokens produced for "西安" include "xian":

POST _analyze
{
  "tokenizer": {
    "type": "pinyin",
    "keep_separate_first_letter": false,
    "keep_full_pinyin": true,
    "keep_original": true,
    "limit_first_letter_length": 16,
    "lowercase": true,
    "keep_joined_full_pinyin": true,
    "remove_duplicated_term": true
  },
  "text": "西安"
}

{
  "tokens" : [
    {
      "token" : "xi",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "西安",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "xian",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "xa",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "an",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 1
    }
  ]
}
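
To use this beyond a one-off _analyze call, the same tokenizer settings would need to live in an index's analysis settings. The Python sketch below only builds the JSON bodies for that, plus a match_phrase query like the one in the question; it does not talk to a cluster. The index, field, tokenizer, and analyzer names are hypothetical placeholders, and the typeless mapping assumes Elasticsearch 7 or later. How the query string "xian" itself is tokenized at search time depends on the analyzer applied to the query, so it is worth checking with the _analyze body before relying on it.

```python
import json

# Hypothetical placeholder names; none of these come from the original post.
INDEX_NAME = "places"
FIELD_NAME = "name"
TOKENIZER_NAME = "my_pinyin"
ANALYZER_NAME = "pinyin_analyzer"

# Tokenizer parameters copied from the _analyze request in the answer.
PINYIN_TOKENIZER = {
    "type": "pinyin",
    "keep_separate_first_letter": False,
    "keep_full_pinyin": True,
    "keep_original": True,
    "limit_first_letter_length": 16,
    "lowercase": True,
    "keep_joined_full_pinyin": True,
    "remove_duplicated_term": True,
}


def build_index_body():
    """Index settings and mapping that apply the pinyin tokenizer to one text field.

    Assumes typeless mappings (Elasticsearch 7 or later).
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {TOKENIZER_NAME: PINYIN_TOKENIZER},
                "analyzer": {
                    ANALYZER_NAME: {"type": "custom", "tokenizer": TOKENIZER_NAME}
                },
            }
        },
        "mappings": {
            "properties": {FIELD_NAME: {"type": "text", "analyzer": ANALYZER_NAME}}
        },
    }


def build_analyze_body(text):
    """Body for POST /<index>/_analyze, to inspect the tokens produced for `text`."""
    return {"analyzer": ANALYZER_NAME, "text": text}


def build_phrase_query(text):
    """match_phrase query against the pinyin-analyzed field."""
    return {"query": {"match_phrase": {FIELD_NAME: text}}}


if __name__ == "__main__":
    requests = [
        (f"PUT /{INDEX_NAME}", build_index_body()),
        (f"POST /{INDEX_NAME}/_analyze", build_analyze_body("xian")),
        (f"GET /{INDEX_NAME}/_search", build_phrase_query("xian")),
    ]
    for title, body in requests:
        print(title)
        print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed bodies can be pasted into the Kibana console or sent with any HTTP client. Running the _analyze request on both "西安" and the query string "xian" shows whether the two sides produce a shared token at compatible positions, which is what match_phrase needs.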
