Elasticsearch: how do I configure synonyms when using the IK analyzer?

laigood

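As far as I know, IK does not implement synonyms itself, so you would have to add that part yourself. use_smart is a parameter that you can set yourself. For details, see: https://github.com/medcl/elasticsearch-analysis-ik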

dicom

Setting tokenizer: ik and then adding a synonym token filter does give you synonyms, but the use_smart setting has no effect. Does anyone know why?
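
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a `custom` analyzer is assembled from its `tokenizer`, `filter`, and `char_filter` entries, so a `use_smart` key placed on the analyzer itself may simply be ignored. The sketch below declares a separate tokenizer of type `ik` that carries `use_smart` and points the analyzer at it. The names `ik_smart_tok`, `my_synonym`, `ik_smart_syno`, and the synonym file path are hypothetical, and whether `use_smart` is honoured on the tokenizer depends on the IK plugin version.

```python
import json


def build_analysis_settings(use_smart: bool) -> dict:
    """Build analysis settings with use_smart on the tokenizer, not the analyzer."""
    return {
        "analysis": {
            "tokenizer": {
                # Assumption: the IK tokenizer factory reads use_smart here.
                "ik_smart_tok": {"type": "ik", "use_smart": use_smart},
            },
            "filter": {
                "my_synonym": {
                    "type": "synonym",
                    "synonyms_path": "analysis/synonym.txt",
                },
            },
            "analyzer": {
                "ik_smart_syno": {
                    "type": "custom",
                    "tokenizer": "ik_smart_tok",
                    "filter": ["my_synonym"],
                },
            },
        }
    }


settings = build_analysis_settings(use_smart=True)
print(json.dumps(settings, indent=2))
```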

defineconst

https://github.com/medcl/elasticsearch-analysis-ik
There is no synonym setting there, is there?

medcl - Hunting tigers tonight.

http://www.ifunit.com/29/elasticsearch配置同义词

defineconst

English text is analyzed and expanded with synonyms, but Chinese text is not. See below:
GET /my_index2/_analyze?analyzer=ik_max_word_syno&text=cosmos
{
"tokens": [
{
"token": "universe",
"start_offset": 0,
"end_offset": 6,
"type": "SYNONYM",
"position": 1
},
{
"token": "cosmos",
"start_offset": 0,
"end_offset": 6,
"type": "SYNONYM",
"position": 1
}
]
}

GET /my_index2/_analyze?analyzer=ik_max_word_syno&text=foosball
{
"tokens": [
{
"token": "foozball",
"start_offset": 0,
"end_offset": 8,
"type": "SYNONYM",
"position": 1
},
{
"token": "foosball",
"start_offset": 0,
"end_offset": 8,
"type": "SYNONYM",
"position": 1
}
]
}

GET /index/_analyze?analyzer=ik
{
"text": "西红柿"
}
{
"tokens": [
{
"token": "text",
"start_offset": 5,
"end_offset": 9,
"type": "ENGLISH",
"position": 1
},
{
"token": "西红柿",
"start_offset": 13,
"end_offset": 16,
"type": "CN_WORD",
"position": 2
}
]
}

GET /my_index2/_analyze?analyzer=ik_max_word_syno&text=西红柿
{
"tokens": []
}
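
A note on the stray `text` token: the offsets reported for `GET /index/_analyze?analyzer=ik` (5 to 9 for `text`, 13 to 16 for `西红柿`) match the character positions inside the JSON body itself, which suggests that this Elasticsearch version analyzed the whole request body as raw text instead of reading its `text` field. The empty result for `text=西红柿` in the URL may come from the Chinese characters not being percent-encoded as UTF-8 by the client; that part is a guess. A minimal Python sketch, assuming the console sent the body with two-space indentation:

```python
from urllib.parse import quote

# Assumption: the console sent the body pretty-printed with two-space indentation.
body = '{\n  "text": "西红柿"\n}'


def span(haystack: str, needle: str) -> tuple:
    """Return (start_offset, end_offset) of needle in haystack, in characters."""
    start = haystack.index(needle)
    return start, start + len(needle)


print(span(body, "text"))     # (5, 9), same offsets as the "text" token above
print(span(body, "西红柿"))    # (13, 16), same offsets as the CN_WORD token above

# Percent-encode the text as UTF-8 before placing it in the query string.
encoded = quote("西红柿")
print("/my_index2/_analyze?analyzer=ik_max_word_syno&text=" + encoded)
```

If the offsets line up like this, sending only the raw text as the body (or a properly encoded `text` parameter) should avoid the extra token.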

defineconst

For details, see the link medcl posted: http://www.ifunit.com/29/elasticsearch配置同义词
Step 1: put the synonym dictionary in place
elasticsearch-1.6.0-self\config\analysis\synonym.txt
Contents:
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
西红柿, 番茄
马铃薯, 土豆
aa,bb
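
The file uses the Solr synonym format: `#` starts a comment, and each comma-separated line lists terms that are treated as equivalent. As a rough illustration (not the parser Elasticsearch uses), the lines can be read into groups as follows. `parse_synonyms` is a hypothetical helper, and it handles only the comma-separated form, not `=>` mappings.

```python
def parse_synonyms(text: str) -> list:
    """Parse comma-separated synonym lines into lists of equivalent terms."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if len(terms) > 1:
            groups.append(terms)
    return groups


sample = """#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
西红柿, 番茄
马铃薯, 土豆
aa,bb
"""

for group in parse_synonyms(sample):
    print(group)
```

Note that the separators are ASCII commas; a full-width comma would not split the terms in this sketch.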
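Step 2: configure elasticsearch.yml. I copied the file from version 1.0.0 of medcl's rtf distribution (I have already set up the basic and management plugins for 1.6.0; contact me if you want a copy).
Contents (excerpt):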
ik_syno:
    type: custom
    tokenizer: ik
    filter: my_synonym
ik_max_word_syno:
    type: custom
    tokenizer: ik
    filter: my_synonym
    use_smart: false
#index.analysis.analyzer.default.type: mmseg
index.analysis.analyzer.default.type: ik

defineconst

Step 3: test the analyzer
GET /index/_analyze?analyzer=ik
{
"text": "西红柿"
}
Result:
{
"tokens": [
{
"token": "text",
"start_offset": 5,
"end_offset": 9,
"type": "ENGLISH",
"position": 1
},
{
"token": "西红柿",
"start_offset": 13,
"end_offset": 16,
"type": "CN_WORD",
"position": 2
}
]
}

defineconst

GET /index/_analyze?analyzer=ik_max_word_syno
{
"text": "西红柿"
}
The result is:
{
"tokens": [
{
"token": "text",
"start_offset": 5,
"end_offset": 9,
"type": "ENGLISH",
"position": 1
},
{
"token": "西红柿",
"start_offset": 13,
"end_offset": 16,
"type": "SYNONYM",
"position": 2
},
{
"token": "番茄",
"start_offset": 13,
"end_offset": 16,
"type": "SYNONYM",
"position": 2
}
]
}
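
The output shows what a synonym filter does: it keeps the tokenizer's token and adds each synonym at the same position and offsets. A toy model of that behaviour, assuming equivalence groups like those in synonym.txt (an illustration only, not Lucene's implementation; `expand` is a hypothetical name):

```python
def expand(tokens, groups):
    """Emit every member of a synonym group at the matched token's position."""
    lookup = {term: group for group in groups for term in group}
    out = []
    for tok in tokens:
        group = lookup.get(tok["token"])
        if group is None:
            out.append(tok)  # no synonym: pass the token through unchanged
            continue
        for term in group:
            # Same offsets and position as the original token, new type.
            out.append({**tok, "token": term, "type": "SYNONYM"})
    return out


groups = [["西红柿", "番茄"], ["马铃薯", "土豆"]]
tokens = [{"token": "西红柿", "start_offset": 13, "end_offset": 16,
           "type": "CN_WORD", "position": 2}]

for tok in expand(tokens, groups):
    print(tok["token"], tok["position"], tok["type"])
```

Because both terms share one position, a query for either term can match a document indexed with the other.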

defineconst

So it looks like the analyzer has to be specified in the query (I am still working this out and am very much a beginner), like this:
POST my_index2/fulltext/_search
{
"query": {
"query_string": {
"text": {
"query": "西红柿",
"analyzer": "ik_max_word_syno"
}
}
},
"highlight": {
"pre_tags": [
"<tag1>",
"<tag2>"
],
"post_tags": [
"</tag1>",
"</tag2>"
],
"fields": {
"content": {}
}
}
}

The result is:

{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.19070336,
"hits": [
{
"_index": "my_index2",
"_type": "fulltext",
"_id": "1",
"_score": 0.19070336,
"_source": {
"content": "USA 西红柿Elizabeth is the English queen of united states"
}
},
{
"_index": "my_index2",
"_type": "fulltext",
"_id": "3",
"_score": 0.19070336,
"_source": {
"content": "西红柿蛋汤The United States is wealthy"
}
},
{
"_index": "my_index2",
"_type": "fulltext",
"_id": "4",
"_score": 0.02250402,
"_source": {
"content": "The United States is wealthy番茄炒鸡蛋"
}
}
]
}
}
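
The hits include document 4, which contains only 番茄, so the synonym expansion is taking effect at query time. The request above nests `query` and `analyzer` under a field name inside `query_string`, which is the shape of a `match` query rather than the documented `query_string` shape. A hedged sketch of the two documented forms, built as Python dictionaries; the field name `content` and the analyzer name come from the thread, while the helper names are hypothetical:

```python
import json


def match_query(field, text, analyzer):
    """match query: options are nested under the field name."""
    return {"query": {"match": {field: {"query": text, "analyzer": analyzer}}}}


def query_string_query(field, text, analyzer):
    """query_string query: options are flat; the field goes in default_field."""
    return {"query": {"query_string": {
        "default_field": field, "query": text, "analyzer": analyzer}}}


print(json.dumps(match_query("content", "西红柿", "ik_max_word_syno"),
                 ensure_ascii=False, indent=2))
print(json.dumps(query_string_query("content", "西红柿", "ik_max_word_syno"),
                 ensure_ascii=False, indent=2))
```

Either body can be posted to `my_index2/fulltext/_search`; scores may differ from the ones shown above because the field being searched is named explicitly.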

tianzhaixing - IT guy born in the 1980s

## Synonym configuration

### step 1

Add this as the last line of elasticsearch.yml:
index.analysis.analyzer.default.type: ik

### step 2

Put synonyms.txt in the elasticsearch-2.3.1/config directory.

synonyms.txt must be UTF-8 encoded. Its contents:

#Example:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
西红柿, 番茄
马铃薯, 土豆
aa, bb

### step 3

Create the index with its analysis settings and type mapping:

curl -XPUT 'localhost:9200/test?pretty' -d '
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "jt_cn": {
            "type": "custom",
            "use_smart": "true",
            "tokenizer": "ik_smart",
            "filter": ["jt_tfr","jt_sfr"],
            "char_filter": ["jt_cfr"]
          },
          "ik_smart": {
            "type": "ik",
            "use_smart": "true"
          },
          "ik_max_word": {
            "type": "ik",
            "use_smart": "false"
          }
        },
        "filter": {
          "jt_tfr": {
            "type": "stop",
            "stopwords": [" "]
          },
          "jt_sfr": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        },
        "char_filter": {
          "jt_cfr": {
            "type": "mapping",
            "mappings": [
              "| => \\|"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "solution": {
      "properties": {
        "title": {
          "include_in_all": true,
          "analyzer": "jt_cn",
          "term_vector": "with_positions_offsets",
          "boost": 8,
          "store": true,
          "type": "string"
        }
      }
    }
  }
}
'
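
Settings and mappings together are normally sent when the index is created, that is, to `PUT /test` rather than to the `_mapping` endpoint. A minimal Python sketch that prepares such a request without sending it; the host, the helper name `create_index_request`, and the trimmed body are assumptions for illustration:

```python
import json
import urllib.request


def create_index_request(host, index, body):
    """Prepare (but do not send) a PUT request that creates an index."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/{index}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Trimmed body: only the synonym filter from the settings in this step.
body = {"settings": {"index": {"analysis": {"filter": {
    "jt_sfr": {"type": "synonym", "synonyms_path": "synonyms.txt"}}}}}}

req = create_index_request("localhost:9200", "test", body)
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```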

### step 4

curl -XPUT localhost:9200/test/solution/1 -d '
{
    "title": "番茄"
}
'

curl -XPUT localhost:9200/test/solution/2 -d '
{
    "title": "西红柿"
}
'

### step 5

curl -XPOST 'localhost:9200/test/solution/_search?pretty' -d '
{
  "query": {
    "query_string": {
      "title": {
        "query": "西红柿",
        "analyzer": "jt_cn"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>",
      "<tag2>"
    ],
    "post_tags": [
      "</tag1>",
      "</tag2>"
    ],
    "fields": {
      "title": {}
    }
  }
}
'

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.4500804,
    "hits": [
      {
        "_index": "test",
        "_type": "solution",
        "_id": "1",
        "_score": 0.4500804,
        "_source": {
          "title": "西红柿"
        }
      },
      {
        "_index": "test",
        "_type": "solution",
        "_id": "2",
        "_score": 0.36006433,
        "_source": {
          "title": "番茄"
        }
      }
    ]
  }
}

wuyh

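I followed the instructions I found online:
I cloned the ik project from git, built it with maven, found the zip package, and unzipped it into the /plugins/ik/ folder.
Then I added index.analysis.analyzer.default.type: ik as the last line of elasticsearch.yml.
After that I restarted elasticsearch, and now it will not start at all. The console does show an exception, but it flashes by too quickly to read.
How do I fix this?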
