Elasticsearch

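When importing JSON data with Helper.bulk and specifying document ids, data gets lost; I tried many approaches and none worked

正道 replied to the question • 2 followers • 2 replies • 4562 views • 2018-03-19 16:35 • from related topics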

How does a pipeline in an ingest node handle document fields that are missing or null?

xinfanwang replied to the question • 3 followers • 1 reply • 2290 views • 2018-03-19 14:03 • from related topics

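After upgrading from ES 2.x to ES 6.1, bulk inserts became slower; which areas should be checked to locate the cause?

greatnew replied to the question • 1 follower • 2 replies • 4912 views • 2018-03-19 11:58 • from related topics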

Asking for help with ES cluster planning

mmtt replied to the question • 7 followers • 3 replies • 2306 views • 2018-03-19 11:23 • from related topics

ES 6.2 fails to start on Linux after changing the bind IP

steven123 replied to the question • 2 followers • 1 reply • 3181 views • 2018-03-19 09:20 • from related topics

Running elasticsearch produces a memory error and elasticsearch cannot start

code4j replied to the question • 3 followers • 2 replies • 3438 views • 2018-03-19 09:11 • from related topics

Elasticsearch 6.x and above: how to set the number of replicas?

laoyang360 replied to the question • 3 followers • 2 replies • 9078 views • 2018-03-15 23:27 • from related topics

How to better implement automatic creation of ES indices in a Java application

jyingzhi asked the question • 2 followers • 0 replies • 4095 views • 2018-03-15 22:22 • from related topics

How to get the match score between two queries, es_match_score(query1, query2)?

deniel asked the question • 1 follower • 0 replies • 3495 views • 2018-03-15 19:24 • from related topics

How to remove an alias from a specified index with the Java API

ygm replied to the question • 2 followers • 2 replies • 3019 views • 2018-03-15 16:54 • from related topics

_id is not configurable

laoyang360 replied to the question • 3 followers • 2 replies • 2956 views • 2018-03-15 08:19 • from related topics

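Analysis of the match-query matching process in Elasticsearch analyzed (tokenized) search

夏李俊 published an article • 4 comments • 4595 views • 2018-03-14 12:00 • from related topics

1. Simulating storage of the string data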

localhost:9200/yigo-redist.1/_analyze?analyzer=default&text=全能片(前)---TRW-GDB7891AT刹车片自带报警线，无单独报警线号码,卡仕欧,卡仕欧,乘用车,刹车片

The URL above means:

The index is `yigo-redist.1`
It uses the analyzer (`analyzer`) `default` defined in the index `yigo-redist.1`
The string being analyzed (`text`) is "全能片(前)---TRW-GDB7891AT刹车片自带报警线，无单独报警线号码,卡仕欧,卡仕欧,乘用车,刹车片"
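
As a minimal sketch of how such a request URL can be assembled, the Python snippet below percent-encodes the index name and the two query parameters. The function name `build_analyze_url` and the shortened sample text are illustrative assumptions, not part of the original post. Whether a cluster accepts `analyzer` and `text` as query-string parameters depends on the Elasticsearch version; newer releases expect them in a JSON request body, so check the version in use.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

def build_analyze_url(host, index, analyzer, text):
    """Build a query-string style _analyze URL (hypothetical helper)."""
    # urlencode percent-encodes non-ASCII text as UTF-8 bytes.
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"http://{host}/{quote(index, safe='')}/_analyze?{query}"

# A shortened sample text is used here instead of the full string.
url = build_analyze_url("localhost:9200", "yigo-redist.1", "default", "刹车片")
print(url)

# Decoding the query string again recovers the original parameters.
params = parse_qs(urlsplit(url).query)
print(params["analyzer"], params["text"])
```

Building the URL through `urlencode` avoids problems with the commas, parentheses, and Chinese characters in the sample text, which would otherwise have to be escaped by hand.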

Suppose the result is:

{
  "tokens" : [
    { "token" : "全能", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 },
    { "token" : "片", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 2 },
    { "token" : "前", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 3 },
    { "token" : "trw-gdb7891at", "start_offset" : 9, "end_offset" : 22, "type" : "LETTER", "position" : 4 },
    { "token" : "刹车片", "start_offset" : 22, "end_offset" : 25, "type" : "CN_WORD", "position" : 5 },
    { "token" : "自带", "start_offset" : 25, "end_offset" : 27, "type" : "CN_WORD", "position" : 6 },
    { "token" : "报警", "start_offset" : 27, "end_offset" : 29, "type" : "CN_WORD", "position" : 7 },
    { "token" : "线", "start_offset" : 29, "end_offset" : 30, "type" : "CN_CHAR", "position" : 8 },
    { "token" : "无", "start_offset" : 31, "end_offset" : 32, "type" : "CN_WORD", "position" : 9 },
    { "token" : "单独", "start_offset" : 32, "end_offset" : 34, "type" : "CN_WORD", "position" : 10 },
    { "token" : "报警", "start_offset" : 34, "end_offset" : 36, "type" : "CN_WORD", "position" : 11 },
    { "token" : "线", "start_offset" : 36, "end_offset" : 37, "type" : "CN_CHAR", "position" : 12 },
    { "token" : "号码", "start_offset" : 37, "end_offset" : 39, "type" : "CN_WORD", "position" : 13 },
    { "token" : "卡", "start_offset" : 40, "end_offset" : 41, "type" : "CN_CHAR", "position" : 14 },
    { "token" : "仕", "start_offset" : 41, "end_offset" : 42, "type" : "CN_WORD", "position" : 15 },
    { "token" : "欧", "start_offset" : 42, "end_offset" : 43, "type" : "CN_WORD", "position" : 16 },
    { "token" : "卡", "start_offset" : 44, "end_offset" : 45, "type" : "CN_CHAR", "position" : 17 },
    { "token" : "仕", "start_offset" : 45, "end_offset" : 46, "type" : "CN_WORD", "position" : 18 },
    { "token" : "欧", "start_offset" : 46, "end_offset" : 47, "type" : "CN_WORD", "position" : 19 },
    { "token" : "乘用车", "start_offset" : 48, "end_offset" : 51, "type" : "CN_WORD", "position" : 20 },
    { "token" : "刹车片", "start_offset" : 52, "end_offset" : 55, "type" : "CN_WORD", "position" : 21 }
  ]
}

2. Keyword query

localhost:9200/yigo-redist.1/_analyze?analyzer=default_search&text=gdb7891

The index is `yigo-redist.1`
It uses the analyzer (`analyzer`) `default_search` defined in the index `yigo-redist.1`
The string being analyzed (`text`) is "gdb7891"

Result returned:

{
  "tokens" : [
    { "token" : "gdb7891", "start_offset" : 0, "end_offset" : 7, "type" : "LETTER", "position" : 1 }
  ]
}
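
One way to read this result against the step 1 output is sketched below in Python. It is a deliberately simplified model, not how Elasticsearch is implemented: a match query is treated as succeeding when at least one analyzed query term is exactly equal to an indexed term. The function name `match_any` is a hypothetical helper, and scoring, operators, and positions are ignored.

```python
# Distinct terms produced at index time (step 1, analyzer `default`).
indexed_terms = {
    "全能", "片", "前", "trw-gdb7891at", "刹车片", "自带", "报警", "线",
    "无", "单独", "号码", "卡", "仕", "欧", "乘用车",
}

# Terms produced at search time (step 2, analyzer `default_search`).
query_terms = {"gdb7891"}

def match_any(query_terms, indexed_terms):
    """Simplified match: True if any query term equals an indexed term."""
    return bool(set(query_terms) & set(indexed_terms))

print(match_any(query_terms, indexed_terms))  # False
print(match_any({"刹车片"}, indexed_terms))    # True
```

The comparison is on whole terms, so "gdb7891" being a substring of "trw-gdb7891at" does not count as a match in this model.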

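3. Querying the keyword with the index-time (storage) analyzer

localhost:9200/yigo-redist.1/_analyze?analyzer=default&text=gdb7891

The index is `yigo-redist.1`
It uses the analyzer (`analyzer`) `default` defined in the index `yigo-redist.1`
The string being analyzed (`text`) is "gdb7891"

Result returned: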

{
  "tokens" : [
    { "token" : "gdb7891", "start_offset" : 0, "end_offset" : 7, "type" : "LETTER", "position" : 1 },
    { "token" : "", "start_offset" : 0, "end_offset" : 7, "type" : "LETTER", "position" : 1 },
    { "token" : "gdb7891", "start_offset" : 0, "end_offset" : 7, "type" : "LETTER", "position" : 1 },
    { "token" : "", "start_offset" : 0, "end_offset" : 3, "type" : "ENGLISH", "position" : 2 },
    { "token" : "gdb", "start_offset" : 0, "end_offset" : 3, "type" : "ENGLISH", "position" : 2 },
    { "token" : "gdb", "start_offset" : 0, "end_offset" : 3, "type" : "ENGLISH", "position" : 2 },
    { "token" : "7891", "start_offset" : 3, "end_offset" : 7, "type" : "ARABIC", "position" : 3 },
    { "token" : "7891", "start_offset" : 3, "end_offset" : 7, "type" : "ARABIC", "position" : 3 },
    { "token" : "", "start_offset" : 3, "end_offset" : 7, "type" : "ARABIC", "position" : 3 }
  ]
}
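
The response above contains empty and repeated tokens, so it can help to reduce it to the distinct terms that would actually be searched. The Python sketch below does this for the token strings of the step 3 response; `unique_terms` is a hypothetical helper name, and only the `token` field is kept for brevity.

```python
def unique_terms(tokens):
    """Drop empty and repeated terms, keeping first-seen order."""
    seen = []
    for tok in tokens:
        term = tok["token"]
        if term and term not in seen:
            seen.append(term)
    return seen

# Token strings from the step 3 response, in order (other fields omitted).
step3_tokens = [
    {"token": t}
    for t in ["gdb7891", "", "gdb7891", "", "gdb", "gdb", "7891", "7891", ""]
]

print(unique_terms(step3_tokens))  # ['gdb7891', 'gdb', '7891']
```

Three distinct terms remain, which is why a search analyzed this way can hit documents through any one of them.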

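Summary

Step 1 shows that the stored data "全能片(前)---TRW-GDB7891AT刹车片自带报警线，无单独报警线号码,卡仕欧,卡仕欧,乘用车,刹车片" is split into many term fragments (tokens), which are then stored in the index.
Step 2 shows that when the keyword "gdb7891" is entered, the search analyzer (`default_search`) does not split it, so the only fragment available for querying is "gdb7891". The fragments produced in step 1 do not include "gdb7891"; the closest one is "trw-gdb7891at". An ordinary match-query therefore cannot match the data indexed in step 1.
Step 3 shows that with the same analyzer as at index time, "gdb7891" is split into "gdb", "7891", and so on, and either of these two fragments can find the data indexed in step 1. Because the keyword is split, however, the query also returns more matching data, for example documents that match "gdb", "7891", or "gdb7891".
To retrieve the step 1 data while searching with the analyzer `default_search`, use a wildcard-query with "*gdb7891*", which matches: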
```
{ "query": { "wildcard": { "description": "*gdb7891*" } } }
```
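
To illustrate why the wildcard pattern succeeds where the exact term does not, the Python sketch below runs the pattern over the step 1 terms. It uses the standard library's `fnmatchcase` as a stand-in for Elasticsearch wildcard matching, which is an assumption for illustration only: both treat `*` and `?` as wildcards, but `fnmatch` also interprets `[...]`, so the two are not equivalent in general. `wildcard_hits` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Distinct terms produced at index time (step 1, analyzer `default`).
indexed_terms = [
    "全能", "片", "前", "trw-gdb7891at", "刹车片", "自带", "报警", "线",
    "无", "单独", "号码", "卡", "仕", "欧", "乘用车",
]

def wildcard_hits(pattern, terms):
    """Return the terms that match a `*`/`?` style pattern as a whole."""
    return [t for t in terms if fnmatchcase(t, pattern)]

print(wildcard_hits("*gdb7891*", indexed_terms))  # ['trw-gdb7891at']
print(wildcard_hits("gdb7891", indexed_terms))    # []
```

The pattern is compared with each indexed term rather than with the original text, which is why the lowercase "trw-gdb7891at" is what gets matched.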