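The head plugin returns incorrect search data
Elasticsearch • novia replied to the question • 2 followers • 1 reply • 2469 views • 2017-11-14 13:01
Elasticsearch search optimization
Elasticsearch • cyin replied to the question • 6 followers • 3 replies • 2990 views • 2017-11-16 09:52
Written on the 100th issue of the Community Daily: believe in the power of the community
Community Daily • rockybean published an article • 9 comments • 2327 views • 2017-11-14 11:30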

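I have long been in the habit of reading [湾区日报](http://wanqu.co/) (Wanqu Daily), and later I saw that astaxie of the Golang China community was doing something similar with [GoCN 每日新闻](https://gocn.io/explore/category-14) (GoCN Daily News). For a community, a daily digest is a form of continuous output; for an individual, it is a form of continuous input. So I decided to do the same for the Elastic community, and on July 30, 2017 I published [Elastic日报 第1期](https://elasticsearch.cn/article/201) (Elastic Daily, issue 1).
When I talked with medcl that same day, he suggested mobilizing the community to do it, since only that would keep the daily both good and long-lived. So we began recruiting daily editors in the community, many people quickly responded, and the Elastic Daily editorial team was formed. As of today we have 8 community editors: 7 are each responsible for the daily on a fixed day of the week, and the other handles review and publishing to the WeChat official account. They are: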
- 江水
- 金桥
- bsll
- 至尊宝
- 叮咚光军
- laoyang360
- cyberdak
- 陶文
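Thanks to the community editors for their work. Together we have done something remarkable: 100 days of continuous knowledge output. Anyone who has read and fully digested all 100 days of the daily has surely raised their Elastic skills by more than one level.
Looking back, if I had done this alone, the daily would probably not have lasted past 30 issues. One person's strength is limited, but the community's is unlimited. Every day, when I see the articles the editors have carefully selected, I am surprised at just how many excellent Elastic articles there are!
Believe in the power of the community, and let us look forward to issue 200, issue 300, and even issue 1000 of Elastic Daily!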
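I want to study the ES source code but don't know where to start. Any suggestions?
Elasticsearch • novia replied to the question • 2 followers • 1 reply • 1682 views • 2017-11-14 11:36
What exactly do the span_containing and span_with queries mean, and how do they differ?
Elasticsearch • kennywu76 replied to the question • 5 followers • 2 replies • 4381 views • 2023-09-08 16:14
Community Daily, issue 100 (2017-11-14)
Community Daily • kimichen123 published an article • 1 comment • 2210 views • 2017-11-14 06:30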
http://t.cn/RjyNLge
2. A detailed analysis of the Elasticsearch master election process.
http://t.cn/RjyNPLT
3. A step-by-step guide to speeding up WordPress search with ES.
http://t.cn/RjbD1QK
4. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: 叮咚光军
Archive: https://elasticsearch.cn/article/374
Subscribe: https://tinyletter.com/elastic-daily
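A beginner's question about Elasticsearch settings
Elasticsearch • bjfk2006 replied to the question • 3 followers • 3 replies • 2941 views • 2017-11-14 16:34
data.path configured with multiple paths, but I/O is unevenly distributed
Elasticsearch • bjfk2006 replied to the question • 2 followers • 1 reply • 2672 views • 2017-11-14 16:24
Has anyone implemented cross-cluster queries with ElasticSearch? Please give a rough outline
Elasticsearch • AlixMu replied to the question • 4 followers • 1 reply • 3485 views • 2017-11-14 10:31
Is it possible to compare only the hour and not the day?
Kibana • kennywu76 replied to the question • 4 followers • 2 replies • 3453 views • 2017-11-13 16:12
Migration in practice: from ES 2.3 to 5.6
Elasticsearch • JiaShiwen published an article • 0 comments • 4352 views • 2017-11-13 13:58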
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/e ... .html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: es5_dev
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: es5-node03
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: ["127.0.0.1","10.204.12.33"]
http.port: 9201
transport.tcp.port: 9301
#http.host: 127.0.0.1
#http.enabled: false
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts:
- 10.204.12.31:9301
- 10.204.12.32:9301
- 10.204.12.33:9301
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
indices.requests.cache.size: 5%
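The template comment in the Discovery section gives the majority rule for discovery.zen.minimum_master_nodes as (total number of master-eligible nodes / 2 + 1). A minimal sketch of that arithmetic follows. It assumes that all three hosts in the unicast list are master-eligible, which the file does not state, and the function name is only illustrative.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum from the template comment: nodes / 2 + 1 (integer division)."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# The three entries of discovery.zen.ping.unicast.hosts in the file above.
# Assumption: every one of them runs with node.master: true.
unicast_hosts = ["10.204.12.31:9301", "10.204.12.32:9301", "10.204.12.33:9301"]

print(minimum_master_nodes(len(unicast_hosts)))  # prints 2
```

Under that assumption the rule gives 2, while the file sets the value to 1. That may be acceptable for a development cluster, but it is worth revisiting before production use.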
config/jvm.options
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/e ... .html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM (remove on 32-bit client JVMs)
-server
# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true
# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
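The jvm.options header says the minimum and maximum heap should always be set to the same value. A small check for that rule could look like the sketch below. The helper names are hypothetical, and the comparison is purely textual, so 2g and 2048m would be reported as different.

```python
import re


def heap_settings(jvm_options_text):
    """Collect the active (uncommented) -Xms / -Xmx values from jvm.options text."""
    found = {}
    for raw in jvm_options_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "## -Xms4g"
        match = re.fullmatch(r"-(Xms|Xmx)(\S+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def heap_is_fixed_size(jvm_options_text):
    """True when both flags are present and carry the same textual value."""
    settings = heap_settings(jvm_options_text)
    return "Xms" in settings and settings.get("Xms") == settings.get("Xmx")


sample = "## -Xms4g\n## -Xmx4g\n-Xms2g\n-Xmx2g\n-Xss1m\n"
print(heap_settings(sample))
print(heap_is_fixed_size(sample))
```

For the file above, the active values are -Xms2g and -Xmx2g, so the check passes.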
Install the IK analyzer
bin/elasticsearch-plugin install https://github.com/medcl/elast ... 1.zip
./bin/elasticsearch-plugin install https://github.com/medcl/elast ... 3.zip
Configure the IK remote extension dictionary for hot-word updates in elasticsearch-5.6.3/config/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
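<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionary here -->
<entry key="ext_dict"></entry>
<!-- Users can configure their own extension stopword dictionary here -->
<entry key="ext_stopwords"></entry>
<!-- Users can configure a remote extension dictionary here -->
<entry key="remote_ext_dict">http://distribute.search.leju. ...</entry>
<!-- Users can configure a remote extension stopword dictionary here -->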
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
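As far as I understand the IK plugin's README, the URL configured in remote_ext_dict should return one word per line, and the plugin reloads the dictionary when the Last-Modified or ETag response header changes. The sketch below shows what such a response might contain. The function name, the SHA-256 based ETag, and the sample words are my own assumptions, not part of the original setup.

```python
import hashlib
from email.utils import formatdate


def build_dictionary_response(words, modified_epoch):
    """Build headers and body for a remote dictionary endpoint (illustrative only)."""
    # One word per line, UTF-8 encoded.
    body = "\n".join(words).encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        # HTTP-date of the last dictionary change.
        "Last-Modified": formatdate(modified_epoch, usegmt=True),
        # Any value that changes with the content works; a content hash is one choice.
        "ETag": '"' + hashlib.sha256(body).hexdigest() + '"',
    }
    return headers, body


headers, body = build_dictionary_response(["热词", "elastic"], 0)
print(headers["Last-Modified"])
print(body.decode("utf-8"))
```

Deriving the ETag from the content means that adding or removing a word changes the header automatically, which is the signal the plugin is described as watching for.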
Install the pinyin analyzer
cd elasticsearch-5.5.1/plugins
wget https://github.com/medcl/elast ... 5.5.1
unzip v5.5.1
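When packaging the installation to deploy it on other nodes, clear the data directory first.
For cluster monitoring, you can use the Chrome extension version of head.
Data migration
The migration tool is elasticbak, which I wrote myself; it has now been updated with a 5.6.3 driver. GitHub link: https://github.com/jiashiwen/elasticbak
Data backup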
java -jar elasticbak-2.3.3.jar \
--exp \
--cluster lejuesdev \
--host 10.204.12.31 \
--filesize 1000 \
--backupdir ./esbackupset \
--backupindexes "*" \
--threads 4
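Because field types changed between versions, the index has to be rebuilt by hand. Here is an example; the main change is that a 2.x string field must become text. In 2.x, the index parameter specified both whether the field was indexed ("index": "no") and whether it was passed through an analyzer ("index": "not_analyzed"). In 5.x, index only specifies whether the field is indexed; to index a whole field without passing it through an analyzer, use the keyword type.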
curl -XPUT "http://10.204.12.31:9201/house_geo" -H 'Content-Type: application/json' -d'
{
"mappings": {
"house": {
"dynamic": "strict",
"_all": {
"enabled": false
},
"properties": {
"_category": {
"type": "keyword",
"store": true
},
"_content": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_deleted": {
"type": "boolean",
"store": true
},
"_doccreatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_docupdatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_flags": {
"type": "text",
"store": true,
"analyzer": "whitespace"
},
"_hits": {
"type": "text"
},
"_location": {
"type": "geo_point"
},
"_multi": {
"properties": {
"_location": {
"type": "geo_point"
}
}
},
"_origin": {
"type": "object",
"enabled": false
},
"_scope": {
"type": "keyword",
"store": true
},
"_tags": {
"type": "text",
"boost": 10,
"store": true,
"term_vector": "with_positions_offsets",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_title": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_uniqid": {
"type": "keyword",
"store": true
},
"_uniqsign": {
"type": "keyword",
"store": true
},
"_url": {
"type": "text",
"index": false,
"store": true
},
"location": {
"type": "geo_point"
}
}
}
},
"settings": {
"index": {
"number_of_shards": "3",
"requests": {
"cache": {
"enable": "true"
}
},
"analysis": {
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms_path": "analysis-ik/custom/synonym.dic"
}
},
"analyzer": {
"searchanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_smart"
},
"indexanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "1"
}
}
}'
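The string-to-text/keyword rule described before the mapping can be sketched as a small conversion helper. This is only an illustration under simple assumptions: the function name is hypothetical, it handles a single flat field definition, and it does not recurse into nested properties or multi-fields.

```python
def convert_string_field(field):
    """Return a 5.x-style copy of a single 2.x field definition."""
    if field.get("type") != "string":
        return dict(field)  # non-string fields are copied unchanged

    # Carry over every parameter except the two whose meaning changed.
    converted = {k: v for k, v in field.items() if k not in ("type", "index")}
    index = field.get("index", "analyzed")  # 2.x default for string fields

    if index == "not_analyzed":
        # Indexed as a single term, without analysis.
        converted["type"] = "keyword"
    else:
        converted["type"] = "text"
        if index == "no":
            # In 5.x, index is a boolean that only says whether to index.
            converted["index"] = False
    return converted


print(convert_string_field({"type": "string", "index": "not_analyzed", "store": True}))
print(convert_string_field({"type": "string", "index": "no", "store": True}))
print(convert_string_field({"type": "string", "analyzer": "ik_max_word"}))
```

The second call mirrors the _url field in the mapping above, which is declared as text with "index": false.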
Import the index data with the new version of elasticbak
java -jar elasticbak-5.6.3.jar \
--imp \
--cluster es5_dev \
--host 10.204.12.31 \
--port 9301 \
--restoreindex house_geo \
--restoretype dataonly \
--backupset esbackupset/house_geo \
--threads 4
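Community Daily, issue 99 (2017-11-13)
Community Daily • cyberdak published an article • 1 comment • 2326 views • 2017-11-13 09:20
http://t.cn/Rj2uLh9
2. A VS Code extension for Logstash configuration files; editing config files is no longer a headache.
http://t.cn/Rj21ncE
3. SENTINL, an alerting plugin for ELK. With recent releases it is now comparable to X-Pack's reporting and Watcher features.
http://t.cn/Rj216Ef
4. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: cyberdak
Archive: https://elasticsearch.cn/article/372
Subscribe: https://tinyletter.com/elastic-daily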
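Data storage directory when integrating spring-boot with Elasticsearch
Elasticsearch • xisonchen asked a question • 1 follower • 0 replies • 3212 views • 2017-11-12 18:29
Why do so many companies still build their own search engines when ES exists?
Elasticsearch • JiaShiwen replied to the question • 6 followers • 4 replies • 9046 views • 2017-11-15 12:29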