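I'd like to study the ES source code but don't know where to start. Any suggestions?

novia replied • 2 followers • 1 reply • 1391 views • 2017-11-14 11:36 • from related topics

What exactly do the span_containing and span_within queries mean, and how do they differ?

Charele replied • 5 followers • 3 replies • 2977 views • 2023-09-08 16:14 • from related topics

A beginner's question about an elasticsearch setting

bjfk2006 replied • 3 followers • 3 replies • 2226 views • 2017-11-14 16:34 • from related topics

path.data configured with multiple paths, but I/O is unevenly distributed

bjfk2006 replied • 2 followers • 1 reply • 2181 views • 2017-11-14 16:24 • from related topics

Does anyone have experience implementing cross-cluster queries in ElasticSearch? A brief overview would help.

AlixMu replied • 4 followers • 1 reply • 3033 views • 2017-11-14 10:31 • from related topics

A migration from ES 2.3 to 5.6 in practice

JiaShiwen posted an article • 0 comments • 3764 views • 2017-11-13 13:58 • from related topics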

config/elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/e ... .html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: es5_dev
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: es5-node03
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: ["127.0.0.1","10.204.12.33"]
http.port: 9201
transport.tcp.port: 9301
#http.host: 127.0.0.1
#http.enabled: false
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts:
   - 10.204.12.31:9301
   - 10.204.12.32:9301
   - 10.204.12.33:9301
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
indices.requests.cache.size: 5%
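
The split-brain comment in the Discovery section gives the quorum as (total number of master-eligible nodes / 2 + 1). Below is a minimal Python sketch of that arithmetic, assuming integer division and assuming that all three hosts in the unicast list are master-eligible; the helper name is hypothetical.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum from the template comment: master-eligible nodes / 2 + 1 (integer division)."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Assumption: 10.204.12.31-33 are all master-eligible (node.master: true).
    for n in (1, 2, 3, 5):
        print(n, "master-eligible ->", minimum_master_nodes(n))
```

For three master-eligible nodes the formula gives 2, so the value of 1 used in this dev configuration would not protect a production cluster against split brain.
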
config/jvm.options

## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/e ... .html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM (remove on 32-bit client JVMs)
-server
# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true
# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
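
The header of jvm.options asks for the min and max heap to be set to the same value. A small Python sketch that checks this for a jvm.options text might look like the following; the function names are hypothetical and only the -Xms/-Xmx flags are parsed.

```python
import re

_HEAP_FLAG = re.compile(r"^-(Xms|Xmx)(\d+)([kKmMgG]?)$")
_UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_settings(jvm_options: str) -> dict:
    """Return {'Xms': bytes, 'Xmx': bytes} for the uncommented heap flags."""
    found = {}
    for raw in jvm_options.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        match = _HEAP_FLAG.match(line)
        if match:
            flag, number, unit = match.groups()
            found[flag] = int(number) * _UNIT[unit.lower()]
    return found


def heap_is_fixed(jvm_options: str) -> bool:
    """True when both flags are present and min == max."""
    heap = heap_settings(jvm_options)
    return "Xms" in heap and "Xmx" in heap and heap["Xms"] == heap["Xmx"]


if __name__ == "__main__":
    sample = "## -Xms4g\n-Xms2g\n-Xmx2g\n-XX:+AlwaysPreTouch\n"
    print(heap_settings(sample), heap_is_fixed(sample))
```
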



Install the IK analyzer plugin


bin/elasticsearch-plugin install https://github.com/medcl/elast ... 1.zip




./bin/elasticsearch-plugin install https://github.com/medcl/elast ... 3.zip



Configure IK's remote extension dictionary for hot-word updates: elasticsearch-5.6.3/config/analysis-ik/IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
       <comment>IK Analyzer extension configuration</comment>
       <!-- Configure your own extension dictionary here -->
       <entry key="ext_dict"></entry>
       <!-- Configure your own extension stopword dictionary here -->
       <entry key="ext_stopwords"></entry>
       <!-- Configure a remote extension dictionary here -->
       <entry key="remote_ext_dict">http://distribute.search.leju. ...</entry>
       <!-- Configure a remote extension stopword dictionary here -->
       <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
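
The remote_ext_dict entry points at an HTTP endpoint that the plugin polls for hot words. As far as the IK plugin README describes it, the endpoint should return the words one per line in UTF-8 and expose Last-Modified and ETag headers so that a change can be detected. The sketch below is a hypothetical endpoint built on Python's standard library under those assumptions; the word list, port, and function names are illustrative and not part of the original setup.

```python
import hashlib
import time
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical hot words; a real deployment would load these from a file or database.
HOT_WORDS = ["乐居", "学区房"]
LAST_CHANGED = time.time()  # assumption: the list changes only when the process restarts


def render_dictionary(words):
    """Encode the word list as UTF-8, one word per line."""
    return ("\n".join(words) + "\n").encode("utf-8")


def change_headers(body, mtime):
    """Headers a poller can compare to detect that the dictionary changed."""
    return {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest(),
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


class DictHandler(BaseHTTPRequestHandler):
    def _send(self, with_body):
        body = render_dictionary(HOT_WORDS)
        self.send_response(200)
        for name, value in change_headers(body, LAST_CHANGED).items():
            self.send_header(name, value)
        self.end_headers()
        if with_body:
            self.wfile.write(body)

    def do_HEAD(self):
        self._send(with_body=False)

    def do_GET(self):
        self._send(with_body=True)


def serve(port=8000):
    """Blocks forever; call this explicitly to start the endpoint."""
    HTTPServer(("0.0.0.0", port), DictHandler).serve_forever()
```

Calling `serve()` starts the endpoint; its URL would then be the value of the remote_ext_dict entry.
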

Install the pinyin analyzer plugin

cd elasticsearch-5.5.1/plugins
wget https://github.com/medcl/elast ... 5.5.1
unzip v5.5.1




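When packaging the installation to deploy it on other nodes, clear the data directory first.

For cluster monitoring, you can use the Chrome extension version of head.

Data migration

The migration tool is elasticbak, which I wrote myself; it has now been updated with the 5.6.3 driver. GitHub: https://github.com/jiashiwen/elasticbak

Data backup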

java -jar elasticbak-2.3.3.jar \
--exp \
--cluster lejuesdev \
--host 10.204.12.31 \
--filesize 1000 \
--backupdir ./esbackupset \
--backupindexes "*" \
--threads 4
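
Because field types changed between versions, the indices have to be rebuilt by hand. Here is an example; the main change is that a 2.x string field has to become text. In 2.x the index parameter specified both whether the field was indexed ("index": "no") and whether it went through an analyzer ("index": "not_analyzed"). In 5.x, index only specifies whether the field is indexed; to index the whole field value without running it through an analyzer, use the keyword type instead.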
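
The rule described above can be sketched as a small Python helper that rewrites one 2.x field definition into its 5.x form. This is an illustration under the assumptions stated in the paragraph, not part of elasticbak; the function name is hypothetical and only the type and index keys are handled.

```python
def convert_string_field(field: dict) -> dict:
    """Translate one 2.x field definition to its 5.x equivalent."""
    if field.get("type") != "string":
        return dict(field)  # other types are passed through untouched in this sketch

    # Keep every option except the two whose meaning changed.
    converted = {k: v for k, v in field.items() if k not in ("type", "index")}
    index = field.get("index", "analyzed")
    if index == "not_analyzed":
        converted["type"] = "keyword"   # indexed as a single term, no analyzer
    elif index == "no":
        converted["type"] = "text"
        converted["index"] = False      # not indexed, so not searchable
    else:
        converted["type"] = "text"      # analyzed full-text field
    return converted


if __name__ == "__main__":
    print(convert_string_field({"type": "string", "index": "not_analyzed", "store": True}))
    print(convert_string_field({"type": "string", "index": "no", "store": True}))
```
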

 

curl -XPUT "http://10.204.12.31:9201/house_geo" -H 'Content-Type: application/json' -d'
{
 "mappings": {
   "house": {
     "dynamic": "strict",
     "_all": {
       "enabled": false
     },
     "properties": {
       "_category": {
         "type": "keyword",
         "store": true
       },
       "_content": {
         "type": "text",
         "store": true,
         "analyzer": "ik_max_word",
         "search_analyzer": "ik_smart"
       },
       "_deleted": {
         "type": "boolean",
         "store": true
       },
       "_doccreatetime": {
         "type": "date",
         "store": true,
         "format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
       },
       "_docupdatetime": {
         "type": "date",
         "store": true,
         "format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
       },
       "_flags": {
         "type": "text",
         "store": true,
         "analyzer": "whitespace"
       },
       "_hits": {
         "type": "text"
       },
       "_location": {
         "type": "geo_point"
       },
       "_multi": {
         "properties": {
           "_location": {
             "type": "geo_point"
           }
         }
       },
       "_origin": {
         "type": "object",
         "enabled": false
       },
       "_scope": {
         "type": "keyword",
         "store": true
       },
       "_tags": {
         "type": "text",
         "boost": 10,
         "store": true,
         "term_vector": "with_positions_offsets",
         "analyzer": "ik_max_word",
         "search_analyzer": "ik_smart"
       },
       "_title": {
         "type": "text",
         "store": true,
         "analyzer": "ik_max_word",
         "search_analyzer": "ik_smart"
       },
       "_uniqid": {
         "type": "keyword",
         "store": true
       },
       "_uniqsign": {
         "type": "keyword",
         "store": true
       },
       "_url": {
         "type": "text",
         "index": false,
         "store": true
       },
       "location": {
         "type": "geo_point"
       }
     }
   }
 },
 "settings": {
   "index": {
     "number_of_shards": "3",
     "requests": {
       "cache": {
         "enable": "true"
       }
     },
     "analysis": {
       "filter": {
         "my_synonym": {
           "type": "synonym",
           "synonyms_path": "analysis-ik/custom/synonym.dic"
         }
       },
       "analyzer": {
         "searchanalyzer": {
           "filter": "my_synonym",
           "type": "custom",
           "tokenizer": "ik_smart"
         },
         "indexanalyzer": {
           "filter": "my_synonym",
           "type": "custom",
           "tokenizer": "ik_max_word"
         }
       }
     },
     "number_of_replicas": "1"
   }
 }
}'

Import the index data with the new version of elasticbak


java -jar elasticbak-5.6.3.jar \
--imp \
--cluster es5_dev \
--host 10.204.12.31 \
--port 9301 \
--restoreindex house_geo \
--restoretype dataonly \
--backupset esbackupset/house_geo \
--threads 4
 

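Data storage directory when integrating spring-boot with elasticsearch

xisonchen asked • 1 follower • 0 replies • 2713 views • 2017-11-12 18:29 • from related topics

Why do so many companies still build their own search engines when ES exists?

JiaShiwen replied • 6 followers • 4 replies • 8005 views • 2017-11-15 12:29 • from related topics

How much data should each ES scroll request return?

bjfk2006 replied • 4 followers • 2 replies • 3702 views • 2017-11-14 17:05 • from related topics

Get started with esrally in three steps to benchmark elasticsearch

rockybean posted an article • 2 comments • 4153 views • 2017-11-12 11:31 • from related topics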

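[Original article](https://segmentfault.com/a/1190000011966008)

Almost two months have passed since the previous [esrally tutorial](https://segmentfault.com/a/1190000011174694). In that time people have kept asking about problems they ran into, especially that the test data is hosted abroad on AWS, which makes downloads extremely slow. To help everyone get started with esrally quickly, I built a working docker image, pulled the 13 GB of test data to storage inside China, and shared it through Baidu Netdisk. With the few simple steps below you can run esrally tests smoothly.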

Steps


Without further ado, here are the steps!

  1. Pull the image
    `docker pull rockybean/esrally`
  2. Download the data files. Link: http://pan.baidu.com/s/1eSrjZgA password: aagl
  3. Enter the downloaded folder rally_track and run the following command to start the test
    `docker run -it -v $(pwd):/root/track rockybean/esrally esrally race --track-path=/root/track/logging --offline --pipeline=benchmark-only --target-hosts=192.168.1.105:9200`

    That's it, all done!

    A few notes


    About the data files

    The test data that ships with esrally is the content of the rally_track folder. It mainly includes:

    • Geonames(geonames): for evaluating the performance of structured data.
    • Geopoint(geopoint): for evaluating the performance of geo queries.
    • Percolator(percolator): for evaluating the performance of percolation queries.
    • PMC(pmc): for evaluating the performance of full text search.
    • NYC taxis(nyc_taxis): for evaluating the performance for highly structured data.
    • Nested(nested): for evaluating the performance for nested documents.
    • Logging(logging): for evaluating the performance of (Web) server logs.
    • noaa(noaa): for evaluating the performance of range fields.

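      You can download just the test data you need rather than all of it, as long as the corresponding folder is downloaded completely.


      Explanation of the command


      docker part

      `docker run -it rockybean/esrally esrally` is what runs the esrally command. `-v $(pwd):/root/track` maps the rally_track folder into the docker container; `$(pwd)` expands to the current directory (bash needs the lowercase form, and `$PWD` works too), so you have to cd into rally_track first. Writing out the full path instead is also fine.

      The esrally docker image is fairly simple; see the [GitHub project README][1].

      esrally part

      The image loads data as a custom track, which is why the command line uses the --track-path=/root/track/logging argument. Note that /root/track is the directory we bound into the container above; replace logging with another dataset name to load different test data.

      The container only supports benchmarking an external ES cluster, that is, --pipeline=benchmark-only mode. This is probably the most common benchmarking need anyway.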
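
To switch datasets without retyping the command, the docker invocation can be assembled programmatically. The following Python sketch only reuses the flags that appear in the command from step 3; the function name and the example paths are hypothetical.

```python
import os
import shlex


def esrally_command(track, target_hosts, track_root=None):
    """Build the docker argv for one esrally race against an existing cluster."""
    track_root = track_root or os.getcwd()  # same role as the current directory in the shell command
    return [
        "docker", "run", "-it",
        "-v", f"{track_root}:/root/track",
        "rockybean/esrally",
        "esrally", "race",
        f"--track-path=/root/track/{track}",
        "--offline",
        "--pipeline=benchmark-only",
        f"--target-hosts={target_hosts}",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the geonames dataset.
    argv = esrally_command("geonames", "192.168.1.105:9200")
    print(" ".join(shlex.quote(part) for part in argv))
```

Passing the list to `subprocess.run` would start the benchmark; printing it first makes it easy to check the mounted directory and track name.
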


      Have fun!




      [1]: https://github.com/rockybean/esrally-docker

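Transport Client does not round-robin

redhat asked • 2 followers • 0 replies • 1741 views • 2017-11-12 10:41 • from related topics

Are there any problems when a segment file is very large, say 100 GB for a single one?

ElastIcPG replied • 8 followers • 3 replies • 3962 views • 2017-11-14 17:51 • from related topics

Kibana aggregation queries are too slow. How do I locate the problem?

rockybean replied • 2 followers • 1 reply • 10661 views • 2017-11-11 17:38 • from related topics

Sense no longer works, so switch to Kibana

JiaShiwen posted an article • 0 comments • 4884 views • 2017-11-11 12:34 • from related topics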

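Sense, the DSL development tool for elasticsearch, has been taken down by Google; the Kibana console is a good replacement. However, our ES cluster was recently switched to HTTPS + basic auth to pass a security audit (for the detailed configuration see my blog post: http://blog.csdn.net/jiashiwen ... 14374), so Kibana needs some configuration before it works. In addition, the old system still has a leftover elasticsearch 2.3.3, which needs kibana 4.5.1 + sense.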




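I. elasticsearch 5.5.2 + kibana 5.5.2

1. Download the Kibana package whose version number matches elasticsearch. My current development environment is 5.5.2, so the matching Kibana version is also 5.5.2 (the latest 5.6 release reports an incompatibility error and will not run).

2. Edit config/kibana.yml. The main settings are as follows: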
# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"
elasticsearch.url: "https://192.168.1.1:9281/"


# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
elasticsearch.username: "admin"
elasticsearch.password: "admin"


# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
elasticsearch.ssl.certificate: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.crt
elasticsearch.ssl.key: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.key


# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
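elasticsearch.ssl.verificationMode: none

Each setting is explained by the comments in the file, which are clear enough that I won't translate them here. The two most important ones are elasticsearch.ssl.certificate and elasticsearch.ssl.key, which must match the server side. Because the certificate is self-generated, the check setting elasticsearch.ssl.verificationMode has to be changed to none.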
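
Before starting Kibana it can help to confirm that the HTTPS endpoint accepts the basic-auth credentials. The Python sketch below uses only the standard library and, like verificationMode: none, skips certificate verification; the function names are hypothetical, and the URL and credentials in the usage note are the ones from the configuration above.

```python
import base64
import ssl
import urllib.request


def basic_auth_header(username, password):
    """Value of the Authorization header for HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def unverified_context():
    """TLS context that skips certificate checks (only sensible for self-signed test setups)."""
    context = ssl.create_default_context()
    context.check_hostname = False      # must be disabled before verify_mode is relaxed
    context.verify_mode = ssl.CERT_NONE
    return context


def cluster_info(url, username, password):
    """GET the root endpoint of Elasticsearch and return the raw response bytes."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, context=unverified_context(), timeout=10) as response:
        return response.read()
```

For example, `cluster_info("https://192.168.1.1:9281/", "admin", "admin")` should return the cluster's JSON banner if the credentials are accepted.
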




After starting Kibana, open http://localhost:5601 to access it.

Differences between elasticsearch 5.6 and 5.2

es_shengbin asked • 1 follower • 0 replies • 2965 views • 2017-11-10 18:20 • from related topics