Use shuf to shuffle the lines of a file or to pick one random line from it.

[ES performance issue] match query is slow, slow, slow. What could be the cause?

Elasticsearch • kennywu76 replied to the question • 3 followers • 3 replies • 8010 views • 2017-09-18 10:54 • from related topics

Received message from unsupported version: [2.0.0] minimal compatible version is: [5.0.0]

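Elasticsearch • Cheetah replied to the question • 2 followers • 1 reply • 6650 views • 2017-09-18 09:55 • from related topics

Time zone problem with files collected by filebeat

Beats • novia replied to the question • 2 followers • 1 reply • 3374 views • 2017-09-18 09:26 • from related topics

Elastic Daily, Issue 51 (2017-09-18)

Elastic Daily • cyberdak published an article • 0 comments • 922 views • 2017-09-18 09:14 • from related topics

1. Index tuning settings for data of different sizes: index partitioning (VPN required).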

http://t.cn/RpBZ8d7

2. See how Slack, the most popular collaboration tool abroad, uses ELK for security analysis.

http://t.cn/RpB26BH

3. Using scripted fields in Kibana (VPN required).

http://t.cn/RpBLhDb 

Editor: cyberdak
Archive: https://www.elasticsearch.cn/article/281
Subscribe: https://tinyletter.com/elastic-daily
 

Elastic Daily, Issue 50 (2017-09-17)

Elastic Daily • 至尊宝 published an article • 0 comments • 936 views • 2017-09-17 05:56 • from related topics

1. Creating a threshold alerter with Elasticsearch.
http://t.cn/RpmnT8C
2. Analyzing GitHub projects with Elasticsearch and grafana.
http://t.cn/R9xXkZE
3. A roundup of the talk videos from the second GrafanaCon.
http://t.cn/RpmmMk3

Editor: 至尊宝
Archive: https://www.elasticsearch.cn/article/280
Subscribe: https://tinyletter.com/elastic-daily

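Elastic Daily, Issue 49 (2017-09-16)

Elastic Daily • bsll published an article • 0 comments • 809 views • 2017-09-16 09:04 • from related topics

1. How much do you know about stop words in an index?
http://t.cn/RpYDk2c
2. A step-by-step guide to setting up ELK on Azure.
http://t.cn/RpYsAG8
3. Did you know that ES can run phrase queries that involve multi-word synonyms?
http://t.cn/RpTvc5Z

Editor: bsll
Archive: https://www.elasticsearch.cn/article/279
Subscribe: https://tinyletter.com/elastic-daily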
 

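Were two ES clusters on our internal network hacked?

Elasticsearch • famoss replied to the question • 1 follower • 1 reply • 1366 views • 2017-09-15 16:50 • from related topics

Running the latest ES 5.6.0 today fails with java.lang.IllegalStateException: Unable to initialize modules

Elasticsearch • 独行人945 replied to the question • 1 follower • 1 reply • 2361 views • 2017-09-15 15:21 • from related topics

Does logstash have an API?

Logstash • jnuc093 replied to the question • 1 follower • 1 reply • 1716 views • 2017-09-15 14:22 • from related topics

keyword fails to show results

Elasticsearch • xiaoke replied to the question • 1 follower • 1 reply • 1299 views • 2017-09-15 12:00 • from related topics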

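Can ES limit the length of a returned field and return only the first 300 bytes?

Elasticsearch • laoyang360 published an article • 3 comments • 2943 views • 2017-09-15 11:56 • from related topics

Example: the content field holds the full body of an article, which is very long. I only need the first 300 bytes.
I can use _source to control whether content is returned.
Is there a way to limit content so that only its first 300 bytes come back?

I know I can trim it in application code after it is returned.
What I want to know is whether there is a parameter that makes ES return a string of a given length directly.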
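
One server-side option that may fit is a scripted field: rather than returning content from _source, the search asks a Painless script to return only the leading part of it. The Python sketch below only builds such a request body and does not talk to a cluster. The helper name truncated_field_query, the output field name content_head and the script_key parameter are made up for illustration. It assumes that every matching document has the field, and that the script text goes under the key "source" (older 5.x releases expect "inline" instead, so check your version). Note also that substring counts characters, not bytes.

```python
import json


def truncated_field_query(field, max_chars, script_key="source"):
    """Build a search body whose scripted field holds the first max_chars
    characters of `field`, read from _source.

    script_key is an assumption: "source" on newer releases, "inline" on
    older 5.x releases.
    """
    script = (
        "String s = params._source[params.field]; "
        "int n = params.n; "
        "return s.length() > n ? s.substring(0, n) : s;"
    )
    return {
        "_source": False,  # do not ship the full document back
        "query": {"match_all": {}},
        "script_fields": {
            field + "_head": {
                "script": {
                    "lang": "painless",
                    script_key: script,
                    "params": {"field": field, "n": max_chars},
                }
            }
        },
    }


body = truncated_field_query("content", 300)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a normal search request. Because the script runs once per returned hit and has to load _source, it is worth measuring before using it on large result pages.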

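Why you should avoid writing sparse data to ES

Elasticsearch • kennywu76 published an article • 1 comment • 2475 views • 2017-09-15 11:24 • from related topics

I am reposting a few articles so that everyone understands why, in current versions (<=5.x), you should avoid writing sparse data to ES. As the ES/Lucene encodings improve, this problem may ease in future versions; in particular, ES 6.0/Lucene 7.0 optimized how doc_values encodes sparse data.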
 
https://www.elastic.co/guide/e ... rsity  


Avoid sparsity

The data structures behind Lucene, which Elasticsearch relies on in order to index and store data, work best with dense data, i.e. when all documents have the same fields. This is especially true for fields that have norms enabled (which is the case for text fields by default) or doc values enabled (which is the case for numerics, date, ip and keyword by default).

The reason is that Lucene internally identifies documents with so-called doc ids, which are integers between 0 and the total number of documents in the index. These doc ids are used for communication between the internal APIs of Lucene: for instance, searching on a term with a match query produces an iterator of doc ids, and these doc ids are then used to retrieve the value of the norm in order to compute a score for these documents. The way this norm lookup is implemented currently is by reserving one byte for each document. The norm value for a given doc id can then be retrieved by reading the byte at index doc_id. While this is very efficient and helps Lucene quickly have access to the norm values of every document, this has the drawback that documents that do not have a value will also require one byte of storage.

In practice, this means that if an index has M documents, norms will require M bytes of storage per field, even for fields that only appear in a small fraction of the documents of the index. Although slightly more complex with doc values due to the fact that doc values have multiple ways that they can be encoded depending on the type of field and on the actual data that the field stores, the problem is very similar. In case you wonder: fielddata, which was used in Elasticsearch pre-2.0 before being replaced with doc values, also suffered from this issue, except that the impact was only on the memory footprint since fielddata was not explicitly materialized on disk.
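
To make the arithmetic concrete, here is a small Python sketch of the one-byte-per-document rule for norms described above. The function names and the example figures (10 million documents, a field present in 1% of them) are illustrative assumptions, not measurements from a real index.

```python
def norms_bytes(num_docs, num_fields=1):
    """Norms storage under the one-byte-per-document-per-field rule."""
    return num_docs * num_fields


def wasted_norms_bytes(num_docs, docs_with_field):
    """Bytes reserved for documents that have no value for the field."""
    return num_docs - docs_with_field


num_docs = 10_000_000       # M documents in the index (assumed figure)
docs_with_field = 100_000   # the field appears in only 1% of them (assumed)

total = norms_bytes(num_docs)
wasted = wasted_norms_bytes(num_docs, docs_with_field)
print(f"norms for one field: {total} bytes")
print(f"reserved for documents without the field: {wasted} bytes "
      f"({wasted / total:.0%})")
```

The cost is driven by the number of documents in the index, not by how many of them carry the field, which is why a field used by only a few documents is disproportionately expensive.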

Note that even though the most notable impact of sparsity is on storage requirements, it also has an impact on indexing speed and search speed since these bytes for documents that do not have a field still need to be written at index time and skipped over at search time.

It is totally fine to have a minority of sparse fields in an index. But beware that if sparsity becomes the rule rather than the exception, then the index will not be as efficient as it could be.

This section mostly focused on norms and doc values because those are the two features that are most affected by sparsity. Sparsity also affects the efficiency of the inverted index (used to index text/keyword fields) and dimensional points (used to index geo_point and numerics) but to a lesser extent.

Here are some recommendations that can help avoid sparsity:


https://www.elastic.co/blog/index-vs-type


Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity. Sparse postings lists can’t be compressed efficiently because of high deltas between consecutive matches. And the issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently. This means that if Lucene establishes that it needs one byte to store all values of a given numeric field, it will also consume one byte for documents that don’t have a value for this field. Future versions of Elasticsearch will have improvements in this area but I would still advise you to model your data in a way that will limit sparsity as much as possible.
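
The remark about high deltas can be illustrated with a toy Python sketch. It stores a postings list as the gaps between consecutive doc ids and reports how many bits the largest gap needs. This is a simplified stand-in rather than Lucene's actual postings encoding, and the function names and the two sample lists are made up for the example.

```python
def gaps(doc_ids):
    """Differences between consecutive doc ids in a sorted postings list."""
    return [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def bits_per_gap(doc_ids):
    """Bits needed to store every gap at a fixed width (toy model)."""
    return max(gaps(doc_ids)).bit_length()


# Both lists hold 1000 matching documents.
dense = list(range(0, 1000))             # every document matches
sparse = list(range(0, 1_000_000, 1000))  # one match per 1000 documents

print("dense :", bits_per_gap(dense), "bit(s) per gap")
print("sparse:", bits_per_gap(sparse), "bit(s) per gap")
```

With the same number of matches, the sparse list needs 10 bits per gap in this model against 1 bit for the dense one, which is the effect the paragraph describes.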


https://www.elastic.co/blog/sp ... ucene
https://issues.apache.org/jira/browse/LUCENE-6863
https://www.elastic.co/blog/el ... eased

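elastic is losing data

Logstash • KerwinC asked a question • 1 follower • 0 replies • 2675 views • 2017-09-15 11:01 • from related topics

Highlighting in elasticsearch maxes out the CPU when searching large documents

Elasticsearch • kennywu76 replied to the question • 5 followers • 2 replies • 2940 views • 2017-09-15 10:29 • from related topics

Elastic Daily, Issue 48 (2017-09-15)

Elastic Daily • laoyang360 published an article • 0 comments • 1045 views • 2017-09-15 06:37 • from related topics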

1. Choosing a timeline-based visualization tool: kibana or grafana?
http://t.cn/RpOCVsz
2. Three ways to import twitter data into Elasticsearch.
http://t.cn/RpOC6An
3. Importing CSV data into Elasticsearch and visualizing it.
http://t.cn/RCGeeJK

Editor: laoyang360
Archive: https://www.elasticsearch.cn/article/276
Subscribe: https://tinyletter.com/elastic-daily