ES performance problem in a real project: is this scenario a good fit for ES, and how can we improve it?

Elasticsearch | Author: cheese | Posted 2017-03-18 | Views: 9029

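We use ES to store user profiles. The basic functionality is querying and aggregating by tag. There are 30 tags in total right now, and we expect well over a hundred eventually.
The data set is 280 million records, about 1.2 TB. The final index holds 15 billion documents, and all tags are of the nested type.
The cluster currently runs on machines with 128 GB of RAM and 40 cores.
A query with aggregations takes roughly 10+ seconds.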

Experts, does this look normal? Admittedly, we did not know ES very well during the technical evaluation phase.

4 replies

kennywu76 - Wood

Upvoted by: cheese, youryida, machao

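Judging from the mapping in the example, each nested type tab_x is not really used to hold an object; it has only a single property, code. In that case there is no need for the nested type at all, and a plain object type is enough. For example: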

{
  "index1": {
    "mappings": {
      "tag_type": {
        "dynamic": "false",
        "_all": {
          "enabled": false
        },
        "properties": {
          "tab_a": {
            "properties": {
              "code": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "tab_b": {
            "properties": {
              "code": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          ......
        }
      }
    }
  }
}

This is equivalent to a series of flat fields:
tab_a.code
tab_b.code
....
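
To make the flattening concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Elasticsearch is implemented; the `flatten` helper and the sample `user` document are hypothetical, and only the field names tab_a and tab_b come from the mapping above.

```python
from collections import defaultdict


def flatten(doc):
    """Flatten nested dicts and lists into {dotted_path: [values]}.

    Roughly mirrors how an object field ends up as flat, multi-valued
    fields such as tab_a.code (illustration only).
    """
    flat = defaultdict(list)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Arrays of objects contribute several values to the same path.
            for item in value:
                walk(item, path)
        else:
            flat[path].append(value)

    walk(doc, "")
    return dict(flat)


# Hypothetical user profile: two codes under tab_a, one under tab_b.
user = {"tab_a": [{"code": "1"}, {"code": "7"}], "tab_b": {"code": "01"}}
print(flatten(user))
```

Because each tag object carries only one property, nothing is lost by flattening: there is no second field inside the object whose pairing with code would need to be preserved.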

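Nested documents are mainly meant for cases where the outer and inner documents need to be joined and the independence of each inner object must be preserved. In the index, every nested object is actually stored as a separate document, so the number of documents actually indexed is far larger than the number of outer documents. With the plain object type, the inner object is simply flattened into fields of the parent document, so far fewer documents are generated, and both queries and aggregations become much faster.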
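
The document-count effect can be sketched with a small Python estimate. This is a simplified model under stated assumptions: it handles only top-level nested paths, and the `lucene_doc_count` helper and the sample document are hypothetical. The two totals at the end are the figures quoted in the question (280 million records, 15 billion indexed documents).

```python
def lucene_doc_count(doc, nested_paths):
    """Estimate indexed documents for one source document.

    One document for the root, plus one hidden document per object
    stored under each nested path (top-level paths only).
    """
    count = 1
    for path in nested_paths:
        value = doc.get(path)
        if value is None:
            continue
        objects = value if isinstance(value, list) else [value]
        count += len(objects)
    return count


# Hypothetical profile with three tag objects in total.
user = {"tab_a": [{"code": "1"}, {"code": "7"}], "tab_b": {"code": "01"}}

as_nested = lucene_doc_count(user, ["tab_a", "tab_b"])  # tags mapped as nested
as_object = lucene_doc_count(user, [])                  # tags mapped as object
print(as_nested, as_object)

# Average indexed documents per record, from the totals in the question.
print(round(15_000_000_000 / 280_000_000, 1))
```

With the object mapping the estimate stays at one document per record, regardless of how many tag values a profile has.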

dixingxing

Upvoted by: AlixMu

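Let me add some detail. The user profile currently has many tags, and each tag is of the nested type. For example, the car-series tag may contain several car series the user is interested in: BMW 3 Series, BMW 5 Series, ....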

Our scenario is to analyze, for "female" users interested in "BMW 3 Series", the distribution of these tags: "owns a car", "province", and "brands of interest".

The corresponding mapping:
{
  "index1": {
    "mappings": {
      "tag_type": {
        "dynamic": "false",
        "_all": {
          "enabled": false
        },
        "properties": {
          "tab_a": {
            "type": "nested",
            "properties": {
              "code": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "tab_b": {
            "type": "nested",
            "properties": {
              "code": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          ......
        }
      }
    }
  }
}

Query:
GET /index1/_search/
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            {
              "nested": {
                "path": "tag_a",
                "filter": {
                  "term": {
                    "tag_a.code": "1"
                  }
                }
              }
            },
            {
              "nested": {
                "path": "tag_b",
                "filter": {
                  "term": {
                    "tag_b.code": "01"
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "name1": {
      "nested": {
        "path": "tag_c"
      },
      "aggs": {
        "name1_1": {
          "terms": {
            "field": "tag_c.code",
            "size": "2"
          }
        }
      }
    },
    "name2": {
      "nested": {
        "path": "tag_d"
      },
      "aggs": {
        "name2_1": {
          "terms": {
            "field": "tag_d.code",
            "size": "2"
          }
        }
      }
    },
    "name3": {
      "nested": {
        "path": "tag_e"
      },
      "aggs": {
        "name3_1": {
          "terms": {
            "field": "tag_e.code",
            "size": "2"
          }
        }
      }
    }
  }
}
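
For comparison, here is a hedged Python sketch of what the same request might look like if the tag fields were mapped as plain object fields instead of nested, as suggested in the reply above. It assumes the bool query with a filter clause available in Elasticsearch 2.x; the `build_flat_query` helper and the aggregation names it generates are hypothetical, and the body has not been run against this cluster.

```python
import json


def build_flat_query(filters, agg_fields, agg_size=2):
    """Build a search body with term filters and terms aggregations
    directly on flat fields, with no nested wrappers.

    filters: {field: value} pairs, all of which must match.
    agg_fields: fields to bucket with a terms aggregation.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        "aggs": {
            "by_" + field.replace(".", "_"): {
                "terms": {"field": field, "size": agg_size}
            }
            for field in agg_fields
        },
    }


body = build_flat_query(
    {"tag_a.code": "1", "tag_b.code": "01"},
    ["tag_c.code", "tag_d.code", "tag_e.code"],
)
print(json.dumps(body, indent=2))
```

One difference to keep in mind: terms buckets on flat fields count parent documents (users), whereas the nested aggregations above count nested documents, so the numbers are not expected to match one for one.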

Response:

{
  "took": 5315,
  "timed_out": false,
  "_shards": {
    "total": 8,
    "successful": 8,
    "failed": 0
  },
  "hits": {
    "total": 66023052,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "name3": {
      "doc_count": 35795694,
      "name3_1": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "1",
            "doc_count": 22021428
          },
          {
            "key": "0",
            "doc_count": 13774266
          }
        ]
      }
    },
    "name2": {
      "doc_count": 82357236,
      "name2_1": {
        "doc_count_error_upper_bound": 130740,
        "sum_other_doc_count": 39859740,
        "buckets": [
          {
            "key": "3",
            "doc_count": 24093930
          },
          {
            "key": "17",
            "doc_count": 15408540
          }
        ]
      }
    },
    "name1": {
      "doc_count": 12053064,
      "name1_1": {
        "doc_count_error_upper_bound": 14758,
        "sum_other_doc_count": 12006666,
        "buckets": [
          {
            "key": "4403002868",
            "doc_count": 24936
          },
          {
            "key": "4403002857",
            "doc_count": 21462
          }
        ]
      }
    }
  }
}

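We found that the more hits a query has, the longer it takes, and the more tags we aggregate on, the longer it takes.
For example, if we drop the tab_b filter from the query above, hits rises to 246596466 and the response time goes up to 10 seconds.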

Our test with 10 threads did not go well: 90% of the responses came back within 16 seconds.
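
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of a 10-thread test that reports the 90th-percentile latency. It is not the harness used in this thread: `fake_search` is a stand-in that only sleeps, and `percentile`, `timed`, and `run_load_test` are hypothetical helpers. Replace `fake_search` with a real call to the cluster.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def timed(call):
    """Run call() once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def run_load_test(call, threads=10, requests_per_thread=5):
    """Issue threads * requests_per_thread calls from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(timed, call) for _ in range(threads * requests_per_thread)]
        return [future.result() for future in futures]


def fake_search():
    # Placeholder for the real search request.
    time.sleep(0.01)


latencies = run_load_test(fake_search)
print(f"p90 = {percentile(latencies, 90):.3f}s over {len(latencies)} requests")
```

Reporting a percentile rather than the mean keeps a few very slow requests from being averaged away, which matters when individual queries range from 5 to 16 seconds.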

Possible directions to test next:
1. Reduce the size of the raw data
2. Disable _source

We hope the experts here can suggest some optimization ideas for our scenario.

Finally, some configuration details.

Server configuration:
8 servers, each with 128 GB of RAM and 40 cores (no SSDs)

Index information:
748 GB (1.46 TB including replicas)

"settings": {
  "index": {
    "creation_date": "1489748059778",
    "refresh_interval": "-1",
    "number_of_shards": "8",
    "number_of_replicas": "1",
    "uuid": "bRhFkrHJQdCZrV5fF9ilmQ",
    "version": {
      "created": "2040499"
    }
  }
}

After segment merging: 80 segments (including replicas, 10 segments per server)

Cluster configuration:

cluster.name: es

node.name: es1

bootstrap.memory_lock: true

network.host: xxx.xx.xx.x

http.port: 9200

transport.tcp.port: 9300
network.bind_host: xxx.xx.xx.x

discovery.zen.ping.unicast.hosts: ["xxx.xx.xx.x","xxx.xx.xx.x","xxx.xx.xx.x","xxx.xx.xx.x","xxx.xx.xx.x","xxx.xx.xx.x","xxx.xx.xx.x"]

discovery.zen.minimum_master_nodes: 4

gateway.recover_after_data_nodes: 6
gateway.expected_nodes: 8

node.max_local_storage_nodes: 1

action.destructive_requires_name: true

action.auto_create_index: false

indices.store.throttle.max_bytes_per_sec: "100mb"

index.merge.scheduler.max_thread_count: 1

indices.breaker.total.limit: 70%
indices.breaker.fielddata.limit: 20%
indices.breaker.request.limit: 40%

indices.fielddata.cache.size: 20%
indices.queries.cache.size: 40%
indices.memory.index_buffer_size: 1024m
indices.memory.min_shard_index_buffer_size: 512m
indices.requests.cache.size: 2%

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 0ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 0ms

index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms

script.engine.groovy.inline.aggs: true
script.engine.groovy.inline.search: true

monitor.jvm.gc.young.warn: 1000ms
monitor.jvm.gc.young.info: 700ms
monitor.jvm.gc.young.debug: 400ms

monitor.jvm.gc.old.warn: 10s
monitor.jvm.gc.old.info: 5s
monitor.jvm.gc.old.debug: 2s

medcl - Fighting the tiger tonight.

Please fill in more of the concrete parameters of the actual scenario.

cheese

Nice, that is really detailed~
