Bulk cannot import data

On a single node, the index already holds 70 million documents. When I try to import more data, the bulk import fails and the elasticsearch log keeps printing gc overhead messages.

laoyang360 - [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: blog.csdn.net/laoyang360

Upvoted by: lzc

The data volume is too large and exceeds the queue size. Try sending fewer documents per bulk request.
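The suggestion above, splitting one huge bulk into smaller requests, can be sketched in plain Python; the chunking logic is independent of any client library, and each yielded batch would then be sent as its own bulk request (the index name and document shape here are made up for illustration):

```python
def chunked(actions, chunk_size):
    """Yield successive batches of at most chunk_size actions,
    so each bulk request stays below the node's queue limits."""
    batch = []
    for action in actions:
        batch.append(action)
        if len(batch) >= chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch

# Example: 70,000 docs split into bulk requests of 2,000 each.
docs = [{"_index": "myindex", "_source": {"id": i}} for i in range(70000)]
batches = list(chunked(docs, 2000))
print(len(batches))     # 35 requests instead of one huge bulk
print(len(batches[0]))  # 2000
```

Each batch can then be passed to whatever bulk API your client exposes, with a pause or retry between batches if the cluster starts rejecting requests.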

JackGe - Didi Chuxing, ES platform team member

Upvoted by: lzc

How much heap is configured on the ES nodes? Run jstat -gcutil <pid> 1000 to watch the node's GC activity. Check the bulk write settings: the bulk queue size "threadpool.bulk.queue_size": "5000" and the bulk thread pool size "threadpool.bulk.size": "24". Also review the BulkProcessor.Builder parameters: the bulk size in MB (for log scenarios that favor throughput, 10MB is reasonable), the number of bulk actions, and concurrentRequests. Bulk writes create a lot of short-lived objects in ES, so consider enlarging the young generation. The terms of a 70-million-document index don't take much heap; probably only a few MB.
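For reference, the two thread pool settings named above would go into elasticsearch.yml like this (these are the 2.x-era names used in the answer; later versions renamed the pool to thread_pool.bulk and then thread_pool.write, so check the docs for your version):

```yaml
# elasticsearch.yml -- bulk thread pool tuning (2.x-style names)
threadpool.bulk.size: 24          # number of bulk worker threads
threadpool.bulk.queue_size: 5000  # pending bulk requests held before rejection
```

A larger queue only buys buffering, not throughput; if the queue fills faster than the workers drain it, the node will still reject requests, which is why the batching and flow-control advice elsewhere in this thread matters too.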

lzc

Upvoted by:

I tried indexing a single document and got this error:
ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];]
After changing that setting, indexing worked again.
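For readers hitting the same FORBIDDEN/12 block: the setting lzc refers to can be cleared with an index settings update like the one below (the index name is a placeholder). Note that Elasticsearch usually applies this block when disk usage crosses the flood-stage watermark, so free up disk space first or the block may be re-applied:

```shell
# Clear the read-only/allow-delete block on the affected index
curl -XPUT 'http://localhost:9200/myindex/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```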

bill

Upvoted by:

The gc overhead message means the JVM heap allocated to the ES process is too small. You can increase the heap size as follows.

There are two ways to change the heap size in Elasticsearch. The easiest is to set an environment variable called ES_HEAP_SIZE. When the server process starts, it will read this environment variable and set the heap accordingly. As an example, you can set it via the command line as follows:
export ES_HEAP_SIZE=10g

Alternatively, you can pass in the heap size via a command-line argument when starting the process, if that is easier for your setup: 
./bin/elasticsearch -Xmx10g -Xms10g
Ensure that the min (Xms) and max (Xmx) sizes are the same to prevent the heap from resizing at runtime, a very costly process.

Generally, setting the ES_HEAP_SIZE environment variable is preferred over setting explicit -Xmx and -Xms values.



xinfanwang

Upvoted by:

If a single bulk request carries too many records, this problem can occur. I suggest adding flow control to limit how many documents go into each index request. As for the exact number, it's best to find it through your own load testing.
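One common form of the flow control suggested here is to back off and retry when the cluster rejects a bulk request. A minimal sketch, with hypothetical names: RejectedError stands in for the es_rejected_execution_exception a full bulk queue produces, and send is whatever function submits one batch to the cluster:

```python
import time

class RejectedError(Exception):
    """Stand-in for es_rejected_execution_exception (bulk queue full)."""

def send_with_backoff(send, actions, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Send one bulk batch; on rejection, wait with exponential
    backoff and retry up to max_retries times before giving up."""
    for attempt in range(max_retries + 1):
        try:
            return send(actions)
        except RejectedError:
            if attempt == max_retries:
                raise  # cluster still overloaded; surface the error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

The injectable sleep makes the backoff schedule easy to test; in production you would pass the real client call as send and pick max_retries and base_delay from your load tests.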
