Bulk cannot import data

On a single node, the index already holds 70 million documents. When I try to import more data, the bulk import fails and the elasticsearch log keeps printing gc overhead messages.

laoyang360 - [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: blog.csdn.net/laoyang360

Upvoted by: lzc

The data volume is too large and exceeds the queue size. Try sending fewer documents per bulk request.
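The suggestion above, splitting one huge bulk into smaller requests, can be sketched in plain Python; the chunking logic is independent of any client library, and each yielded batch would then be sent as its own bulk request (the index name and document shape here are made up for illustration):

```python
def chunked(actions, chunk_size):
    """Yield successive batches of at most chunk_size actions,
    so each bulk request stays below the node's queue limits."""
    batch = []
    for action in actions:
        batch.append(action)
        if len(batch) >= chunk_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch

# Example: 70,000 docs split into bulk requests of 2,000 each.
docs = [{"_index": "myindex", "_source": {"id": i}} for i in range(70000)]
batches = list(chunked(docs, 2000))
print(len(batches))     # 35 requests instead of one huge bulk
print(len(batches[0]))  # 2000
```

Each batch can then be passed to whatever bulk API your client exposes, with a pause or retry between batches if the cluster starts rejecting requests.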

JackGe - Didi Chuxing, ES platform team member

Upvoted by: lzc

How much heap is configured on the ES nodes? Run jstat -gcutil <pid> 1000 to watch the node's GC activity. Check the bulk write settings: the bulk queue size "threadpool.bulk.queue_size": "5000" and the bulk thread pool size "threadpool.bulk.size": "24". Also review the BulkProcessor.Builder parameters: the bulk size in MB (for log scenarios that favor throughput, 10MB is reasonable), the number of bulk actions, and concurrentRequests. Bulk writes create a lot of short-lived objects in ES, so consider enlarging the young generation. The terms of a 70-million-document index don't take much heap; probably only a few MB.
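For reference, the two thread pool settings named above would go into elasticsearch.yml like this (these are the 2.x-era names used in the answer; later versions renamed the pool to thread_pool.bulk and then thread_pool.write, so check the docs for your version):

```yaml
# elasticsearch.yml -- bulk thread pool tuning (2.x-style names)
threadpool.bulk.size: 24          # number of bulk worker threads
threadpool.bulk.queue_size: 5000  # pending bulk requests held before rejection
```

A larger queue only buys buffering, not throughput; if the queue fills faster than the workers drain it, the node will still reject requests, which is why the batching and flow-control advice elsewhere in this thread matters too.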

lzc

Upvoted by:

I tried indexing a single document and got this error:
ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];]
After changing that setting, indexing worked again.
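For readers hitting the same FORBIDDEN/12 block: the setting lzc refers to can be cleared with an index settings update like the one below (the index name is a placeholder). Note that Elasticsearch usually applies this block when disk usage crosses the flood-stage watermark, so free up disk space first or the block may be re-applied:

```shell
# Clear the read-only/allow-delete block on the affected index
curl -XPUT 'http://localhost:9200/myindex/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```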

bill

Upvoted by:

The gc overhead message means the JVM heap allocated to the ES process is too small. You can increase the heap size as follows.

There are two ways to change the heap size in Elasticsearch. The easiest is to set an environment variable called ES_HEAP_SIZE. When the server process starts, it will read this environment variable and set the heap accordingly. As an example, you can set it via the command line as follows:
export ES_HEAP_SIZE=10g

Alternatively, you can pass in the heap size via a command-line argument when starting the process, if that is easier for your setup: 
./bin/elasticsearch -Xmx10g -Xms10g
Ensure that the min (Xms) and max (Xmx) sizes are the same to prevent the heap from resizing at runtime, a very costly process.

Generally, setting the ES_HEAP_SIZE environment variable is preferred over setting explicit -Xmx and -Xms values.



xinfanwang

Upvoted by:

If a single bulk request carries too many records, this problem can occur. I suggest adding flow control to limit how many documents go into each index request. As for the exact number, it's best to find it through your own load testing.
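One common form of the flow control suggested here is to back off and retry when the cluster rejects a bulk request. A minimal sketch, with hypothetical names: RejectedError stands in for the es_rejected_execution_exception a full bulk queue produces, and send is whatever function submits one batch to the cluster:

```python
import time

class RejectedError(Exception):
    """Stand-in for es_rejected_execution_exception (bulk queue full)."""

def send_with_backoff(send, actions, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Send one bulk batch; on rejection, wait with exponential
    backoff and retry up to max_retries times before giving up."""
    for attempt in range(max_retries + 1):
        try:
            return send(actions)
        except RejectedError:
            if attempt == max_retries:
                raise  # cluster still overloaded; surface the error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

The injectable sleep makes the backoff schedule easy to test; in production you would pass the real client call as send and pick max_retries and base_delay from your load tests.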
