Elasticsearch service stops after running for a little over 2 hours

Author: wanghao20 | Posted: 2018-08-10 | Views: 1016

The logs do not give any useful information.
First occurrence:
[2018-08-09T15:01:21,662][INFO ][o.e.c.m.MetaDataIndexTemplateService] [erEXFZq] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2018-08-09T15:50:57,733][INFO ][o.e.c.m.MetaDataIndexTemplateService] [erEXFZq] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2018-08-09T15:50:57,995][INFO ][o.e.c.m.MetaDataMappingService] [erEXFZq] [.kibana/4eRyDveNS46JNa1Dv3rdog] update_mapping [doc]
[2018-08-09T17:40:56,249][INFO ][o.e.n.Node               ] [erEXFZq] stopping ...
[2018-08-09T17:40:56,309][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
[2018-08-09T17:40:58,096][INFO ][o.e.x.w.WatcherService   ] [erEXFZq] stopping watch service, reason [shutdown initiated]
[2018-08-09T17:41:05,002][INFO ][o.e.n.Node               ] [erEXFZq] stopped
[2018-08-09T17:41:05,003][INFO ][o.e.n.Node               ] [erEXFZq] closing ...
[2018-08-09T17:41:05,568][INFO ][o.e.n.Node               ] [erEXFZq] closed
Second occurrence:
[2018-08-10T01:30:00,001][INFO ][o.e.x.m.MlDailyMaintenanceService] triggering scheduled [ML] maintenance tasks
[2018-08-10T01:30:00,227][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [erEXFZq] Deleting expired data
[2018-08-10T01:30:00,545][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [erEXFZq] Completed deletion of expired data
[2018-08-10T01:30:00,546][INFO ][o.e.x.m.MlDailyMaintenanceService] Successfully completed [ML] maintenance tasks
[2018-08-10T01:46:34,124][INFO ][o.e.n.Node               ] [erEXFZq] stopping ...
[2018-08-10T01:46:34,143][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
[2018-08-10T01:46:34,302][INFO ][o.e.x.w.WatcherService   ] [erEXFZq] stopping watch service, reason [shutdown initiated]
[2018-08-10T01:46:37,942][INFO ][o.e.n.Node               ] [erEXFZq] stopped
[2018-08-10T01:46:37,943][INFO ][o.e.n.Node               ] [erEXFZq] closing ...
[2018-08-10T01:46:38,019][INFO ][o.e.n.Node               ] [erEXFZq] closed
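
Both excerpts end with the same orderly stopping / stopped / closing / closed sequence at INFO level and no exception, which may suggest the node was asked to shut down rather than crashing. As a minimal sketch for lining the shutdown times up against other system events, the following Python pulls the timestamp of every `stopping ...` entry out of such a log. The function name `find_shutdowns` and the regular expression are illustrative assumptions based only on the line layout shown above.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above: [timestamp][LEVEL][logger] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+)\]\s*(?P<msg>.*)$"
)

def find_shutdowns(lines):
    """Return the timestamps of every 'stopping ...' entry logged by o.e.n.Node."""
    stops = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        is_node = m.group("logger").strip() == "o.e.n.Node"
        if is_node and m.group("msg").strip().endswith("stopping ..."):
            stops.append(datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f"))
    return stops

# Three lines copied from the log excerpts above.
sample = [
    "[2018-08-09T15:50:57,995][INFO ][o.e.c.m.MetaDataMappingService] [erEXFZq] [.kibana/4eRyDveNS46JNa1Dv3rdog] update_mapping [doc]",
    "[2018-08-09T17:40:56,249][INFO ][o.e.n.Node               ] [erEXFZq] stopping ...",
    "[2018-08-10T01:46:34,124][INFO ][o.e.n.Node               ] [erEXFZq] stopping ...",
]

for ts in find_shutdowns(sample):
    print(ts.isoformat())
```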
 
 
 
The GC log keeps showing entries like the following, and I can't tell whether they indicate a problem:
2018-08-10T09:07:56.840+0800: 672.948: Total time for which application threads were stopped: 0.0007428 seconds, Stopping threads took: 0.0001650 seconds
2018-08-10T09:07:56.841+0800: 672.949: Total time for which application threads were stopped: 0.0007228 seconds, Stopping threads took: 0.0001725 seconds
2018-08-10T09:07:56.882+0800: 672.990: Total time for which application threads were stopped: 0.0017878 seconds, Stopping threads took: 0.0004374 seconds
2018-08-10T09:07:57.047+0800: 673.155: Total time for which application threads were stopped: 0.0016035 seconds, Stopping threads took: 0.0003132 seconds
2018-08-10T09:07:57.111+0800: 673.219: [GC (Allocation Failure) 2018-08-10T09:07:57.111+0800: 673.219: [ParNew
Desired survivor size 100302848 bytes, new threshold 6 (max 6)
- age   1:   35207096 bytes,   35207096 total
- age   2:   16403424 bytes,   51610520 total
- age   3:    1948536 bytes,   53559056 total
- age   4:   10692696 bytes,   64251752 total
: 1643482K->81439K(1763584K), 0.0140022 secs] 3081040K->1518997K(6095552K), 0.0142626 secs] [Times: user=0.30 sys=0.00, real=0.01 secs] 
2018-08-10T09:07:57.125+0800: 673.234: Total time for which application threads were stopped: 0.0152031 seconds, Stopping threads took: 0.0001557 seconds
2018-08-10T09:07:57.126+0800: 673.235: Total time for which application threads were stopped: 0.0006582 seconds, Stopping threads took: 0.0000874 seconds
2018-08-10T09:07:57.146+0800: 673.255: Total time for which application threads were stopped: 0.0005733 seconds, Stopping threads took: 0.0000999 seconds
2018-08-10T09:07:57.281+0800: 673.390: Total time for which application threads were stopped: 0.0017009 seconds, Stopping threads took: 0.0005098 seconds
2018-08-10T09:07:57.414+0800: 673.523: Total time for which application threads were stopped: 0.0008044 seconds, Stopping threads took: 0.0001615 seconds
2018-08-10T09:07:57.452+0800: 673.560: Total time for which application threads were stopped: 0.0008429 seconds, Stopping threads took: 0.0001967 seconds
2018-08-10T09:07:57.465+0800: 673.574: Total time for which application threads were stopped: 0.0009349 seconds, Stopping threads took: 0.0002257 seconds
2018-08-10T09:07:57.466+0800: 673.575: Total time for which application threads were stopped: 0.0004486 seconds, Stopping threads took: 0.0001271 seconds
2018-08-10T09:07:57.489+0800: 673.598: Total time for which application threads were stopped: 0.0017779 seconds, Stopping threads took: 0.0010510 seconds
2018-08-10T09:07:58.490+0800: 674.599: Total time for which application threads were stopped: 0.0012923 seconds, Stopping threads took: 0.0002387 seconds
2018-08-10T09:08:00.775+0800: 676.883: [GC (Allocation Failure) 2018-08-10T09:08:00.775+0800: 676.883: [ParNew
Desired survivor size 100302848 bytes, new threshold 6 (max 6)
- age   1:   46544592 bytes,   46544592 total
- age   2:    6460296 bytes,   53004888 total
- age   3:    1807768 bytes,   54812656 total
- age   4:    1929888 bytes,   56742544 total
- age   5:   10592072 bytes,   67334616 total
: 1649119K->91635K(1763584K), 0.0209164 secs] 3086677K->1529193K(6095552K), 0.0213371 secs] [Times: user=0.44 sys=0.00, real=0.02 secs] 
2018-08-10T09:08:00.796+0800: 676.905: Total time for which application threads were stopped: 0.0231092 seconds, Stopping threads took: 0.0003198 seconds
2018-08-10T09:08:00.798+0800: 676.906: Total time for which application threads were stopped: 0.0013180 seconds, Stopping threads took: 0.0002127 seconds
2018-08-10T09:08:01.799+0800: 677.908: Total time for which application threads were stopped: 0.0013036 seconds, Stopping threads took: 0.0002592 seconds
2018-08-10T09:08:02.801+0800: 678.909: Total time for which application threads were stopped: 0.0014876 seconds, Stopping threads took: 0.0003519 seconds
2018-08-10T09:08:04.058+0800: 680.167: Total time for which application threads were stopped: 0.0015071 seconds, Stopping threads took: 0.0003596 seconds
2018-08-10T09:08:05.060+0800: 681.168: Total time for which application threads were stopped: 0.0012316 seconds, Stopping threads took: 0.0002381 seconds
2018-08-10T09:08:06.061+0800: 682.170: Total time for which application threads were stopped: 0.0012258 seconds, Stopping threads took: 0.0002400 seconds
2018-08-10T09:08:07.885+0800: 683.994: Total time for which application threads were stopped: 0.0015366 seconds, Stopping threads took: 0.0002901 seconds
2018-08-10T09:08:07.888+0800: 683.997: Total time for which application threads were stopped: 0.0032131 seconds, Stopping threads took: 0.0001282 seconds
2018-08-10T09:08:07.939+0800: 684.048: Total time for which application threads were stopped: 0.0015597 seconds, Stopping threads took: 0.0003327 seconds
2018-08-10T09:08:08.082+0800: 684.190: Total time for which application threads were stopped: 0.0014115 seconds, Stopping threads took: 0.0002589 seconds
2018-08-10T09:08:08.325+0800: 684.434: Total time for which application threads were stopped: 0.0015528 seconds, Stopping threads took: 0.0003533 seconds
2018-08-10T09:08:08.927+0800: 685.035: Total time for which application threads were stopped: 0.0015776 seconds, Stopping threads took: 0.0003078 seconds
2018-08-10T09:08:09.084+0800: 685.193: Total time for which application threads were stopped: 0.0023994 seconds, Stopping threads took: 0.0010215 seconds
2018-08-10T09:08:09.090+0800: 685.199: [GC (Allocation Failure) 2018-08-10T09:08:09.091+0800: 685.199: [ParNew
Desired survivor size 100302848 bytes, new threshold 6 (max 6)
- age   1:   35832304 bytes,   35832304 total
- age   2:    8348368 bytes,   44180672 total
- age   3:    2477976 bytes,   46658648 total
- age   4:    1330312 bytes,   47988960 total
- age   5:    1857528 bytes,   49846488 total
- age   6:   10299464 bytes,   60145952 total
: 1659315K->91283K(1763584K), 0.0168375 secs] 3096873K->1528841K(6095552K), 0.0172931 secs] [Times: user=0.32 sys=0.00, real=0.01 secs] 
2018-08-10T09:08:09.108+0800: 685.217: Total time for which application threads were stopped: 0.0186916 seconds, Stopping threads took: 0.0002732 seconds
2018-08-10T09:08:09.110+0800: 685.218: Total time for which application threads were stopped: 0.0014087 seconds, Stopping threads took: 0.0002211 seconds
2018-08-10T09:08:09.111+0800: 685.219: Total time for which application threads were stopped: 0.0008249 seconds, Stopping threads took: 0.0001439 seconds
2018-08-10T09:08:09.784+0800: 685.893: Total time for which application threads were stopped: 0.0013206 seconds, Stopping threads took: 0.0002583 seconds
2018-08-10T09:08:10.785+0800: 686.894: Total time for which application threads were stopped: 0.0008028 seconds, Stopping threads took: 0.0001839 seconds
2018-08-10T09:08:11.473+0800: 687.582: Total time for which application threads were stopped: 0.0014312 seconds, Stopping threads took: 0.0003181 seconds
2018-08-10T09:08:12.475+0800: 688.584: Total time for which application threads were stopped: 0.0012153 seconds, Stopping threads took: 0.0002318 seconds
2018-08-10T09:08:13.476+0800: 689.585: Total time for which application threads were stopped: 0.0012668 seconds, Stopping threads took: 0.0002425 seconds
2018-08-10T09:08:14.478+0800: 690.586: Total time for which application threads were stopped: 0.0014480 seconds, Stopping threads took: 0.0002702 seconds
2018-08-10T09:08:15.479+0800: 691.588: Total time for which application threads were stopped: 0.0012542 seconds, Stopping threads took: 0.0002617 seconds
2018-08-10T09:08:16.481+0800: 692.589: Total time for which application threads were stopped: 0.0012593 seconds, Stopping threads took: 0.0002505 seconds
2018-08-10T09:08:17.941+0800: 694.049: Total time for which application threads were stopped: 0.0014642 seconds, Stopping threads took: 0.0003231 seconds
2018-08-10T09:08:17.948+0800: 694.056: Total time for which application threads were stopped: 0.0007652 seconds, Stopping threads took: 0.0001269 seconds
2018-08-10T09:08:18.616+0800: 694.725: Total time for which application threads were stopped: 0.0012470 seconds, Stopping threads took: 0.0002004 seconds
2018-08-10T09:08:19.618+0800: 695.727: Total time for which application threads were stopped: 0.0015847 seconds, Stopping threads took: 0.0003592 seconds
2018-08-10T09:08:19.622+0800: 695.731: Total time for which application threads were stopped: 0.0008858 seconds, Stopping threads took: 0.0002116 seconds
2018-08-10T09:08:19.623+0800: 695.731: Total time for which application threads were stopped: 0.0007484 seconds, Stopping threads took: 0.0001365 seconds
2018-08-10T09:08:20.140+0800: 696.249: [GC (Allocation Failure) 2018-08-10T09:08:20.141+0800: 696.249: [ParNew
Desired survivor size 100302848 bytes, new threshold 6 (max 6)
- age   1:   58251944 bytes,   58251944 total
- age   2:   11976168 bytes,   70228112 total
- age   3:     192136 bytes,   70420248 total
- age   4:    2424328 bytes,   72844576 total
- age   5:    1161232 bytes,   74005808 total
- age   6:    1639936 bytes,   75645744 total
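
One way to judge whether these entries matter is to total the reported pause times. The sketch below is only an illustration: the function name `summarize_pauses` is hypothetical, and it handles only the "Total time for which application threads were stopped" lines, ignoring the ParNew records. In the excerpt above the longest such pause is about 23 ms, which on its own does not look like the cause of a shutdown.

```python
import re

# Matches the safepoint pause lines shown in the GC log excerpt above.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)

def summarize_pauses(lines):
    """Return (count, total_seconds, max_seconds) over all pause entries found."""
    pauses = []
    for line in lines:
        m = STOPPED_RE.search(line)
        if m:
            pauses.append(float(m.group(1)))
    if not pauses:
        return 0, 0.0, 0.0
    return len(pauses), sum(pauses), max(pauses)

# Three lines copied from the GC log excerpt above.
sample = [
    "2018-08-10T09:07:56.840+0800: 672.948: Total time for which application threads were stopped: 0.0007428 seconds, Stopping threads took: 0.0001650 seconds",
    "2018-08-10T09:07:57.125+0800: 673.234: Total time for which application threads were stopped: 0.0152031 seconds, Stopping threads took: 0.0001557 seconds",
    "2018-08-10T09:08:00.796+0800: 676.905: Total time for which application threads were stopped: 0.0231092 seconds, Stopping threads took: 0.0003198 seconds",
]

count, total, longest = summarize_pauses(sample)
print(f"{count} pauses, {total * 1000:.1f} ms in total, longest {longest * 1000:.1f} ms")
```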
 
In monitoring, the Index Memory - Lucene metric keeps growing:

[Screenshot: QQ截图20180810122323.jpg]

 
 
Apart from setting the JVM heap to 6g and bootstrap.memory_lock: true, all other settings are at their defaults.
 
System changes:
* soft   nofile   32768
* hard nofile 65536
* soft noproc 65536
* hard noproc 131072
* soft memlock unlimited
* hard memlock unlimited
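
If the entries above were copied verbatim, note that `noproc` is not an item name I would expect pam_limits to recognise. The documented name for the process limit is `nproc`. The following Python sketch checks limits entries against a set of item names. The set `KNOWN_ITEMS` is an assumption based on the commonly documented limits.conf(5) items and may differ between distributions, and `unknown_items` is a hypothetical helper name.

```python
# Assumed set of valid item names (commonly documented in limits.conf(5));
# some distributions may accept additional items.
KNOWN_ITEMS = {
    "core", "data", "fsize", "memlock", "nofile", "rss", "stack", "cpu",
    "nproc", "as", "maxlogins", "maxsyslogins", "priority", "locks",
    "sigpending", "msgqueue", "nice", "rtprio",
}

def unknown_items(text):
    """Return (line_number, item) for each entry whose item name is not in KNOWN_ITEMS."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) != 4:
            continue  # blank, comment-only, or not "<domain> <type> <item> <value>"
        item = fields[2]
        if item not in KNOWN_ITEMS:
            problems.append((number, item))
    return problems

# The entries exactly as posted above.
sample = """\
* soft   nofile   32768
* hard nofile 65536
* soft noproc 65536
* hard noproc 131072
* soft memlock unlimited
* hard memlock unlimited
"""

print(unknown_items(sample))
```

To check a real file, read its contents and pass the text to `unknown_items`.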
 
ulimit
 
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127055
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127055
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
 
 
Could someone help me work out what is causing this? It is a single-node setup.
 
 
zqc0512 - andy zhou

How much data do you have? Give the JVM a larger heap.
Also raise the open-file limit.

tt11403

OP, what tool are you using to display this GC chart?

wanghao20

Found the problem: on CentOS, /etc/security/limits.d/90-nproc.conf sets nproc for non-root users to 4096, which was too small.
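
To confirm which nproc limit actually applies, it can help to query it from a process started as the same user that runs Elasticsearch, since files under /etc/security/limits.d/ can change the value set elsewhere. The sketch below uses Python's standard `resource` module on Linux. The threshold `REQUIRED_NPROC` of 65536 is a hypothetical value taken from the soft limit the poster intended to set, not an official requirement, and `nproc_is_sufficient` is an illustrative helper name.

```python
import resource

# Hypothetical target, taken from the soft value the poster meant to configure.
REQUIRED_NPROC = 65536

def nproc_is_sufficient(soft_limit, required):
    """True if the soft nproc limit allows at least `required` processes/threads."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # "unlimited" always satisfies the requirement
    return soft_limit >= required

if __name__ == "__main__":
    # On Linux, RLIMIT_NPROC counts threads as well as processes for the user.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"nproc soft={soft} hard={hard}")
    if not nproc_is_sufficient(soft, REQUIRED_NPROC):
        print(f"soft nproc limit is below {REQUIRED_NPROC}")
```

Run it as the Elasticsearch user (not root), because the limit from 90-nproc.conf applies to non-root users.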
