Logs too long ("whose UTF8 encoding is longer than the max length 32766") cannot be imported into ES
Elasticsearch | Author: Lincoln | Published 2017-12-29 | Views: 19753
My last question of 2017; hoping the experts can offer some guidance.
- Problem description
- ELK stack architecture: filebeat -> logstash -> es -> kibana
- Data source: .log files from the production system
- Error: log lines longer than 32766 bytes cannot be imported into ES
- Logstash configuration: one index is created in ES per day, e.g. index-2017.12.29:
input {
  beats {
    port => 5050
  }
}
output {
  if [type] == "type1" {
    elasticsearch {
      hosts => ["host1:9200", "host2:9200"]
      manage_template => false
      index => "index-%{+YYYY.MM.dd}"
      document_type => "log"
    }
  }
}
- 报错信息:"Document contains at least one immense term in field=\"message\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.
[2017-12-27T11:37:29,299][WARN ][logstash.outputs.elasticsearch] Bind":false},"371500":{"code":"371500","name":"èŠåŸŽ","cardName":"èŠåŸŽå¸‚社会ä¿éšœå¡","finance":false,"pinyin":"-","moduleList":,"default":false,"online":true,"supportBind":true},
......
"code":"371500","name":"郴州","cardName":"郴州市社会ä¿éšœå¡","finance":false,"pinyin":"-","moduleList":,"default":false,"online":true,"supportBind":true}}], :response=>{"index"=>{"_index"=>"citymain-2017.12.27", "_type"=>"log", "_id"=>"AWCWC-BRzLUL94yyTeO2", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Document contains at least one immense term in field=\"message\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[50, 48, 49, 55, 45, 49, 50, 45, 50, 55, 32, 49, 48, 58, 53, 50, 58, 48, 56, 46, 53, 52, 51, 32, 32, 73, 78, 70, 79, 32]...', original message: bytes can be at most 32766 in length; got 33023", "caused_by"=>{"type"=>"max_bytes_length_exceeded_exception", "reason"=>"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 33023"}}}}}
- Attempted fixes, unsuccessful:
- The ignore_above parameter in the official ES reference: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/ignore-above.html#ignore-above
- Tried fixing it by changing the index mapping, without success: the change to message cannot take effect on indices that already exist, and with the mapping below in place, new indices (e.g. tomorrow's index-2017.12.29) could no longer ingest data normally (see the template sketch after this mapping):
{
  "mappings": {
    "log": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 200
        }
      }
    }
  }
}
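One reason the second attempt fails for future indices is that a mapping PUT against one concrete index does not carry over to the next day's index; with daily rolling indices the mapping has to come from an index template that every matching new index inherits. A minimal sketch of such a template, assuming ES 5.x template syntax and the index-* naming above (the template name long-log-template is made up for illustration):

PUT _template/long-log-template
{
  "template": "index-*",
  "mappings": {
    "log": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 200
        }
      }
    }
  }
}

Note that manage_template => false in the Logstash output above means Logstash will not overwrite a template installed this way.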
- Problem summary: Logstash creates one index per day (index => "index-%{+YYYY.MM.dd}"); some log lines are too long, so those lines cannot be written into ES and are simply skipped in Logstash.
- How can the ELK stack (filebeat -> logstash -> es -> kibana) import over-long log lines (>32766 bytes) into ES normally while still creating one index per day?
9 replies
laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net
Upvoted by: weizijun
2. Once ignore_above is set, data beyond the given length is not indexed, so it cannot be returned by an exact term match.
3. The text type places no limit on string length.
Here is a verification I ran: https://blog.csdn.net/laoyang3 ... 07980
For your reference.
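A common way to combine points 2 and 3 is a multi-field mapping: index message as text, which avoids the term-size limit because the analyzer splits it into small tokens, plus a keyword sub-field guarded by ignore_above for exact matching. A minimal sketch, assuming ES 5.x mapping syntax; the cutoff of 256 is only an illustrative choice:

{
  "mappings": {
    "log": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}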
Lincoln - 80s
Upvoted by:
Per Stack Overflow: "Using logstash to index those long messages, I use this filter to truncate the long string".
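The filter itself is not quoted in the reply. A minimal sketch of one way to do this, assuming the Logstash truncate filter plugin is available (installable with bin/logstash-plugin install logstash-filter-truncate); the 32766-byte cap matches the Lucene term limit in the error:

filter {
  truncate {
    # Shorten the message field to at most 32766 bytes so the
    # elasticsearch output no longer hits max_bytes_length_exceeded_exception.
    fields => ["message"]
    length_bytes => 32766
  }
}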
puyunjiafly
Upvoted by:
2: Why are you storing message as keyword? Wouldn't text be better?
Setting ignore_above on the keyword should solve your problem; if the exception still appears after setting it, please post the details.
medcl - Tonight we fight tigers.
Upvoted by:
redhat
Upvoted by:
rockybean - Elastic Certified Engineer, ElasticStack Fans, WeChat official account: ElasticTalk
Upvoted by:
Clearly a field like message should not be stored as keyword; that also produces too many terms and uses too much memory. It should definitely be the text type.
zqc0512 - andy zhou
Upvoted by:
nathon
Upvoted by:
nathon
Upvoted by: