logstash custom mapping

Pitfalls I hit when importing into ES with logstash and a custom mapping

Logstashjiakechong1642 posted an article • 5 comments • 13247 views • 2017-04-26 16:14 • from related topics

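Background: I was originally using logstash's default configuration to import logs into ES, and I was happy to find that everything was OK. Later I started running aggregation statistics on the logs and found that the keys returned by the terms aggregation looked strange. When I searched for these strange keys, I found that each one was a fragment of the source string, and every case I could reproduce involved a value shaped like "xxxx-xxxxxx" being cut at that point. It looked like the analyzer was to blame, so I decided to define my own mapping. Below is the original logstash configuration: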
output {
    elasticsearch {
        action => "index"
        hosts => ["xxxxxx:9200"]
        index => "xxxxx"
        document_type => "haha"
    }
}
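
To see why the terms aggregation returned fragments, here is a rough Python sketch of what an analyzed string field does to a hyphenated value. It assumes the default analyzer behaves roughly like "split on non-word characters, then lowercase"; the real analyzer follows Unicode text segmentation rules, so `approx_standard_tokens`, `terms_buckets`, and the sample values are illustrative stand-ins only.

```python
import re
from collections import Counter


def approx_standard_tokens(text):
    """Rough stand-in for an analyzed string field: split on anything that
    is not a letter, digit, or underscore, then lowercase each piece."""
    return [tok.lower() for tok in re.split(r"\W+", text) if tok]


def terms_buckets(values, analyzed):
    """Count documents per term, the way a terms aggregation sees them:
    per token when the field is analyzed, per whole value otherwise."""
    counts = Counter()
    for value in values:
        terms = set(approx_standard_tokens(value)) if analyzed else {value}
        counts.update(terms)
    return dict(counts)


# Hypothetical field values with the "xxxx-xxxxxx" shape from the post.
sample = ["abcd-efghij", "abcd-efghij", "wxyz-efghij"]

print(terms_buckets(sample, analyzed=True))   # keys are fragments
print(terms_buckets(sample, analyzed=False))  # keys are the whole strings
```

With the field treated as not analyzed, each bucket key is the complete source string, which is the behavior the custom mapping is meant to restore.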
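Getting to work: I started searching through documentation everywhere, found that the mapping can be customized, and was very pleased.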
output {
    elasticsearch {
        action => "index"
        hosts => ["xxx"]
        index => "logstashlog"
        template => "xx/http-logstash.json"
        template_name => "http-log-logstash"
        template_overwrite => true
    }
    stdout {
        codec => rubydebug
    }
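}

Nothing ever goes smoothly:

Problem 1: I had uploaded my custom template, but it simply did not take effect. It turned out that order has to be set for the template to override the default one: the default order is 0, so mine had to be larger. See http://elasticsearch.cn/article/21

Problem 2: I could see that my uploaded template's order was already 1, so why was it still not taking effect? It turned out that my index name did not match my template's name (a template only applies to indices whose names match its pattern), so my template could not be used and the default template was applied again. After changing the configuration to the following, it finally took effect. (Note the changed index name.)

output {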
    elasticsearch {
        action => "index"
        hosts => ["xxx"]
        index => "http-log-logstash"
        document_type => "haha"
        template => "xxx/http-logstash.json"
        template_name => "http-log-logstash"
        template_overwrite => true
    }
    stdout {
        codec => rubydebug
    }
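}

Problem 3: The import failed. It turned out that my time strings, such as 2017-04-11 00:07:25, cannot be matched by the default format of { "type" : "date"}. After changing the mapping to "format": "yyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis", they could be parsed.

Everything is OK now. Thanks to the community, and thanks to Google (the biggest productivity booster I have come across, after books and teachers). My template is attached below.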
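
As an aside before the template itself, the parsing failure in Problem 3 can be illustrated with a small Python sketch. It assumes the default date format accepts ISO-style strings (with a `T` between date and time) or epoch milliseconds, and it mimics the `||`-separated format list by trying each alternative in turn. The strptime patterns and the helper name `parse_with_formats` are stand-ins chosen for this example, not Elasticsearch's own parser.

```python
from datetime import datetime, timezone

# Assumed Python equivalents of the alternatives in the custom format string.
CUSTOM_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "epoch_millis"]
# Assumed approximation of the default: ISO date with a "T" before the time.
DEFAULT_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "epoch_millis"]


def parse_with_formats(value, formats):
    """Try each format in order, like a '||'-separated format list.
    Returns a datetime, or None when no alternative matches."""
    for fmt in formats:
        try:
            if fmt == "epoch_millis":
                return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
            return datetime.strptime(value, fmt)
        except (ValueError, OverflowError, OSError):
            continue
    return None


log_time = "2017-04-11 00:07:25"  # the sample value from the post
print(parse_with_formats(log_time, DEFAULT_FORMATS))  # no alternative matches
print(parse_with_formats(log_time, CUSTOM_FORMATS))   # first alternative matches
```

The space between the date and the time is what the assumed default alternatives reject; listing the space-separated pattern first lets the same value parse.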
{ 
    "template" : "qmpsearchlog", 
    "order":1,
    "settings" : { "index.refresh_interval" : "60s" }, 
    "mappings" : { 
        "_default_" : { 
            "_all" : { "enabled" : false }, 
            "dynamic_templates" : [{ 
              "message_field" : { 
                "match" : "message", 
                "match_mapping_type" : "string", 
                "mapping" : { "type" : "string", "index" : "not_analyzed" } 
              } 
            }, { 
              "string_fields" : { 
                "match" : "*", 
                "match_mapping_type" : "string", 
                "mapping" : { "type" : "string", "index" : "not_analyzed" } 
              } 
            }], 
            "properties" : { 
                "@timestamp" : { "type" : "date"}, 
                "@version" : { "type" : "integer", "index" : "not_analyzed" }, 
                "path" : { "type" : "string", "index" : "not_analyzed" }, 
				"host" : { "type" : "string", "index" : "not_analyzed" },
                "record_time":{"type":"date","format": "yyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"}, 
                "method":{"type":"string","index" : "not_analyzed"},
                "unionid":{"type":"string","index" : "not_analyzed"},
                "user_name":{"type":"string","index" : "not_analyzed"},
                "query":{"type":"string","index" : "not_analyzed"},
                "ip":{ "type" : "ip"}, 
                "webbrower":{"type":"string","index" : "not_analyzed"},
                "os":{"type":"string","index" : "not_analyzed"},
                "device":{"type":"string","index" : "not_analyzed"},
                "ptype":{"type":"string","index" : "not_analyzed"},
                "serarch_time":{"type":"date","format": "yyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
                "have_ok":{"type":"string","index" : "not_analyzed"},
                "legal":{"type":"string","index" : "not_analyzed"}
            } 
        } 
    } 
}
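
The two rules behind Problems 1 and 2 can be sketched in Python: a template only applies when the index name matches its pattern, and among the matching templates the one with the higher `order` wins on conflicting settings. This is a minimal sketch that assumes wildcard matching similar to `fnmatch` and a shallow merge of settings; the catch-all `*` template and its `5s` value are hypothetical, and real Elasticsearch merges mappings more deeply than this.

```python
from fnmatch import fnmatchcase


def effective_settings(index_name, templates):
    """Merge the settings of every template whose pattern matches the index
    name, applying lower orders first so higher orders override them."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    merged = {}
    for tpl in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(tpl.get("settings", {}))
    return merged


templates = [
    # Hypothetical catch-all template with the default order of 0.
    {"template": "*", "order": 0,
     "settings": {"index.refresh_interval": "5s"}},
    # Pattern, order, and setting taken from the template above.
    {"template": "qmpsearchlog", "order": 1,
     "settings": {"index.refresh_interval": "60s"}},
]

print(effective_settings("qmpsearchlog", templates))  # custom template wins
print(effective_settings("logstashlog", templates))   # custom pattern does not match
```

An index whose name does not match the custom pattern silently falls back to whatever other templates match it, which is the symptom described in Problem 2.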
 
