Has anyone tried using an ingest pipeline with reindex?

Elasticsearch | Author: God_lockin | Posted 2019-03-08 | Views: 245

We need to add some fields to our data, but these fields are missing from the existing documents, so I plan to fill in default values through an ingest pipeline.
 
{
  "mappings": {
    "_doc": {
      "properties": {
        "sentiment": {
          "type": "keyword"
        },
        "ingest_timestamp": {
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
          "type": "date"
        },
        ……
      }
    }
  },
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 9,
    "default_pipeline": "defaultSentimentConditionPipeline"
  }
}
The values are then set by a group of pipelines:
{
  "defaultSentimentConditionPipeline": {
    "description": "set default sentiment condition pipeline",
    "processors": [
      {
        "pipeline": {
          "if": "ctx.sentiment == '' || ctx.sentiment == null",
          "name": "defaultSentimentPipeline"
        }
      },
      {
        "pipeline": {
          "if": "ctx.ingest_timestamp == '' || ctx.ingest_timestamp == null",
          "name": "timestamp_pipeline"
        }
      }
    ]
  },
  "defaultSentimentPipeline": {
    "description": "set default sentiment",
    "processors": [
      {
        "set": {
          "field": "sentiment",
          "value": "sentiment"
        }
      }
    ]
  },
  "timestamp_pipeline": {
    "description": "Add insert timestamp",
    "processors": [
      {
        "set": {
          "field": "ingest_timestamp",
          "value": "{{_ingest.timestamp}}"
        }
      }
    ]
  }
}
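However, the documents that come in through reindex do not seem to carry the timestamp or the default value. In theory, reindex is just a scroll plus bulk insert inside ES, so does that flow not go through the pipeline?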
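One way to narrow this down, independent of reindex, is to run the pipeline through the simulate API and look at what it produces for a document with and without the fields. The request below is only a sketch: the two sample documents are made up for illustration (the `text` field and its value are hypothetical), and the pipeline name is the one from the question. The simulated output also makes it possible to check whether the rendered `{{_ingest.timestamp}}` value matches the date `format` declared in the mapping.

```json
POST _ingest/pipeline/defaultSentimentConditionPipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "text": "hypothetical document with no sentiment or ingest_timestamp"
      }
    },
    {
      "_source": {
        "sentiment": "positive",
        "ingest_timestamp": "2019-03-08 12:00:00"
      }
    }
  ]
}
```

If the first document comes back with both fields filled in and the second one unchanged, the pipeline logic itself is fine and the remaining question is whether reindex invokes it.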

rochy - rochy_he@tw

Upvoted by: God_lockin

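You can do this by adding a script to the reindex request. The example below shows the general shape (it removes the field foo when its value is 'bar'):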
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter",
    "version_type": "external"
  },
  "script": {
    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
    "lang": "painless"
  }
}
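Adapted to the question, a script along these lines could fill in the default sentiment during reindex. This is a minimal sketch, not a tested solution: the index names `old_index` and `new_index` and the parameter name `default_sentiment` are hypothetical, and the default value "sentiment" is copied from the asker's `set` processor. It covers only the sentiment field; the ingest timestamp is probably easier to handle through the pipeline.

```json
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.sentiment == null || ctx._source.sentiment == '') { ctx._source.sentiment = params.default_sentiment }",
    "params": {
      "default_sentiment": "sentiment"
    }
  }
}
```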

Alternatively, you can get the same result by specifying the pipeline name in the request:
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "pipeline": "some_ingest_pipeline"
  }
}
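Applied to the pipelines in the question, the request might look like the sketch below. The index names `old_index` and `new_index` are hypothetical placeholders; the pipeline name is the asker's top-level pipeline, which in turn calls the two conditional sub-pipelines.

```json
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index",
    "pipeline": "defaultSentimentConditionPipeline"
  }
}
```

Naming the pipeline explicitly in `dest` avoids relying on the index-level `default_pipeline` setting being picked up by reindex.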

 
 
 
