
How to make matchPhraseQuery in Elasticsearch handle special characters correctly

Elasticsearch | Author: linfujian | Posted on October 17, 2018 | Views: 6677

I have an index whose mappings and settings look like this:
{
  "var_pmid" : {
    "aliases" : { },
    "mappings" : {
      "var_pmid_list" : {
        "properties" : {
          "cdna" : {
            "type" : "string"
          },
          "chr_id" : {
            "type" : "string"
          },
          "clinvarID" : {
            "type" : "string"
          },
          "gene" : {
            "type" : "string"
          },
          "mutation_id" : {
            "type" : "string"
          },
          "paper" : {
            "type" : "string"
          },
          "pmid" : {
            "type" : "string"
          },
          "resource" : {
            "type" : "string"
          },
          "snp" : {
            "type" : "string"
          },
          "snpeff_ann" : {
            "type" : "string"
          },
          "var" : {
            "type" : "string"
          },
          "var_s" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1536392419200",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "DKmAfA1oTjSjnF7ZH8qDzA",
        "version" : {
          "created" : "2040699"
        }
      }
    },
    "warmers" : { }
  }
}
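Values of snpeff_ann look like "snpeff_ann" : "c.2326-49876C>T,c.1472A>G,n.1383-49876C>T,c.3445-49876C>T,n.4417-49876C>T". I need to partially match such a value using c.1472A>G, so I used QueryBuilders.matchPhraseQuery through spring-data-elasticsearch. However, probably because of special characters such as '.' and '>', some of the returned results are incorrect. How should this be handled? My matching code is as follows: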
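import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;

// var holds the search term, e.g. "c.1472A>G"; a phrase hit in any one field is enough
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder
    .should(QueryBuilders.matchPhraseQuery("paper", var))
    .should(QueryBuilders.matchPhraseQuery("snpeff_ann", var))
    .should(QueryBuilders.matchPhraseQuery("chr_id", var))
    .should(QueryBuilders.matchPhraseQuery("snp", var))
    .should(QueryBuilders.matchPhraseQuery("var_s", var))
    .should(QueryBuilders.matchPhraseQuery("var", var))
    .should(QueryBuilders.matchPhraseQuery("mutation_id", var))
    .should(QueryBuilders.matchPhraseQuery("cdna", var))
    .should(QueryBuilders.matchPhraseQuery("clinvarID", var));

// Restrict the search to the var_pmid index and the var_pmid_list type
SearchQuery searchQuery2 = new NativeSearchQueryBuilder()
    .withQuery(boolQueryBuilder)
    .withIndices("var_pmid")
    .withTypes("var_pmid_list")
    .build();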
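
The suspicion about '.' and '>' can be made concrete with a small sketch. The snippet below is only a rough stand-in for a standard-style analyzer: it splits on every character that is not a letter or digit and lowercases the rest. It is not the real Lucene tokenizer, and `TokenSketch` and `tokens` are hypothetical names used for illustration. Under that assumption, two different variant strings collapse to the same token sequence, which is the kind of situation in which a phrase query could not tell them apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenSketch {
    // Rough approximation (an assumption, not the actual Elasticsearch analyzer):
    // split on anything that is not a letter or digit, then lowercase each piece.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both lines print [c, 1472a, g]: the punctuation no longer distinguishes them.
        System.out.println(tokens("c.1472A>G"));
        System.out.println(tokens("c-1472A<G"));
    }
}
```

What the index really does with a value is best confirmed by running the text through the index's `_analyze` API rather than relying on this approximation.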

 

rochy - rochy_he

Upvoted by: linfujian

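What you describe is probably caused by the default analyzer having stop words enabled, so the special characters get removed. You can either disable stop words or define a new custom analyzer, for example: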
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_std_analyzer": {
          "type": "standard",
          "stopwords": "_none_"
        }
      }
    }
  }
}
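
The settings above only define the analyzer; a field uses it only once its mapping names it. Here is a minimal sketch of a create-index body that does both, assuming a 2.x-style index (suggested by the "string" field type and version.created 2040699 in the question). `IndexBodySketch` and `createIndexBody` are hypothetical helper names, and only the snpeff_ann field is shown.

```java
class IndexBodySketch {
    // Builds a create-index request body: the analyzer definition from the answer
    // plus a mapping that assigns that analyzer to a single field.
    // Assumes 2.x-style mappings (a type name and the "string" field type).
    static String createIndexBody(String type, String field, String analyzer) {
        return "{\n"
            + "  \"settings\": {\n"
            + "    \"analysis\": {\n"
            + "      \"analyzer\": {\n"
            + "        \"" + analyzer + "\": {\n"
            + "          \"type\": \"standard\",\n"
            + "          \"stopwords\": \"_none_\"\n"
            + "        }\n"
            + "      }\n"
            + "    }\n"
            + "  },\n"
            + "  \"mappings\": {\n"
            + "    \"" + type + "\": {\n"
            + "      \"properties\": {\n"
            + "        \"" + field + "\": {\n"
            + "          \"type\": \"string\",\n"
            + "          \"analyzer\": \"" + analyzer + "\"\n"
            + "        }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Print the body for the type and field named in the question.
        System.out.println(createIndexBody("var_pmid_list", "snpeff_ann", "my_std_analyzer"));
    }
}
```

The analyzer of an existing field cannot be changed in place, so this body would go to a new index (for example a hypothetical var_pmid_v2) and the data would then be reindexed into it. Whether disabling stop words alone is enough to keep characters such as '.' and '>' is worth checking with the `_analyze` API, since the tokenizer itself may also drop punctuation.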
