
ElasticSearch 2.4.6 + Spring Data: filter does not work correctly

Elasticsearch | Author: linfujian | Posted 2018-09-07 | Views: 3453

I have a document like the one shown below:
{
  "_index" : "var_pmid",
  "_type" : "var_pmid_list",
  "_id" : "79462",
  "_score" : 10.701287,
  "_source" : {
    "mutation_id" : "",
    "pmid" : "25996639",
    "chr_id" : "chr5:g.36958288A>G",
    "cdna" : "c.313A>G",
    "gene" : "",
    "kinds" : "",
    "nlp" : "",
    "paper" : "",
    "resource" : "",
    "snp" : "rs376768802",
    "snpeff_ann" : "c.-31-28915T>A",
    "var" : "",
    "var_s" : "p.N105D",
    "issn" : "1098-3600",
    "pubDate" : "2016-10-20",
    "IF" : 8.229
  }
}


My filtering logic is as follows:
 
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// Match the variant string against any of the candidate fields.
boolQueryBuilder
        .should(QueryBuilders.matchPhraseQuery("paper", var))
        .should(QueryBuilders.matchPhraseQuery("snpeff_ann", var))
        .should(QueryBuilders.matchPhraseQuery("chr_id", var))
        .should(QueryBuilders.matchPhraseQuery("snp", var))
        .should(QueryBuilders.matchPhraseQuery("var_s", var))
        .should(QueryBuilders.matchPhraseQuery("var", var))
        .should(QueryBuilders.matchPhraseQuery("mutation_id", var))
        .should(QueryBuilders.matchPhraseQuery("cdna", var))
        .should(QueryBuilders.matchPhraseQuery("clinvarID", var));

// Add the filter condition: keep only documents whose IF lies in [5, 10].
BoolQueryBuilder filterBuilder = QueryBuilders.boolQuery();
String[] ifStr = {"5", "10"};
filterBuilder.filter(QueryBuilders.rangeQuery("IF")
        .gte(Float.parseFloat(ifStr[0]))
        .lte(Float.parseFloat(ifStr[1])));

SearchQuery searchQuery2 = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.filteredQuery(boolQueryBuilder, filterBuilder))
        .withIndices("var_pmid")
        .withTypes("var_pmid_list")
        .build();

List<Var2PmidEntity> entities2 = esUtil.queryAll(searchQuery2, Var2PmidEntity.class);


// The queryAll method is shown below
public <T> List<T> queryAll(SearchQuery searchQuery, Class<T> clazz) {

    // Open a scroll with a 5000 ms keep-alive.
    String scrollId = scan(searchQuery, 5000L, false);

    List<T> entities = new ArrayList<>();
    boolean hasRecords = true;
    while (hasRecords) {
        SearchResponse searchResponse = getClient().prepareSearchScroll(scrollId)
                .setScroll(new TimeValue(5000L)).execute().actionGet();

        Page<T> page = getResultsMapper().mapResults(searchResponse, clazz, null);
        if (page.hasContent()) {
            entities.addAll(page.getContent());
            // Continue from the scroll id returned by the latest response.
            scrollId = searchResponse.getScrollId();
        } else {
            hasRecords = false;
        }
    }

    // Release the scroll context once all pages have been read.
    clearScroll(scrollId);

    return entities;
}


Here var is c.313A>G.
This should match the document shown above, but the search returns nothing. Any help would be appreciated.
linfujian

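    Before I add the filter, the query returns every document that contains c.313A>G. Filtering on IF 0-3 and 3-5 also works correctly, but 5-10 fails to return data that does exist (for example, the document shown above), and 20-300 returns a document with IF=3.806, which is a wrong match.
    One more note: IF is a field that was added to the documents later through an update. It was originally stored as a String and was later updated to a float.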
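
The symptoms described above are what a character-by-character (lexicographic) comparison would produce, which is how a range query would behave if the IF field were still mapped as a string. The following is a minimal sketch in plain Java under that assumption. It does not call Elasticsearch, and the class and method names are hypothetical, chosen only for illustration.

```java
final class LexicographicRangeDemo {

    // Range check as it would behave on string values: compared character by character.
    static boolean inStringRange(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    // Range check as it would behave on numeric values.
    static boolean inNumericRange(String value, String from, String to) {
        float v = Float.parseFloat(value);
        return v >= Float.parseFloat(from) && v <= Float.parseFloat(to);
    }

    public static void main(String[] args) {
        // {IF value, lower bound, upper bound}, taken from the cases in this thread.
        String[][] cases = {
            {"8.229", "5", "10"},
            {"3.806", "20", "300"},
            {"3.806", "3", "5"},
        };
        for (String[] c : cases) {
            System.out.printf("IF=%s in [%s, %s]: string=%b numeric=%b%n",
                    c[0], c[1], c[2],
                    inStringRange(c[0], c[1], c[2]),
                    inNumericRange(c[0], c[1], c[2]));
        }
    }
}
```

Compared as strings, "8.229" falls outside ["5", "10"] and "3.806" falls inside ["20", "300"], while the 3-5 range happens to give the same answer either way. If this assumption holds, checking the actual mapping of IF in the var_pmid index would confirm it.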

linfujian

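I looked on Stack Overflow: string and numeric data use different range queries for range comparison, but I have already changed the values to float. Is there something else I need to do after the update for it to take effect?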

linfujian

@rochy: OK, that was my mistake. With tens of millions of documents, I will have to add a redundant field.
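
One way to prepare such a redundant field is to parse the existing IF value into a float and write it under a new field name that is mapped as a numeric type. The following is a minimal sketch in plain Java. The field name IF_num and the helper are hypothetical, and sending the resulting partial document to Elasticsearch is deliberately left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class NumericCopyDemo {

    static final String SOURCE_FIELD = "IF";
    // Hypothetical name for the redundant numeric field.
    static final String NUMERIC_FIELD = "IF_num";

    // Builds a partial document holding the numeric copy of IF.
    // Returns an empty map when IF is missing, blank, or not a number.
    static Map<String, Object> numericCopy(Map<String, Object> source) {
        Object raw = source.get(SOURCE_FIELD);
        if (raw == null) {
            return Collections.emptyMap();
        }
        String text = raw.toString().trim();
        if (text.isEmpty()) {
            return Collections.emptyMap();
        }
        try {
            Map<String, Object> partial = new HashMap<>();
            partial.put(NUMERIC_FIELD, Float.parseFloat(text));
            return partial;
        } catch (NumberFormatException e) {
            // Leave documents with unparseable values untouched.
            return Collections.emptyMap();
        }
    }
}
```

The helper accepts the value whether it arrives as a String or as a number, since documents written before and after the update may differ. The range filter would then target the new field instead of IF.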
