
How do I integrate Elasticsearch with Java to compute word-cloud statistics?

Elasticsearch | Author: hongsir | Posted: 2019-12-13 | Views: 3892

How can I use Java with Elasticsearch to compute term frequencies?
My current approach:
NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
queryBuilder.withQuery(QueryBuilders.matchAllQuery());
// Terms aggregation on fieldName: the 10 terms that appear in the most documents.
// On an analyzed text field this requires "fielddata": true in the mapping.
queryBuilder.addAggregation(AggregationBuilders.terms("hotWord").field(fieldName).size(10));
AggregatedPage<AnalyzerIndex> aggPage = (AggregatedPage<AnalyzerIndex>) indexRepository.search(queryBuilder.build());
Terms terms = (Terms) aggPage.getAggregation("hotWord");
AtomicInteger i = new AtomicInteger(1);
terms.getBuckets().forEach(bucket -> {
    // rank:term=number of documents containing the term (not the number of occurrences)
    System.out.println(i.getAndIncrement() + ":" + bucket.getKeyAsString() + "=" + bucket.getDocCount());
});
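A terms aggregation ranks terms by how many documents contain them (bucket.getDocCount()), so a word repeated many times in one document still counts once for that document. A minimal in-memory sketch of that semantics, assuming a lowercase whitespace split as a stand-in for the field's real analyzer; the class and method names are hypothetical and no Elasticsearch client is involved:

```java
import java.util.*;
import java.util.stream.Collectors;

public class DocCountTerms {

    // Mimics a terms aggregation: for each term, the number of documents containing it,
    // sorted by that count descending (ties broken alphabetically), top `size` kept.
    public static List<Map.Entry<String, Long>> topTerms(List<String> docs, int size) {
        Map<String, Long> docCount = new HashMap<>();
        for (String doc : docs) {
            // Assumption: lowercase + whitespace split stands in for the analyzer.
            // The Set makes each term count at most once per document.
            Set<String> uniqueTerms = new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\s+")));
            for (String term : uniqueTerms) {
                if (!term.isEmpty()) {
                    docCount.merge(term, 1L, Long::sum);
                }
            }
        }
        return docCount.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("java java java es", "es kibana", "es java");
        int rank = 1;
        for (Map.Entry<String, Long> e : topTerms(docs, 10)) {
            // prints 1:es=3, 2:java=2, 3:kibana=1 (one per line)
            System.out.println(rank++ + ":" + e.getKey() + "=" + e.getValue());
        }
    }
}
```

Here "java" occurs four times but is reported as 2, because only two documents contain it. For a word cloud weighted by raw occurrences, per-document term frequencies (as returned by the term vectors API) are the closer fit.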

hongsir


This is my approach for a single document (using the term vectors API):
TermVectorsRequest request = new TermVectorsRequest(index, indexType, id);
request.setFields("content");
request.setFieldStatistics(true);
request.setTermStatistics(true);
request.setPositions(true);
request.setOffsets(true);
request.setPayloads(false);

// Filter settings: keep only this document's highest-scoring terms (scored by tf-idf)
Map<String, Integer> filterSettings = new HashMap<>();
filterSettings.put("max_num_terms", 10);   // number of terms in the word cloud
filterSettings.put("min_term_freq", 2);    // minimum frequency of the term in this document
filterSettings.put("max_term_freq", 100);
filterSettings.put("min_doc_freq", 1);     // minimum number of documents in the index containing the term
filterSettings.put("max_doc_freq", 100);
filterSettings.put("min_word_length", 2);
filterSettings.put("max_word_length", 10);
request.setFilterSettings(filterSettings);
TermVectorsResponse response = elasticsearchTemplate.getClient().termvectors(request, RequestOptions.DEFAULT);
List<TermVectorsResponse.TermVector> termVectorList = response.getTermVectorsList();

for (TermVectorsResponse.TermVector termVector : termVectorList) {
    String fieldName = termVector.getFieldName();
    TermVectorsResponse.TermVector.FieldStatistics fieldStatistics = termVector.getFieldStatistics();
    List<TermVectorsResponse.TermVector.Term> terms = termVector.getTerms();
    for (TermVectorsResponse.TermVector.Term term : terms) {
        System.out.println("----term:" + term.getTerm() + "  -DocFreq:" + term.getDocFreq() + "  -TermFreq:" + term.getTermFreq());
        // Positions/offsets (enabled above) are available per token, e.g.:
        // term.getTokens().forEach(t -> System.out.println("----position:" + t.getPosition()));
    }
}
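The term vectors API returns statistics for one document at a time. One way to build an index-wide word cloud from it, sketched under the assumption that you collect a term -> term_freq map for each document (for example by repeating the request above per document id, or with the multi term vectors API `_mtermvectors`), is to sum the frequencies and keep the top N. The class name and the per-document maps are hypothetical; `minWordLength` mirrors the `min_word_length` filter setting:

```java
import java.util.*;
import java.util.stream.Collectors;

public class MergeTermFreqs {

    // Sums per-document term frequencies (e.g. term.getTermFreq() values collected
    // from each document's term vectors) and returns the maxNumTerms most frequent terms.
    public static List<Map.Entry<String, Integer>> mergeTopTerms(List<Map<String, Integer>> perDocFreqs,
                                                                 int maxNumTerms, int minWordLength) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> docFreqs : perDocFreqs) {
            docFreqs.forEach((term, freq) -> {
                // Drop short terms, like the min_word_length filter setting does.
                if (term.length() >= minWordLength) {
                    total.merge(term, freq, Integer::sum);
                }
            });
        }
        // Highest total frequency first; ties broken alphabetically.
        return total.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(maxNumTerms)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical per-document term -> term_freq maps.
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("java", 3);
        doc1.put("es", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("es", 2);
        doc2.put("a", 5); // shorter than minWordLength, so it is dropped
        for (Map.Entry<String, Integer> e : mergeTopTerms(Arrays.asList(doc1, doc2), 10, 2)) {
            // prints es=3, then java=3
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

If the filter settings are applied to each request, every response only contains that document's top terms, so the merged totals are approximate; fetching the term vectors without `max_num_terms` gives exact sums at the cost of larger responses.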
Has anyone here integrated Elasticsearch with Spring Boot? Any pointers would be appreciated.

axxc


GET my_index/_search
{
  "aggs": {
    "terms_text": {
      "terms": {
        "field": "text",
        "size": 100,
        "min_doc_count": 2, 
        "order": {
          "_key": "desc"
        }
      }
    }
  },
  "_source": {
    "excludes": "text"
  }
}
