Elasticsearch: bool should + matchPhraseQuery query is slow to return data

Elasticsearch | Author: linfujian | Posted 2018-09-21 | Views: 5431


4 replies

rochy - rochy_he

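Upvoted by: linfujian

Check the took value in the response. If took is small, you are probably requesting a large amount of data and the time is going into data transfer;
if took is large, you need to look at your data volume, machine configuration, and similar factors.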
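
A minimal sketch of that check in plain Java: compare the took value reported by Elasticsearch (milliseconds spent executing the search) with the wall-clock time measured by the client. The class name TookCheck, the method classify, and the 50% threshold are hypothetical and chosen only for illustration.

```java
/** Hypothetical helper: a rough guess at where the time of a slow search went. */
final class TookCheck {

    enum Bottleneck { SEARCH_EXECUTION, TRANSFER_OR_CLIENT }

    /**
     * @param tookMillis      sum of the "took" values reported by Elasticsearch
     * @param wallClockMillis elapsed time measured by the caller around the whole request
     */
    static Bottleneck classify(long tookMillis, long wallClockMillis) {
        if (tookMillis < 0 || wallClockMillis <= 0) {
            throw new IllegalArgumentException("timings must be non-negative and elapsed time positive");
        }
        // Assumed threshold: if at least half of the elapsed time was spent
        // executing the search, treat the query itself as the bottleneck.
        return tookMillis * 2 >= wallClockMillis
                ? Bottleneck.SEARCH_EXECUTION
                : Bottleneck.TRANSFER_OR_CLIENT;
    }
}
```

With a scroll, every page response carries its own took, so the values would need to be summed across pages before comparing them with the total elapsed time.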

rochy - rochy_he

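Upvoted by: linfujian

First, about your query: I recommend not analyzing mesh_id (map it with data type keyword). You can then use termsQuery in place of matchPhraseQuery, which should make the search much faster.

Second, you are fetching more than a hundred thousand documents in one go. I don't know how many MB that amounts to, but if the volume is large, it will indeed be somewhat slow.

That said, your problem looks like it comes from the first point.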
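
To illustrate the first point, here is a sketch in plain Java that builds the request body for a single terms query on mesh_id instead of one should clause per ID. It assumes mesh_id has been remapped as keyword (which requires reindexing) and that relevance scores are not needed, so the query sits in a bool filter. The class TermsQueryBody and its methods are hypothetical. With the Elasticsearch Java API, the equivalent query object would be QueryBuilders.termsQuery("mesh_id", meshIds).

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that renders a terms query as a JSON request body. */
final class TermsQueryBody {

    /** Builds {"query":{"bool":{"filter":{"terms":{field:[values...]}}}}}. */
    static String build(String field, List<String> values) {
        String joined = values.stream()
                .map(TermsQueryBody::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"filter\":{\"terms\":{"
                + quote(field) + ":[" + joined + "]}}}}}";
    }

    /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

A terms query compares each value against the indexed term exactly, which is why it only behaves as intended once the field is no longer analyzed.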

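laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasitcsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

Upvoted by: linfujian

Add "profile": true to the search request, ahead of the query, then look at the execution times it reports and analyze further from there.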
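
As a sketch of where that flag goes: in the search request body, profile is a top-level field next to query. The plain-Java helper below wraps an existing query JSON accordingly; the class ProfiledRequest and the method withProfile are hypothetical names used only for this illustration.

```java
/** Hypothetical helper: wraps a query clause in a request body with profiling enabled. */
final class ProfiledRequest {

    /**
     * @param queryJson the JSON of the query clause only, e.g. {"match_all":{}}
     * @return a request body with "profile": true placed before "query"
     */
    static String withProfile(String queryJson) {
        return "{\"profile\":true,\"query\":" + queryJson + "}";
    }
}
```

The response then contains a profile section with per-shard timing for each query component, which can show whether the many should clauses dominate the execution time.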

linfujian

I have an index whose documents look like this:

{
      "_index" : "disease_pmid",
      "_type" : "disease_pmid_list",
      "_id" : "1874890",
      "_score" : 1.0,
      "_source" : {
        "pmid" : "13702243",
        "mesh_id" : "MESH:D014202",
        "mentions" : "tremor",
        "resource" : "nlp|MESH",
        "mesh_name" : "Tremor"
      }
}

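The index holds 39.93 million documents.

My requirement: match the user's input against another index to obtain the matching mesh_id values (a set), then find every document in the index above whose mesh_id is in that set and return them. The application is built with spring-data-elasticsearch. The core logic is below (intermediate figures from one particular query are noted in the comments):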

    BoolQueryBuilder boolQueryBuilder2 = QueryBuilders.boolQuery();
    for (String meshId : meshIds) { // 214 mesh_id values
        boolQueryBuilder2.should(QueryBuilders.matchPhraseQuery("mesh_id", meshId));
    }

    SearchQuery searchQuery2 = new NativeSearchQueryBuilder()
            .withQuery(boolQueryBuilder2)
            .withIndices("disease_pmid")
            .withTypes("disease_pmid_list")
            .build();

    // Return at most 100,000 records plus the total hit count.
    // This query matched 600,000 documents; only the first 100,000 are returned.
    totalEntity = esUtil.queryNAndTotalNum(searchQuery2, 100000, Pmid2DiseaseEntity.class);

The query above takes about 50 s, and that is with only the first 100,000 documents retrieved. The esUtil.queryNAndTotalNum method is as follows:

    public <T> TotalNumAndEntities<T> queryNAndTotalNum(SearchQuery searchQuery, Integer n, Class<T> clazz) {
        List<T> entities = new ArrayList<>();
        long totalNum = 0L;
        TotalNumAndEntities<T> result = new TotalNumAndEntities<>();

        // Open the scroll; 5000L is the scroll keep-alive in milliseconds.
        String scrollId = scan(searchQuery, 5000L, false);

        boolean hasRecords = true;
        while (hasRecords) {
            SearchResponse searchResponse = getClient().prepareSearchScroll(scrollId)
                    .setScroll(new TimeValue(5000L)).execute().actionGet();

            Page<T> page = getResultsMapper().mapResults(searchResponse, clazz, null);
            if (page.hasContent()) {
                entities.addAll(page.getContent());
                // Use >= rather than ==: a page can push the size past n,
                // in which case == would never stop the loop early.
                if (entities.size() >= n) {
                    hasRecords = false;
                    totalNum = searchResponse.getHits().totalHits();
                }
                scrollId = searchResponse.getScrollId();
            } else {
                // No more hits: the scroll is exhausted.
                hasRecords = false;
                totalNum = searchResponse.getHits().totalHits();
            }
        }

        clearScroll(scrollId);

        // Trim any overshoot from the last page so at most n entities are returned.
        if (entities.size() > n) {
            entities = entities.subList(0, n);
        }

        result.setNum(totalNum);
        result.setEntites(entities);
        return result;
    }

My question: how can I speed up this kind of query, which returns hundreds of thousands of documents in one go from a large index? Thanks, everyone.
