How query efficiency relates to the mapping

Elasticsearch | Author: elasticStack | Posted 2019-01-11 | Views: 387

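1. ES version: 5.6.9
2. Problem: I recently used Kibana's profile tool to check how long my DSL query takes, and found that most of the time is spent on two fields; see the attached screenshots.
3. Questions: to make the query faster, is there anything left to optimize in the DSL itself? And does the mapping of these two fields directly affect query efficiency?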

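[Screenshot: datatime耗时.png (time spent on datatime)]

[Screenshot: 不同索引耗时不同.png (time differs across indices)]

[Screenshot: 分片耗时不同.png (time differs across shards)]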

elasticStack - post-90s IT big-data guy

{
  "query": {
    "bool": {
      "filter": [{
        "range": {
          "datatime": {
            "gte": "1547086989",
            "lte": "1547173388"
          }
        }
      }, {
        "term": {
          "comid": {
            "value": "196e879175c477e89f5d"
          }
        }
      }, {
        "terms": {
          "group": [1, 2, 74, 187, 274, 275, 276, 278, 279, 401, 403, 407, 722, 723, 724, 725, 726, 774]
        }
      }, {
        "terms": {
          "datatype": ["shell_log", "net_connect", "access_log", "account_change", "proc_create", "dns_access"]
        }
      }, {
        "bool": {
          "must": [{
            "wildcard": {
              "uname.keyword": {
                "value": "r*t"
              }
            }
          }]
        }
      }]
    }
  },
  "_source": {
    "includes": []
  },
  "sort": [{
    "datatime": {
      "order": "desc"
    }
  }],
  "from": "0",
  "size": "100",
  "highlight": {
    "require_field_match": "true",
    "pre_tags": ["[qthighlight]"],
    "post_tags": ["[/qthighlight]"],
    "fields": {
      "event": {},
      "tty": {},
      "danger": {},
      "host_memo": {},
      "group_name": {},
      "pre_sudo_shell": {},
      "uname": {},
      "sudo_shell": {},
      "path": {},
      "comid": {},
      "event_type": {},
      "pre_login_shell": {},
      "login_err_reason": {},
      "group": {},
      "pre_gid": {},
      "ppuname": {},
      "location": {},
      "gid": {},
      "ppid": {},
      "type": {},
      "shell": {},
      "login_con_port": {},
      "host_tag": {},
      "pre_gname": {},
      "pname": {},
      "euid": {},
      "log_type": {},
      "event_search": {},
      "euname": {},
      "cmd": {},
      "internal_ip": {},
      "datatype": {},
      "host_name": {},
      "logout_reason": {},
      "dst_ip": {},
      "os_type": {},
      "sudo_uname": {},
      "datatime": {},
      "uid": {},
      "ppname": {},
      "pid": {},
      "gname": {},
      "home": {},
      "id": {},
      "category": {},
      "ppuid": {},
      "event_type_des": {},
      "severity": {},
      "proto": {},
      "port": {},
      "src_ip": {},
      "pre_home": {},
      "ip_type": {},
      "uname.keyword": {},
      "status": {},
      "agent_ip": {},
      "pre_uid": {},
      "login_type": {},
      "pre_sudo_uname": {},
      "agent_id": {},
      "src_port": {},
      "external_ip": {},
      "pre_uname": {},
      "rule": {},
      "dst_port": {},
      "action": {},
      "pppath": {},
      "rule_id": {}
    }
  }
}
DSL

elasticStack - post-90s IT big-data guy

"datatime": {
  "type": "date",
  "format": "strict_date_optional_time || epoch_second"
},
"comid": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 8191
    }
  }
}
mapping

elasticStack - post-90s IT big-data guy

The datatime range covers the last thirty days.

laoyang360 - [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

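1. For terms queries over a large amount of data, I'd suggest changing the fields to the keyword type to improve efficiency.
2. Avoid wildcard whenever you can; see whether match_phrase or some other approach can replace it. In extreme cases a wildcard query can bring the cluster down.
3. Do all of those fields really need highlighting?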
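To make point 3 concrete, here is a minimal Python sketch that trims the `highlight.fields` section of a search body down to the fields that actually need highlighting. The helper name `trim_highlight` and the choice of fields to keep are assumptions made for illustration; they do not come from the thread, and the body below is a shortened stand-in for the full DSL.

```python
import copy


def trim_highlight(body, keep_fields):
    """Return a copy of the search body whose highlight section lists only keep_fields."""
    trimmed = copy.deepcopy(body)  # leave the caller's body untouched
    highlight = trimmed.get("highlight")
    if highlight and "fields" in highlight:
        highlight["fields"] = {
            name: opts
            for name, opts in highlight["fields"].items()
            if name in keep_fields
        }
    return trimmed


# Shortened stand-in for the DSL posted above (only a few highlight fields shown).
body = {
    "highlight": {
        "require_field_match": "true",
        "pre_tags": ["[qthighlight]"],
        "post_tags": ["[/qthighlight]"],
        "fields": {"event": {}, "tty": {}, "uname": {}, "uname.keyword": {}, "comid": {}},
    },
}

# Hypothetical choice: only highlight the fields the wildcard clause searches.
slim = trim_highlight(body, {"uname", "uname.keyword"})
print(sorted(slim["highlight"]["fields"]))  # ['uname', 'uname.keyword']
```

Whether this helps in a given cluster would have to be checked with the profile tool; the sketch only shows how to shrink the request.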

rochy - rochy_he@tw

Try changing
"term": { "comid": { "value": "196e879175c477e89f5d" } }
to
"term": { "comid.keyword": { "value": "196e879175c477e89f5d" } }
and see whether that helps.
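If the query body is built in application code, the same change can be applied programmatically. The following is a rough sketch, assuming the `bool`/`filter` layout of the DSL posted earlier; the helper `use_keyword_subfield` is a hypothetical name, and the body is abbreviated to two filter clauses.

```python
import copy


def use_keyword_subfield(body, field):
    """Return a copy of body where term filters on `field` target `field.keyword` instead."""
    rewritten = copy.deepcopy(body)  # do not mutate the original request
    for clause in rewritten["query"]["bool"]["filter"]:
        term = clause.get("term")
        if term is not None and field in term:
            # Move the clause from the analyzed text field to its keyword sub-field.
            term[field + ".keyword"] = term.pop(field)
    return rewritten


# Abbreviated version of the posted query.
body = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"datatime": {"gte": "1547086989", "lte": "1547173388"}}},
                {"term": {"comid": {"value": "196e879175c477e89f5d"}}},
            ]
        }
    }
}

rewritten = use_keyword_subfield(body, "comid")
print(rewritten["query"]["bool"]["filter"][1])
# {'term': {'comid.keyword': {'value': '196e879175c477e89f5d'}}}
```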

God_lockin

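1. Every field used in a terms clause could be mapped as keyword; with that many values listed, they are presumably some kind of enum.
2. Your comid looks like it came straight out of dynamic mapping. I'd advise against that, since efficiency will be poor. If you really need to support several kinds of search on one field, consider a mapping like this: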
"title": {
  "type": "text",
  "fields": {
    "text": {
      "type": "text",
      "search_analyzer": "ik_smart",
      "analyzer": "ik_max_word"
    },
    "keyword": {
      "type": "keyword"
    }
  }
}
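3. As for that big pile of highlight fields, you could do the matching and replacement in your own code, or even loop over the results in the front end and highlight them there; you don't have to make ES assemble everything for you.
4. If the data volume is large, consider splitting it into separate indices by time, by some enum field, and so on, and routing queries accordingly. Wouldn't fetching from a smaller pool of data be faster?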
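Points 3 and 4 can be sketched in Python as follows. Everything here is an assumption made for illustration: the thread does not say how the indices are named, so the `logs-` prefix and the daily `YYYY.MM.DD` suffix are hypothetical, and `highlight` is a simple literal, case-insensitive match rather than a reproduction of ES highlighting.

```python
import re
from datetime import datetime, timedelta, timezone


def highlight(text, needle, pre="[qthighlight]", post="[/qthighlight]"):
    """Wrap every case-insensitive occurrence of needle in text with the tags (point 3)."""
    if not needle:
        return text
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


def daily_indices(gte_epoch, lte_epoch, prefix="logs-"):
    """List the hypothetical daily index names (UTC) covering an epoch-second range (point 4)."""
    start = datetime.fromtimestamp(int(gte_epoch), tz=timezone.utc).date()
    end = datetime.fromtimestamp(int(lte_epoch), tz=timezone.utc).date()
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y.%m.%d"))
        day += timedelta(days=1)
    return names


print(highlight("sudo su - root", "root"))
# sudo su - [qthighlight]root[/qthighlight]

# The range bounds from the posted DSL only touch two daily indices.
print(daily_indices("1547086989", "1547173388"))
# ['logs-2019.01.10', 'logs-2019.01.11']
```

With indices split like this, the search request could name only the indices returned by `daily_indices` instead of searching everything; whether daily granularity is appropriate depends on the data volume.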

elasticStack - post-90s IT big-data guy

[Screenshot: comid.keyword_.jpg]

I switched to comid.keyword, but it doesn't seem to help much. One more question: why does the range clause on the time field take so long on every query? @rochy
