
How can Elasticsearch highlight exactly the matched terms?

Elasticsearch | Author: zhouliang | Posted: 2016-05-09 | Views: 18552

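My local ES version is 1.5.2. When indexing the data I did not specify index_options for the field (so, by default, the plain highlighter should be the one in use).
I PUT a document locally that looks roughly like this: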
{"id": "20001",
"name": "江苏大学附属医院"}
Query:
curl 'localhost:9200/**-index/**/_search?pretty' -d '{"query":{"bool":{"should":[{"match":{"name":{"query":"江苏人民医院","type":"boolean","boost":12.0}}}]}},"highlight":{"fields":{"name":{}}},"from":0,"size":2}'

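For the highlight part, I expected this result: <em>江苏</em>省<em>人民医院</em>.
The actual result is <em>江苏省人民医院</em>: the character “省” is highlighted as well, which is not what I expected. Where does the problem come from?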

qinpengfei - a goofball who can't even figure out computers


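1. highlight can be configured to return several highlighted fragments, but the highlighted text cannot be expected to match exactly what you have in mind.
2. You can also put a separate query for highlighting inside highlight; see whether highlight_query meets your needs (https://www.elastic.co/guide/e ... g.html)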
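
To make point 2 a bit more concrete, here is a rough sketch of a search body that searches with one query but highlights with another, using highlight_query on the name field. The helper name build_search_body and the highlight text "人民医院" are only assumptions for illustration; whether this gives the exact highlighting wanted still depends on how the field is analyzed.

```python
import json


def build_search_body(query_text, highlight_text):
    """Build a search body whose highlighting uses its own query (hypothetical helper)."""
    return {
        "query": {"match": {"name": {"query": query_text}}},
        "highlight": {
            "fields": {
                "name": {
                    # highlight_query is used for highlighting only;
                    # it does not change which documents match.
                    "highlight_query": {
                        "match": {"name": {"query": highlight_text}}
                    }
                }
            }
        },
        "from": 0,
        "size": 2,
    }


# Illustrative values: search for the full name, highlight only part of it.
body = build_search_body("江苏人民医院", "人民医院")

# json.dumps escapes non-ASCII characters by default, which is fine for a request body.
print(json.dumps(body, indent=2))
```

The printed JSON could be passed to the _search endpoint with -d, in the same way as the query in the question.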

flank


Check the mapping to see whether analysis (tokenization) is enabled for the name field.
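
A small sketch of that check, assuming the 1.x mapping format, where a string field is analyzed unless "index" is set to "not_analyzed" or "no". The sample response below is made up (the index and type names my-index and my-type are hypothetical); in practice the dict would come from GET /<index>/_mapping.

```python
def is_analyzed(field_mapping):
    """Return True if a 1.x-style field mapping describes an analyzed string field."""
    if field_mapping.get("type") != "string":
        return False
    # String fields default to "analyzed" when "index" is not given.
    return field_mapping.get("index", "analyzed") == "analyzed"


# Made-up example of a mapping response; names are placeholders.
sample_response = {
    "my-index": {
        "mappings": {
            "my-type": {
                "properties": {
                    "id": {"type": "string", "index": "not_analyzed"},
                    "name": {"type": "string"},
                }
            }
        }
    }
}

props = sample_response["my-index"]["mappings"]["my-type"]["properties"]
for field, mapping in sorted(props.items()):
    print(field, "analyzed" if is_analyzed(mapping) else "not analyzed")
```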

zhouliang


I solved this problem in a simple way:
When creating the index, set index_options on the name field, i.e. "index_options":"offsets".
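
For reference, a minimal sketch of what such an index-creation body might look like in the 1.x mapping format. The type name my-type is a placeholder and build_index_body is a hypothetical helper; only "index_options":"offsets" on the name field comes from the fix described above.

```python
import json


def build_index_body(type_name, field_name):
    """Build an index-creation body that stores offsets for one string field."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field_name: {
                        "type": "string",
                        # Storing offsets in the postings is what enables
                        # the postings highlighter for this field.
                        "index_options": "offsets",
                    }
                }
            }
        }
    }


# "my-type" is a placeholder; the real type name is not shown in the thread.
body = build_index_body("my-type", "name")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of the PUT request that creates the index. Since the setting is applied at index creation, documents that were indexed earlier would presumably have to be indexed again.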

If index_options is set to offsets in the mapping the postings highlighter will be used instead of the plain highlighter. The postings highlighter:

Is faster since it doesn’t require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be
Requires less disk space than term_vectors, needed for the fast vector highlighter
Breaks the text into sentences and highlights them. Plays really well with natural languages, not as well with fields containing for instance html markup
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm
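
To illustrate the last point only: a toy sketch that treats each sentence of one text as a "document" and scores it with textbook BM25. This is not the Lucene implementation; the sentence splitting, tokenization, and the parameters k1=1.2 and b=0.75 are simplifying assumptions, and it only handles whitespace-delimited text.

```python
import math
import re


def bm25_sentence_scores(text, query_terms, k1=1.2, b=0.75):
    """Score each sentence of `text` against `query_terms` with textbook BM25."""
    # Naive split after '.', '!' or '?'; a real highlighter uses a proper sentence breaker.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)]
    docs = [(s, re.findall(r"\w+", s.lower())) for s in sentences]
    docs = [(s, toks) for s, toks in docs if toks]  # drop sentences without tokens
    if not docs:
        return []

    n = len(docs)  # the sentences form the whole "corpus"
    avg_len = sum(len(toks) for _, toks in docs) / n
    terms = [t.lower() for t in query_terms]

    results = []
    for sentence, toks in docs:
        score = 0.0
        for term in terms:
            tf = toks.count(term)
            if tf == 0:
                continue
            # Number of sentences that contain the term.
            df = sum(1 for _, other in docs if term in other)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        results.append((sentence, score))
    return results


text = ("Elasticsearch highlights text. The weather is nice today. "
        "Highlighting uses offsets.")
for sentence, score in bm25_sentence_scores(text, ["offsets"]):
    print(round(score, 3), sentence)
```

Sentences that contain none of the query terms score 0, so the highest-scoring sentences are the natural candidates to return as highlighted fragments.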
