
A question about the norms parameter in Elasticsearch mapping definitions

Elasticsearch | Author: hapjin | Posted 2019-05-31 | Views: 8715

The official documentation describes norms as follows:
if you don’t need scoring on a specific field, you should disable norms on that field. In particular, this is the case for fields that are used solely for filtering or aggregations.
 
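When a field does not need to be scored, norms can be disabled. In other words, only fields of type "text" really need the norms parameter set explicitly, right? (norms defaults to true.)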
 
And keyword fields do not actually have a norms parameter, do they? The official documentation describes keyword as follows:
they are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.
I ran a quick test on ES 6.3.2:
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {"type": "text", "norms": false},
        "overview": {"type": "keyword", "norms": false}
      }
    }
  }
}

GET test/_mapping returns:
{
  "test": {
    "mappings": {
      "_doc": {
        "properties": {
          "overview": {
            "type": "keyword"
          },
          "title": {
            "type": "text",
            "norms": false
          }
        }
      }
    }
  }
}

 
 

Ombres


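1. When a field does not need to be scored, norms can be disabled. In other words, only fields of type "text" really need the norms parameter set, right?
Your understanding is correct.
2. And keyword fields do not actually have a norms parameter, do they?
The keyword type does have a norms parameter; it defaults to false. The default is set when the field type is initialized, as this excerpt from the source shows: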
public static final MappedFieldType FIELD_TYPE = new KeywordFieldType();
static {
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setOmitNorms(true);  // norms are omitted, i.e. "norms": false by default
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
    FIELD_TYPE.freeze();
}
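
This default also fits the GET test/_mapping output in the question: "norms": false is echoed back for the text field, where it differs from the default, but not for the keyword field, where it matches it. A minimal Python sketch of that behaviour, assuming only the two defaults discussed in this thread; NORMS_DEFAULTS, effective_norms and render_mapping are hypothetical names for illustration, not Elasticsearch APIs:

```python
# Hypothetical illustration, not Elasticsearch code.
# Defaults as discussed in this thread: text fields keep norms, keyword fields omit them.
NORMS_DEFAULTS = {"text": True, "keyword": False}


def effective_norms(field):
    """Return the norms setting a field ends up with, falling back to its type default."""
    return field.get("norms", NORMS_DEFAULTS[field["type"]])


def render_mapping(properties):
    """Mimic the GET _mapping output: drop 'norms' when it equals the type default."""
    rendered = {}
    for name, field in properties.items():
        out = {"type": field["type"]}
        if effective_norms(field) != NORMS_DEFAULTS[field["type"]]:
            out["norms"] = effective_norms(field)
        rendered[name] = out
    return rendered


# The two fields from the PUT request in the question.
requested = {
    "title": {"type": "text", "norms": False},
    "overview": {"type": "keyword", "norms": False},
}
print(render_mapping(requested))
```

So setting "norms": false on a keyword field is accepted but changes nothing, which would explain why it does not show up in the returned mapping.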

hapjin


What exactly do norms store?


Norms store various normalization factors that are later used at query time in order to compute the score of a document relatively to a query.


This says that norms store various normalization factors. Judging from bm25-the-next-generation-of-lucene-relevation, are these factors related to the average length of a document?
 
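And how do norms affect search results? Compare norms with term frequency, for example: in general, the higher the tf, the higher the document's score. So how do norms affect the score?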


index-time boosts are stored as part of the norm, which is only one byte. This reduces the resolution of the field length normalization factor which can lead to lower quality relevance calculations.
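
To make the "only one byte" point concrete, here is a deliberately simplified Python sketch. It is not Lucene's actual encoding: encode_norm is a hypothetical quantizer that folds an index-time boost into a length factor of 1/sqrt(length) and squeezes the result into 256 levels. It only illustrates why nearby field lengths become indistinguishable, and why a boost and a length change cannot be told apart once they share the same byte.

```python
import math


def encode_norm(field_length, boost=1.0):
    """Hypothetical one-byte encoding: boost and length share a single value in 0..255."""
    norm = boost / math.sqrt(field_length)
    return min(255, int(norm * 255))


# Nearby lengths collapse onto the same byte, so the scorer cannot distinguish them.
print(encode_norm(100), encode_norm(103))

# A boosted 100-term field looks identical to an unboosted 25-term field.
print(encode_norm(100, boost=2.0), encode_norm(25))
```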


 
title-search-when-relevancy-is-only-skin-deep/ says that norms tend to give short documents higher scores, but I don't fully understand why. Could someone explain?


This behavior is related to what are known in Lucene-based search engines as “norms”. Norms bias search results to shorter pieces of text


 After reading the description at https://lucene.472066.n3.nabbl ... .html: is the Lucene scoring model's tendency to give short text higher scores caused precisely by norms being enabled?


Length normalization of the field.  Full-text matches on shorter
fields score higher because the match is seen as more specific.  You
loose that if you omit norms.  That's typically OK for short fields
like "title" anyway, and fields that aren't full-text (like dates,
numbers, etc).
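
The effect described in this quote can be sketched with the BM25 formula, where field length enters through the term dl/avgdl. This is a minimal Python sketch, not Lucene's implementation: bm25_term_score is a hypothetical helper, k1 = 1.2 and b = 0.75 are the usual defaults, and treating disabled norms as b = 0 (no length information) is an assumption made here for illustration.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75, use_norms=True):
    """BM25 score contribution of one term in one field.

    With use_norms=False the field length is ignored, which this sketch
    models as b = 0 (an assumption about how omitted norms behave).
    """
    if not use_norms:
        b = 0.0
    length_factor = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_factor)


# Same term frequency, very different field lengths.
short_with = bm25_term_score(tf=1, doc_len=5, avg_doc_len=50)
long_with = bm25_term_score(tf=1, doc_len=500, avg_doc_len=50)
short_without = bm25_term_score(tf=1, doc_len=5, avg_doc_len=50, use_norms=False)
long_without = bm25_term_score(tf=1, doc_len=500, avg_doc_len=50, use_norms=False)

print("norms on: ", short_with, long_with)
print("norms off:", short_without, long_without)
```

With norms on, the short field scores higher than the long one for the same tf; with norms off, the two scores are identical and only tf and idf matter. Under these assumptions the answer to the question above is yes: the preference for short text comes from the length normalization that norms make possible.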
