
The difference between index_options and term_vector in a mapping

Elasticsearch | Author: jingkyks | Posted: 2015-04-17 | Views: 11299

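Both settings deal with whether position information gets indexed, so where does the difference lie? term_vector is an index setting at the Lucene level, while index_options appears to be an Elasticsearch setting. Do the two settings override each other? For highlighting, the index_options setting determines whether the postings highlighter can be used, while the term_vector setting determines whether the fvh (fast vector highlighter) can be used.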

jingkyks - fruit, pencil, 2B eraser

Upvoted by: medcl

I came across an explanation on discuss.elast.co today and am sharing it below:
 
First of all, index_options and term vectors are two totally different things.
index_options are "options" for the index you are searching on, a
data structure that maps "terms" to document lists (posting lists).
Term vectors are a data structure that gives you the "terms" for a given
document and, in addition, their positions in the document as well as their
start and end character offsets. Now the index (each field has such an
index) holds a sorted list of terms, and each term points to a posting list.
These posting lists are lists of documents that contain the term. On the
posting list you can also store information like frequencies (how often did
term Y occur in document X -> useful for scoring) as well as "positions"
(at which position did term Y occur in document X -> this is required for
phrase & span queries).
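
To make the two data structures concrete, here is a minimal toy sketch in Python. It is not how Lucene stores anything on disk; the function names (tokenize, build_index, build_term_vector) and the simple word tokenizer are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Yield (term, position, start_offset, end_offset) for each word."""
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        yield match.group(), position, match.start(), match.end()


def build_index(docs):
    """Toy inverted index: term -> {doc_id: [positions]}.

    The inner dict keys correspond to the "docs" level, the length of each
    list to the "freqs" level, and the list itself to the "positions" level.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, position, _start, _end in tokenize(text):
            index[term].setdefault(doc_id, []).append(position)
    return dict(sorted(index.items()))  # keep terms in sorted order


def build_term_vector(text):
    """Toy term vector for ONE document: term -> [(position, start, end)]."""
    vector = defaultdict(list)
    for term, position, start, end in tokenize(text):
        vector[term].append((position, start, end))
    return dict(vector)


docs = {1: "quick brown fox", 2: "quick quick dog"}
index = build_index(docs)
print(index["quick"])              # which docs contain "quick", and where
print(build_term_vector(docs[2]))  # which terms doc 2 contains, and where
```

The index answers "which documents contain this term?", while the term vector answers "which terms does this document contain?". That is why the two are stored separately and serve different features.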

If you have, for instance, a field that you only use for filtering, you don't
need freqs and positions, so docs alone will do the job. In an index, the
position information is usually the biggest piece of data aside from stored
fields. If you don't do phrase or span queries, you don't need positions at all,
so save the disk space and improve performance by using only docs and freqs. In
previous versions it wasn't possible to have only freqs but no positions
(index_options supersedes omit_term_frequencies_and_positions), so this is an
improvement overall, since the most common use case might only need freqs but
no positions.
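
The rule of thumb above can be sketched as a small helper. This is only an illustration: choose_index_options and the field names (status, title, body) are hypothetical, and the "string" type reflects the Elasticsearch versions of that era (newer versions use "text" and "keyword" instead).

```python
def choose_index_options(needs_scoring, needs_phrase_queries):
    """Pick the smallest index_options level a field needs (hypothetical helper)."""
    if needs_phrase_queries:
        return "positions"  # phrase and span queries need term positions
    if needs_scoring:
        return "freqs"      # term frequencies feed relevance scoring
    return "docs"           # filtering only needs the list of documents


# Hypothetical fields; "string" is the field type of that era.
properties = {
    "status": {"type": "string", "index_options": choose_index_options(False, False)},
    "title": {"type": "string", "index_options": choose_index_options(True, False)},
    "body": {"type": "string", "index_options": choose_index_options(True, True)},
}

for field, settings in properties.items():
    print(field, settings["index_options"])
```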

jingkyks - fruit, pencil, 2B eraser


Here are some of the options:
1: term_vector
TermVector.YES: Store only the number of occurrences.
TermVector.WITH_POSITIONS: Store the number of occurrences and the positions of terms, but no offsets.
TermVector.WITH_OFFSETS: Store the number of occurrences and the offsets of terms, but no positions.
TermVector.WITH_POSITIONS_OFFSETS: Store the number of occurrences, positions, and offsets of terms.
TermVector.NO: Don't store any term vector information.
2: index_options
Sets the indexing options. Possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), positions (doc numbers, term frequencies, and positions), and offsets (doc numbers, term frequencies, positions, and offsets). Defaults to positions for analyzed fields and to docs for not_analyzed fields.
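
As a rough illustration of how the two settings line up with the highlighters mentioned in the question, here is a sketch of a mapping fragment built as a Python dict. The field names body_postings and body_fvh are made up, and the "string" type assumes an Elasticsearch version from around the time of this thread. The postings highlighter needs offsets in the postings list (index_options), while the fvh needs positions and offsets in the term vector (term_vector); the two are set independently on each field.

```python
import json

# Hypothetical field names; each field enables one highlighter's prerequisite.
mapping = {
    "properties": {
        # Offsets stored in the postings list, for the postings highlighter.
        "body_postings": {"type": "string", "index_options": "offsets"},
        # Positions and offsets stored in the term vector, for the fvh.
        "body_fvh": {"type": "string", "term_vector": "with_positions_offsets"},
    }
}

print(json.dumps(mapping, indent=2))
```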
