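path.data can be configured with multiple paths (comma-separated). The questions below assume that each configured path maps to its own physical disk.
1. If multiple paths are configured, does ES fill one disk completely before it starts writing index data to the next one?
2. If an index spans two disks, does a search have to run against each disk separately and then merge the results? Wouldn't that be slower than using a single disk?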
4 replies
rojay - a post-90s IT guy in Hangzhou, new to the workforce
Upvoted by: medcl, byx313, cccthought
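How ES allocates shards across multiple disks
Assume a single-node setup with two disks, where elasticsearch.yml sets path.data: /index/data,/data2/index/data
That is two disks, one per path. If I now create the index hrecord1 with 2 primary shards, they are allocated as follows:
shard1 is created first (my guess is that ES creates the higher-numbered shard first, but it makes little difference). ES looks at the disks behind the two paths, picks the one with more free space, and puts shard1 under that path.
When shard0 is created, ES adds up the remaining free space of the /index and /data2 disks and takes 5% of that total.
It subtracts this 5% value from the free space of the disk that received shard1, compares the result with the remaining free space of /data2, and places shard0 on whichever of the two is larger.
In other words, the 5% appears to be space that ES reserves for the shard1 it just created.
If anything here is wrong, please point it out. Happy to discuss!
The main code is in ShardPath.java.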
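The selection rule described in this reply can be sketched in a few lines of Python. This is only a simplified model of the description above, not the actual logic in ShardPath.java: the function names `pick_path` and `allocate`, the free-space figures, and the choice to recompute the 5% reserve from a fixed free-space snapshot are all assumptions made for illustration.

```python
def pick_path(usable, shard_counts, reserve_ratio=0.05):
    """Pick the data path with the most free space after per-shard reservations.

    usable:       {path: free space, in any consistent unit}
    shard_counts: {path: number of shards already placed on that path}
    """
    # Assumed rule from the reply: each shard already on a path "reserves"
    # 5% of the total free space summed over all paths.
    reserve = sum(usable.values()) * reserve_ratio
    return max(usable, key=lambda p: usable[p] - shard_counts[p] * reserve)


def allocate(usable, shard_ids, reserve_ratio=0.05):
    """Place shards one by one, in the given order, and return {shard: path}."""
    shard_counts = {p: 0 for p in usable}
    placement = {}
    for shard in shard_ids:
        path = pick_path(usable, shard_counts, reserve_ratio)
        placement[shard] = path
        shard_counts[path] += 1
    return placement


if __name__ == "__main__":
    # Hypothetical free space in GB for the two paths from the example.
    free_gb = {"/index/data": 100, "/data2/index/data": 97}
    # shard1 is placed first, then shard0, as the reply guesses.
    print(allocate(free_gb, [1, 0]))
```

With nearly equal disks the two shards end up on different paths. With very unequal disks (say 500 GB vs 100 GB free) both shards go to the larger disk, which also answers question 1 under this model: ES does not fill one disk before moving to the next, it picks a path per shard.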
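medcl - Tonight we hunt tigers.
Upvoted by: Rassyan, cccthought, liujiacheng
2. Queries run against the shards in parallel, so when the shards sit on separate disks the I/O is spread across those disks and efficiency is certainly somewhat better. The merging of query results happens entirely in memory.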
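To illustrate the pattern described here (each shard searched in parallel, top hits merged in memory), here is a toy Python sketch. It is not how Elasticsearch is implemented: the shard contents, the term-count "score", and the function names `search_shard` and `search` are all made up for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard_docs, term, k):
    """Return this shard's top-k hits as (score, doc_id); score is a plain term count."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard_docs.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


def search(shards, term, k=3):
    """Query every shard concurrently, then merge the per-shard results in memory."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = list(pool.map(lambda docs: search_shard(docs, term, k), shards))
    # The merge only touches the small per-shard top-k lists, never the disks.
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


if __name__ == "__main__":
    # Two hypothetical shards, imagined to live on two different disks.
    shard_on_disk1 = {"a": "disk disk path", "b": "path"}
    shard_on_disk2 = {"c": "disk", "d": "disk disk disk"}
    print(search([shard_on_disk1, shard_on_disk2], "disk", k=2))
```

The point of the sketch is that the per-shard work (the part that reads from disk) runs side by side, while the final merge handles only k results per shard.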
owen
Upvoted by:
Using multiple IO devices (by specifying multiple path.data paths) to hold the shards on your node is useful for increasing total storage space, and improving IO performance, if that's a bottleneck for your Elasticsearch usage.
With 2.0, there is an important change in how Elasticsearch spreads the IO load across multiple paths: previously, each low-level index file was sent to the best (default: most empty) path, but now that switch is per-shard instead. When a shard is allocated to the node, the node will pick which path will hold all files for that shard.
This improves resiliency to IO device failures: if one of your IO devices crashes, now you'll only lose the shards that were on it, whereas before 2.0 you would lose any shard that had at least one file on the affected device (typically this means nearly all shards on that node).
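A small Python sketch can make the difference in failure impact concrete. The layouts below are invented for illustration (the paths /d1 and /d2 and the file counts are hypothetical), and `lost_shards` is just a helper written for this example.

```python
def lost_shards(file_locations, failed_path):
    """Return the shards that have at least one file on the failed path.

    file_locations: {shard_id: [data path holding each of the shard's files]}
    """
    return sorted(shard for shard, paths in file_locations.items() if failed_path in paths)


if __name__ == "__main__":
    # Before 2.0 (per-file placement): each shard's files are spread over both paths.
    per_file = {0: ["/d1", "/d2", "/d1"], 1: ["/d2", "/d1"], 2: ["/d1", "/d2"]}
    # From 2.0 on (per-shard placement): all files of a shard share one path.
    per_shard = {0: ["/d1", "/d1", "/d1"], 1: ["/d2", "/d2"], 2: ["/d1", "/d1"]}

    print(lost_shards(per_file, "/d2"))   # every shard has a file on /d2
    print(lost_shards(per_shard, "/d2"))  # only the shard stored on /d2
```

In the per-file layout a failure of /d2 takes out all three shards, while in the per-shard layout only the one shard stored on /d2 is affected.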
Note that an OS-level RAID 0 device is also a poor choice as you'll lose all shards on that node when any device fails, since files are striped at the block level, so multiple path.data is the recommended approach.
You should still have at least 1 replica for your indices so you can recover the lost shards from other nodes without any data loss.
likui1314159 - a post-80s IT guy
Upvoted by: