
How does Elasticsearch store and retrieve data when path.data is configured with multiple paths?

Elasticsearch | Author: owen | Published: 2016-08-24 | Views: 18022

path.data can be configured with multiple paths (separated by commas). The questions below assume each configured path maps to a separate disk.
1. With multiple paths configured, does ES fill one disk completely before storing indexes on the next one?
2. If an index spans two disks, does a search have to run on both disks separately and then merge the results? If so, is that actually less efficient than using a single disk?

rojay - A post-90s IT guy in Hangzhou, new to the workforce

Upvoted by: medcl byx313 cccthought

I ran into the first question recently as well. None of the material I could find online gave a satisfactory answer, so I had no choice but to dig into the source code. Fortunately I finally worked out how it works, and I'd like to share it here.
 
How ES allocates shards across multiple disks
Suppose a single-node setup with two disks, and elasticsearch.yml contains path.data: /index/data,/data2/index/data
so the two configured paths correspond to the two disks. If I now create an index hrecord1 with 2 primary shards, the allocation works as follows:
First, shard1 is created (my guess is that ES creates the higher-numbered shard first, but it makes little difference). When creating shard1, ES checks which of the two paths sits on the disk with more free space and places shard1 on that path.
When creating shard0, ES adds up the remaining free space of the /index disk and the /data2 disk and takes 5% of that total.
It subtracts this 5% value from the free space of the disk that received shard1, compares the result against the remaining space on /data2, and places shard0 on whichever disk comes out larger.
Put simply, that 5% is the space ES reserves for the shard1 it has just created.
Corrections are welcome if I've got anything wrong; happy to discuss!
The key code is in ShardPath.java:
public static ShardPath selectNewPathForShard(NodeEnvironment env, ShardId shardId, IndexSettings indexSettings,
                                              long avgShardSizeInBytes, Map<Path,Integer> dataPathToShardCount) throws IOException {

    final Path dataPath;
    final Path statePath;

    if (indexSettings.hasCustomDataPath()) {
        dataPath = env.resolveCustomLocation(indexSettings, shardId);
        statePath = env.nodePaths()[0].resolve(shardId);
    } else {
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
        }

        // TODO: this is a hack!! We should instead keep track of incoming (relocated) shards since we know
        // how large they will be once they're done copying, instead of a silly guess for such cases:

        // Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
        // shard size across the cluster and 5% of the total available free space on this node:
        BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));

        // TODO - do we need something more extensible? Yet, this does the job for now...
        final NodeEnvironment.NodePath[] paths = env.nodePaths();
        NodeEnvironment.NodePath bestPath = null;
        BigInteger maxUsableBytes = BigInteger.valueOf(Long.MIN_VALUE);
        for (NodeEnvironment.NodePath nodePath : paths) {
            FileStore fileStore = nodePath.fileStore;

            BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
            assert usableBytes.compareTo(BigInteger.ZERO) >= 0;

            // Deduct estimated reserved bytes from usable space:
            Integer count = dataPathToShardCount.get(nodePath.path);
            if (count != null) {
                usableBytes = usableBytes.subtract(estShardSizeInBytes.multiply(BigInteger.valueOf(count)));
            }
            if (bestPath == null || usableBytes.compareTo(maxUsableBytes) > 0) {
                maxUsableBytes = usableBytes;
                bestPath = nodePath;
            }
        }

        statePath = bestPath.resolve(shardId);
        dataPath = statePath;
    }
    return new ShardPath(indexSettings.hasCustomDataPath(), dataPath, statePath, shardId);
}
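
To make the heuristic concrete, here is a minimal, self-contained sketch of the same selection logic. It is not the actual ES code: the path names, the free-space figures (320 GB and 300 GB), and the 10 GB average shard size are made-up numbers, assuming shard1 has already landed on /index/data.

import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of the path-selection heuristic described above (not the real ES code).
// All paths, free-space figures and the average shard size are hypothetical.
public class SelectPathSketch {
    public static void main(String[] args) {
        // free space per data path, in bytes (hypothetical: 320 GB and 300 GB)
        Map<String, BigInteger> freeSpace = new LinkedHashMap<>();
        freeSpace.put("/index/data", BigInteger.valueOf(320L * 1024 * 1024 * 1024));
        freeSpace.put("/data2/index/data", BigInteger.valueOf(300L * 1024 * 1024 * 1024));

        // shards already placed on each path (shard1 already went to /index/data)
        Map<String, Integer> shardCount = new LinkedHashMap<>();
        shardCount.put("/index/data", 1);

        long avgShardSizeInBytes = 10L * 1024 * 1024 * 1024; // hypothetical 10 GB average

        // estimated shard size = max(average shard size, 5% of the total free space)
        BigInteger totalFree = BigInteger.ZERO;
        for (BigInteger free : freeSpace.values()) {
            totalFree = totalFree.add(free);
        }
        BigInteger estShardSize = BigInteger.valueOf(avgShardSizeInBytes)
                .max(totalFree.divide(BigInteger.valueOf(20)));

        // deduct the reserved estimate for every shard already on a path,
        // then pick the path with the most remaining usable space
        String bestPath = null;
        BigInteger bestUsable = null;
        for (Map.Entry<String, BigInteger> entry : freeSpace.entrySet()) {
            BigInteger usable = entry.getValue();
            Integer count = shardCount.get(entry.getKey());
            if (count != null) {
                usable = usable.subtract(estShardSize.multiply(BigInteger.valueOf(count)));
            }
            System.out.println(entry.getKey() + " -> usable after reserve: " + usable);
            if (bestPath == null || usable.compareTo(bestUsable) > 0) {
                bestPath = entry.getKey();
                bestUsable = usable;
            }
        }
        // with these numbers: 5% of 620 GB = 31 GB reserve, 320 - 31 = 289 GB < 300 GB,
        // so shard0 ends up on /data2/index/data
        System.out.println("shard0 would go to: " + bestPath);
    }
}

With these example numbers the 31 GB reserve tips the balance, so shard0 lands on the other disk, matching the behaviour described above.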

medcl - Off to fight a tiger tonight.

Upvoted by: Rassyan cccthought liujiacheng

1. Data is stored per shard: each shard lives on a single disk, and multiple shards are spread across the disks; ES does not wait for one disk to fill up before using the next.
2. At query time the shards are searched in parallel, so with separate disks the IO is spread across all of them and efficiency should actually be higher; merging the per-shard query results happens entirely in memory.
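
A rough sketch of point 2, just to illustrate the idea of parallel per-shard search followed by an in-memory merge. This is a conceptual example, not ES internals; the Shard and Hit types, the thread pool, and the top-N merge are invented for illustration.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch only: search every shard in parallel (each shard reads its own disk),
// then merge the small per-shard result lists in memory. Not Elasticsearch code.
public class ParallelShardSearchSketch {

    static class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    interface Shard {
        List<Hit> search(String query); // reads index files from this shard's own disk
    }

    static List<Hit> searchAll(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            // fan out: one search task per shard, running concurrently
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                futures.add(pool.submit(() -> shard.search(query)));
            }
            // fan in: merging the per-shard top hits is a cheap, purely in-memory step
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                merged.addAll(future.get());
            }
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return merged.subList(0, Math.min(topN, merged.size()));
        } finally {
            pool.shutdown();
        }
    }
}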

owen

Upvoted by:

Multiple path.data

Using multiple IO devices (by specifying multiple path.data paths) to hold the shards on your node is useful for increasing total storage space, and improving IO performance, if that's a bottleneck for your Elasticsearch usage.

With 2.0, there is an important change in how Elasticsearch spreads the IO load across multiple paths: previously, each low-level index file was sent to the best (by default, the emptiest) path, but now that choice is made per shard instead. When a shard is allocated to the node, the node picks which path will hold all files for that shard.

This improves resiliency to IO device failures: if one of your IO devices crashes, you will now only lose the shards that were stored on it, whereas before 2.0 you would lose any shard that had even one file on the affected device (which typically means nearly all shards on that node).

Note that an OS-level RAID 0 device is also a poor choice: since files are striped at the block level, you will lose all shards on that node when any one device fails, so multiple path.data paths are the recommended approach.


You should still have at least 1 replica for your indices so you can recover the lost shards from other nodes without any data loss.
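
For the replica recommendation, that is just an index setting at creation time. Here is a minimal sketch against the REST API, assuming a node at localhost:9200 and reusing the hrecord1 index name from the earlier answer purely as an example.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Minimal sketch: create an index with 2 primary shards and 1 replica via the REST API.
// The node address and index name are assumptions for the example.
public class CreateIndexWithReplica {
    public static void main(String[] args) throws Exception {
        String body = "{\"settings\":{\"number_of_shards\":2,\"number_of_replicas\":1}}";
        URL url = new URL("http://localhost:9200/hrecord1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // 200 means the index was created; with 1 replica, each shard also has a copy
        // on another node, so shards lost to a failed disk can be recovered from it
        System.out.println("HTTP " + conn.getResponseCode());
    }
}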

likui1314159 - An IT guy born in the '80s

Upvoted by:

As I recall, in earlier versions it picked the path with the most free space first, and once a configured data path went above 85% disk usage it would stop writing to it.
