Overview
These notes summarize the Elasticsearch snapshot feature in two parts: one using local disk as the storage backend, and one using remote HDFS.
Version
- elasticsearch-5.4.3.zip
- repository-hdfs-5.4.3.zip
Install plugin
# an absolute path must be specified
bin/elasticsearch-plugin install file:///data/mapleleaf/es_snapshot/repository-hdfs-5.4.3.zip
# check hdfs master namenode ip and port using webhdfs
curl -i "http://localhost:8081/webhdfs/v1/?op=LISTSTATUS"
# start es
sh bin/elasticsearch -d
# stop es (force kill)
ps aux | grep elasticsearch | grep -v "grep" | awk '{print $2}' | xargs kill -9
# restart es in one go, then follow the log
ps aux | grep elasticsearch | grep -v "grep" | awk '{print $2}' | xargs kill -9 ; sleep 3 && sh bin/elasticsearch -d && ps aux | grep elasticsearch | grep -v "grep" && tailf logs/es_snap.log
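The `kill -9` above gives Elasticsearch no chance to shut down cleanly. A gentler variant, sketched below, sends SIGTERM first and escalates only after a timeout. It assumes the node was started with `bin/elasticsearch -d -p es.pid` so that a pid file exists; the file name `es.pid`, the function name `stop_by_pidfile`, and the 30-second default are illustrative choices, not part of the original setup.

```shell
#!/bin/sh
# Sketch: stop the process whose pid is stored in a pid file.
# Assumption: Elasticsearch was started with `bin/elasticsearch -d -p es.pid`.
stop_by_pidfile() {
    pidfile=$1
    timeout=${2:-30}            # seconds to wait before escalating (assumed default)

    if [ ! -f "$pidfile" ]; then
        echo "pid file not found: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")

    # Polite request first (SIGTERM); if the process is already gone, just clean up.
    if kill "$pid" 2>/dev/null; then
        waited=0
        while kill -0 "$pid" 2>/dev/null; do
            if [ "$waited" -ge "$timeout" ]; then
                kill -9 "$pid" 2>/dev/null   # last resort, same as the one-liner above
                break
            fi
            sleep 1
            waited=$((waited + 1))
        done
    fi
    rm -f "$pidfile"
    return 0
}

# Usage:
#   stop_by_pidfile es.pid 30 && sh bin/elasticsearch -d -p es.pid
```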
Disk
create repo
# add the line below to elasticsearch.yml
path.repo: ["/data/mapleleaf/es_snapshot/my_backup"]
# create repo, named: my_backup
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "/data/mapleleaf/es_snapshot/my_backup",
"compress": true
}
}'
# show the repo
curl -X GET "localhost:9200/_snapshot/my_backup?pretty"
# delete the repo
curl -X DELETE "localhost:9200/_snapshot/my_backup"
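After registering a repository it can be worth asking the cluster to verify it, since every node needs write access to the `path.repo` location. The sketch below wraps the registration call above and follows it with the `_verify` endpoint. The function names `fs_repo_body` and `register_fs_repo` are made up for this example; the host, repository name, and location are the ones used in these notes.

```shell
#!/bin/sh
# Build the JSON body for an "fs" repository (same fields as the request above).
# Note: the location is inserted as-is, so it must not contain quotes or backslashes.
fs_repo_body() {
    printf '{"type":"fs","settings":{"location":"%s","compress":true}}\n' "$1"
}

# Register the repository, then ask the nodes to verify that they can access it.
# Arguments: host:port, repository name, location (must be listed in path.repo).
register_fs_repo() {
    host=$1
    repo=$2
    location=$3
    curl -s -X PUT "http://$host/_snapshot/$repo?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(fs_repo_body "$location")" || return 1
    curl -s -X POST "http://$host/_snapshot/$repo/_verify?pretty"
}

# Usage (needs a running node):
#   register_fs_repo localhost:9200 my_backup /data/mapleleaf/es_snapshot/my_backup
```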
create snapshot
# create snapshot
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true&pretty"
# list all snapshots in the repo
curl -X GET "localhost:9200/_snapshot/my_backup/*?pretty"
# show the detailed status of snapshot_1
curl -X GET "localhost:9200/_snapshot/my_backup/snapshot_1/_status?pretty"
# delete snapshot_1
curl -X DELETE "localhost:9200/_snapshot/my_backup/snapshot_1?pretty"
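With `wait_for_completion=true` the request blocks until the snapshot finishes. For larger indices one alternative is to start the snapshot without that flag and poll its state. The following is a rough sketch of such a loop; `snapshot_state` and `wait_for_snapshot` are hypothetical helper names, the 5-second interval is arbitrary, and the `sed` expression is a naive text match rather than a real JSON parser.

```shell
#!/bin/sh
# Read a snapshot-info response on stdin and print the first "state" value
# (for example IN_PROGRESS, SUCCESS, PARTIAL or FAILED).
snapshot_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll GET _snapshot/<repo>/<snapshot> until the state is no longer IN_PROGRESS.
# Arguments: host:port, repository, snapshot, polling interval in seconds.
# Returns success only if the final state is SUCCESS.
wait_for_snapshot() {
    host=$1
    repo=$2
    snap=$3
    interval=${4:-5}
    while :; do
        state=$(curl -s "http://$host/_snapshot/$repo/$snap" | snapshot_state)
        echo "state: ${state:-unknown}"
        [ "$state" = "IN_PROGRESS" ] || break
        sleep "$interval"
    done
    [ "$state" = "SUCCESS" ]
}

# Usage (needs a running node):
#   curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?pretty"
#   wait_for_snapshot localhost:9200 my_backup snapshot_1
```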
restore
# restore
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty"
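A plain `_restore` call tries to restore every index in the snapshot and fails if an index with the same name is still open. Besides closing the index first, another option is to restore under a different name. The sketch below builds such a request body with `rename_pattern` and `rename_replacement`; the helper name `restore_body` and the `restored_` prefix are illustrative assumptions.

```shell
#!/bin/sh
# Build a restore request body that restores one index under a new name.
# $1 = index to restore, $2 = prefix for the restored copy (hypothetical default).
restore_body() {
    index=$1
    prefix=${2:-restored_}
    # The "$1" inside the single-quoted format string is passed literally to
    # Elasticsearch, where it refers to the first capture group of rename_pattern.
    printf '{"indices":"%s","include_global_state":false,"rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' \
        "$index" "$prefix"
}

# Usage (needs a running node):
#   curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" \
#        -H 'Content-Type: application/json' -d "$(restore_body customer)"
```

With this body the `customer` index from the snapshot would come back as `restored_customer`, leaving the live index untouched.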
step
1. check index
```
curl -X PUT "localhost:9200/customer" -H 'Content-Type: application/json' -d '{ "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 0 } } }'
curl -X GET "localhost:9200/_cat/indices?v"
curl -X DELETE "localhost:9200/customer?pretty"
2. insert data
for i in {1..10000}; do curl -s -X POST "localhost:9200/customer/external/?pretty" -H 'Content-Type: application/json' -d "{ \"id\": ${i}, \"num\": ${i}, \"name\": \"John Doe\" }" > /dev/null; done
> Figure: insert docs
3. close index
curl -X POST "localhost:9200/customer/_close?pretty"
4. restore
I had taken one snapshot earlier, when the index held only 1 doc. After inserting 10,000 docs, closing the index, and restoring, the index goes back to the state captured in that snapshot.
> Figure: after restore
5. reinsert
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d '{ "query": { "match_all": {} } }'
> Figure: reinsert
6. create snapshot_2
> Figure: before
> Figure: after
7. close & restore
----
# [HDFS](https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/repository-hdfs-config.html)
## create hdfs repo
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository?pretty" -H 'Content-Type: application/json' -d '{ "type": "hdfs", "settings": { "uri": "hdfs://???:???", "path": "elasticsearch/respositories/my_hdfs_repository", "compress": true } }'
If this step throws an exception, see [this issue](https://github.com/elastic/elasticsearch/issues/22156).
> Figure: create repo succeeded
## insert data
> Figure: doc 10000
## create hdfs snapshot
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository/snapshot_hdfs_1?wait_for_completion=true&pretty"
> Figure: access_control_exception
Add the plugin's security configuration to `jvm.options`.
> Figure: fix access_control_exception
> Figure: create snapshot succeeded
> Figure: hdfs ls snapshot files
## restore from hdfs
1. Add some arbitrary docs so that the index differs from its state at snapshot time, which makes the effect of the restore easy to observe.
> Figure: doc 10000+
2. close index
> Figure: doc index close
3. restore
curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/snapshot_hdfs_1/_restore?pretty"
> Figure: restore succeeded
> Figure: doc 10000
----
# Restoring to a different cluster
> All that is required is `registering` the repository containing the snapshot in the new cluster and `starting` the restore process.
curl -X GET "localhost:9201/_cat/indices?v"
> Figure: clusterB initial
## registering repository
curl -X PUT "localhost:9201/_snapshot/my_hdfs_repository?pretty" -H 'Content-Type: application/json' -d '{ "type": "hdfs", "settings": { "uri": "hdfs://???:???", "path": "elasticsearch/respositories/my_hdfs_repository", "compress": true } }'
> Figure: registered with the same HDFS path as clusterA
## list snapshot
curl -X GET "localhost:9201/_snapshot/my_hdfs_repository/*?pretty"
> Figure: lists working snapshots
## starting restore
curl -X POST "localhost:9201/_snapshot/my_hdfs_repository/snapshot_hdfs_1/_restore?pretty"
> Figure: restore succeeded
----
# benchmark
esrally is used to write the test data.
> Figure: before
## snapshotting speed
> Figure: hdfs before snapshot
background running
curl -X PUT "XXX:9200/_snapshot/my_hdfs_repository/snapshot_hdfs_long_1" -H 'Content-Type: application/json' -d '{ "indices": "591_etl_fuhaochen_test_2018062500", "ignore_unavailable": true, "include_global_state": false }'
check running status
curl -X GET "XXX:9200/_snapshot/my_hdfs_repository/*?pretty"
> Figure: in_progress
> Figure: success
> Figure: hdfs after snapshot
## restoring speed
date
curl -X POST "XXX:9201/_snapshot/my_hdfs_repository/snapshot_hdfs_long_1/_restore?wait_for_completion=true&pretty"
date
```
after
Snapshotting takes far longer than restoring.
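The `date` calls around the restore request only print wall-clock timestamps, so the duration has to be worked out by hand. A small wrapper can report elapsed seconds directly. This is a sketch: `elapsed` is a made-up helper name, and it relies on `date +%s`, which is widely available but not required by POSIX.

```shell
#!/bin/sh
# Run a command and report how many whole seconds it took. The report goes to
# stderr so the command's own output on stdout stays untouched.
elapsed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return "$status"
}

# Usage (needs a running cluster; XXX is the placeholder host used in these notes):
#   elapsed curl -s -X POST "XXX:9201/_snapshot/my_hdfs_repository/snapshot_hdfs_long_1/_restore?wait_for_completion=true&pretty"
```

Because the snapshot request in the benchmark runs in the background, timing it the same way would require adding `wait_for_completion=true` or polling until the state leaves `IN_PROGRESS`.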
plugin auto route
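Test whether the plugin is routed automatically: does it have to be installed on every node (data node, master node, etc.), or is it enough to install it on one node of the ES cluster, which then propagates the plugin to the other nodes automatically?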
health
nodes
plugins
Automatic routing does not work: the plugin has to be installed on each node separately.
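Since the plugin has to be present on each node, a quick check after a rollout is to compare the node list with the plugin list. The sketch below does this offline, on the text output of `_cat/nodes?h=name` and `_cat/plugins?h=name,component`. The function name `nodes_missing_plugin` and the file names are hypothetical, and the comparison assumes node names contain no spaces.

```shell
#!/bin/sh
# Print every node name from the nodes file that has no "<node> <plugin>" line
# in the plugins file.
# $1 = plugin name, $2 = file with one node name per line,
# $3 = file with "<node> <plugin>" per line.
nodes_missing_plugin() {
    plugin=$1
    nodes_file=$2
    plugins_file=$3
    while read -r node; do
        [ -n "$node" ] || continue
        # awk exits 0 only if it saw a line matching both node and plugin.
        awk -v n="$node" -v p="$plugin" \
            '$1 == n && $2 == p { found = 1 } END { exit !found }' \
            "$plugins_file" || echo "$node"
    done < "$nodes_file"
}

# Usage (needs a running cluster):
#   curl -s "localhost:9200/_cat/nodes?h=name" > nodes.txt
#   curl -s "localhost:9200/_cat/plugins?h=name,component" > plugins.txt
#   nodes_missing_plugin repository-hdfs nodes.txt plugins.txt
```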
other
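- Tried to snapshot a larger index, but it failed with an error. The configuration should be fine (since the small index was snapshotted successfully).
snapshot of the large index failed
snapshot of the small index succeeded
The `Self-suppression not permitted` error is probably caused by insufficient free space on the Hadoop DataNode.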
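If free DataNode space is indeed the cause, comparing the index size with the space HDFS reports before starting a snapshot would catch it early. The sketch below parses the output of `hdfs dfs -df` and compares it with a required byte count. The helper names, the assumption that the available bytes sit in the fourth column of the second line, and the replication factor of 3 are all assumptions to check against the actual cluster.

```shell
#!/bin/sh
# Read `hdfs dfs -df <path>` output on stdin and print the "Available" column
# in bytes. Assumes one header line followed by one data line.
hdfs_available_bytes() {
    awk 'NR == 2 { print $4 }'
}

# Succeed if the available space covers the index size times the HDFS
# replication factor.
# $1 = available bytes, $2 = index size in bytes,
# $3 = replication factor (assumed default: 3).
has_room_for_snapshot() {
    awk -v avail="$1" -v need="$2" -v repl="${3:-3}" \
        'BEGIN { exit !(avail >= need * repl) }'
}

# Usage (needs HDFS and a running node):
#   avail=$(hdfs dfs -df / | hdfs_available_bytes)
#   size=$(curl -s "localhost:9200/_cat/indices/customer?h=pri.store.size&bytes=b")
#   has_room_for_snapshot "$avail" "$size" || echo "not enough HDFS space" >&2
```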