HDPCA Exam Objective – Recover a snapshot

Note: This post is part of the HDPCA exam objective series.

We mentioned earlier that HDFS replication alone is not a suitable backup strategy. Hadoop 2 added snapshots to HDFS, which provide another level of data protection. A snapshot is a read-only, point-in-time image of a directory tree. As changes are made to the filesystem, any change that would affect a snapshot is treated specially. For example, if a file that exists in a snapshot is deleted, it is removed from the current state of the filesystem, but its metadata remains in the snapshot. Its data blocks also stay on the filesystem, reachable only through the snapshot.

We can recover data from an HDFS snapshot to roll back to a desired state after data loss or corruption. As part of the exam objective, this post creates a snapshot, deletes a file, and then recovers that file from the snapshot.

1. Create a snapshot

1. Let’s first create a snapshot on a snapshottable directory. If the directory is not snapshottable yet, enable snapshots on it with the following command (this requires HDFS superuser privileges):

$ hdfs dfsadmin -allowSnapshot /user/test
Allowing snapshot on test succeeded
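If you are not sure whether a directory already allows snapshots, the check and the command above can be combined. The following is a minimal sketch in POSIX shell, assuming the `hdfs` client is on the PATH and that the directory path is the last column of `hdfs lsSnapshottableDir` output. Run it as the HDFS superuser, since other users only see the directories in which they may take snapshots. `ensure_snapshottable` is a hypothetical helper name, not part of HDFS.

```shell
#!/bin/sh
# Sketch: allow snapshots on a directory only if it is not already snapshottable.
# Assumes the hdfs CLI is on PATH and the caller is the HDFS superuser.
# Assumes the directory path is the last column of `hdfs lsSnapshottableDir`.
ensure_snapshottable() {
  dir="$1"
  # Look for an exact match of the path in the last field of each line.
  if hdfs lsSnapshottableDir 2>/dev/null | awk -v d="$dir" '$NF == d { found = 1 } END { exit !found }'; then
    echo "$dir is already snapshottable"
  else
    hdfs dfsadmin -allowSnapshot "$dir"
  fi
}

# Usage:
#   ensure_snapshottable /user/test
```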

2. Create a snapshot of the directory “/user/test” with snapshot_latest as the name of the snapshot.

$ hdfs dfs -createSnapshot /user/test snapshot_latest
Created snapshot /user/test/.snapshot/snapshot_latest
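When you keep several recovery points, it helps to put the time in the snapshot name. Here is a minimal sketch, assuming a POSIX shell with `date` available. `take_snapshot` and the `snap-` prefix are illustrative choices, not HDFS conventions. If the name argument is omitted from `-createSnapshot` altogether, HDFS generates a timestamp-based name itself (for example `s20130412-151029.033`).

```shell
#!/bin/sh
# Sketch: create a snapshot whose name records when it was taken.
# The "snap-" prefix and date format are arbitrary choices for this example.
take_snapshot() {
  dir="$1"
  # Use the second argument as the name, or build one from the current time.
  name="${2:-snap-$(date +%Y%m%d-%H%M%S)}"
  hdfs dfs -createSnapshot "$dir" "$name"
}

# Usage:
#   take_snapshot /user/test snapshot_latest   # explicit name, as in this post
#   take_snapshot /user/test                   # e.g. snap-20180721-101600
```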

3. View the snapshot in the .snapshot directory.

$ hdfs dfs -ls /user/test/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-21 10:16 /user/test/.snapshot/snapshot_latest

2. Delete a file

Now, delete any file from the /user/test directory in HDFS.

$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   3 hdfs hdfs         27 2018-07-21 10:34 /user/test/another_test
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/test_file
$ hdfs dfs -rm /user/test/test_file
18/07/21 11:06:40 INFO fs.TrashPolicyDefault: Moved: 'hdfs://geeklab/user/test/test_file' to trash at: hdfs://geeklab/user/hdfs/.Trash/Current/user/test/test_file

Note the mention of the trash directory. When trash is enabled (fs.trash.interval set to a non-zero number of minutes), HDFS moves deleted files into a .Trash directory in the user’s home directory instead of removing them immediately, which helps defend against slipping fingers. Files in the trash are deleted permanently once they are older than the fs.trash.interval setting, and “hdfs dfs -expunge” triggers this cleanup on demand.
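To see which trash retention applies, you can read `fs.trash.interval`, which is expressed in minutes (0 means trash is disabled). This is a minimal sketch, assuming the `hdfs getconf -confKey` command. It reports the configuration visible to the client, which can differ from the NameNode's own setting. `trash_retention` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report how long deleted files stay in .Trash.
# fs.trash.interval is in minutes; 0 disables trash.
# getconf reads the client-side configuration, which may differ from the NameNode's.
trash_retention() {
  minutes="$(hdfs getconf -confKey fs.trash.interval 2>/dev/null)"
  # Bail out if the value is missing or not a plain number.
  case "$minutes" in
    ''|*[!0-9]*) echo "could not read fs.trash.interval" >&2; return 1 ;;
  esac
  if [ "$minutes" -eq 0 ]; then
    echo "trash is disabled"
  else
    echo "trash retention: $minutes minutes"
  fi
}
```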

Verify that the file is no longer present.

$ hdfs dfs -ls /user/test/test_file
ls: `/user/test/test_file': No such file or directory
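Before recovering anything, it is worth checking exactly what changed since the snapshot was taken. `hdfs snapshotDiff` reports the differences between two snapshots, or between a snapshot and the current state of the directory when `.` is given as the second snapshot. The sketch below assumes the report's usual layout: one entry per line, a change label (`+`, `-`, `M` or `R`), a tab, then a path relative to the directory. `show_changes_since` and `deleted_since` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: list what changed in a directory since a given snapshot.
# "." stands for the current state of the directory.
show_changes_since() {
  dir="$1"
  snap="$2"
  hdfs snapshotDiff "$dir" "$snap" .
}

# Print only the paths deleted since the snapshot (entries labelled "-").
# Assumes label and path are separated by a tab.
deleted_since() {
  show_changes_since "$1" "$2" | awk -F '\t' '$1 == "-" { print $2 }'
}

# Usage:
#   deleted_since /user/test snapshot_latest
```

In this walkthrough, `deleted_since /user/test snapshot_latest` should list `./test_file` as the file to recover.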

3. Recover the snapshot

1. You can restore the deleted file from the /user/test/.snapshot directory, which still contains a copy of test_file.

$ hdfs dfs -ls /user/test/.snapshot/snapshot_latest
Found 1 items
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/.snapshot/snapshot_latest/test_file
$ hdfs dfs -cat /user/test/.snapshot/snapshot_latest/test_file
This is a test file.

2. Let’s copy the removed file from the snapshot directory back to its original location.

$ hdfs dfs -cp /user/test/.snapshot/snapshot_latest/test_file /user/test/
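A plain `-cp` gives the restored file a new modification time, as the listing in the verification step shows. Adding `-p` preserves timestamps, ownership and permissions from the snapshot copy. Preserving ownership generally requires superuser rights. This is a minimal sketch, assuming the arguments are the snapshottable directory, the snapshot name and the file path relative to the directory. `restore_from_snapshot` is a hypothetical helper name. It refuses to overwrite a file that has reappeared at the original location.

```shell
#!/bin/sh
# Sketch: copy one file out of a snapshot back to its original location.
# -p preserves timestamps, ownership and permissions of the snapshot copy.
restore_from_snapshot() {
  dir="$1"    # snapshottable directory, e.g. /user/test
  snap="$2"   # snapshot name, e.g. snapshot_latest
  file="$3"   # path relative to dir, e.g. test_file
  src="$dir/.snapshot/$snap/$file"
  dst="$dir/$file"
  # Do not clobber a file that has been recreated since the deletion.
  if hdfs dfs -test -e "$dst"; then
    echo "refusing to overwrite existing $dst" >&2
    return 1
  fi
  hdfs dfs -cp -p "$src" "$dst"
}

# Usage:
#   restore_from_snapshot /user/test snapshot_latest test_file
```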

Verify:

$ hdfs dfs -ls /user/test/test_file
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 11:22 /user/test/test_file
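The listing confirms that the file is back, but not that its contents match. One way to check is to compare the `hdfs dfs -checksum` output for the restored file and its snapshot copy. This is a minimal sketch, assuming the command prints the path, the checksum algorithm and the checksum separated by tabs. The comparison is only meaningful when both files use the same block size and checksum settings, which is usually the case for a copy within one cluster. `same_checksum` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if two HDFS files report the same checksum.
# Drops the first tab-separated field (the path) and compares algorithm + checksum.
same_checksum() {
  a="$(hdfs dfs -checksum "$1" | cut -f2-)"
  b="$(hdfs dfs -checksum "$2" | cut -f2-)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   same_checksum /user/test/test_file /user/test/.snapshot/snapshot_latest/test_file && echo "contents match"
```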