HDPCA Exam Objective – Create a snapshot of an HDFS directory

Note: This post is part of the HDPCA exam objective series.

HDFS Snapshot

Even with a replication factor of 3, data in a Hadoop cluster can still be lost through human error or corruption. Hadoop 2 added HDFS snapshots: read-only, point-in-time copies of a directory tree, implemented with copy-on-write. A snapshot is recorded in the NameNode’s metadata and refers to the existing data blocks on the DataNodes rather than copying them. Snapshot creation is therefore instantaneous and does not interfere with regular HDFS operations.

As part of the exam objective, we will create a snapshot of an HDFS directory in this post.

Snapshottable path

Snapshots in Hadoop 2 can be taken of the entire filesystem or of particular paths. A path must first be marked as snapshottable, and a path cannot be made snapshottable if any of its parent or child paths is already snapshottable.

Use the dfsadmin subcommand of the hdfs CLI utility to enable snapshots of a directory, as follows:

$ hdfs dfsadmin -allowSnapshot /user/test/testdir
Allowing snapshot on /user/test/testdir succeeded

To view all the snapshottable directories in HDFS, use the command:

$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2018-07-21 10:16 1 65536 /user/test/testdir
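
The nesting rule described above can be checked before calling -allowSnapshot. The following is a minimal sketch, not part of the hdfs CLI: snapshottable_conflicts is a hypothetical helper name, and it assumes the listing format shown above, where the path is the last whitespace-separated column (so paths containing spaces are not handled).

```shell
# Hypothetical helper (not part of the hdfs CLI): reads the output of
# `hdfs lsSnapshottableDir` on stdin and prints every snapshottable
# directory that is an ancestor or a descendant of the candidate path.
# Assumption: the path is the last whitespace-separated column.
snapshottable_conflicts() {
  awk -v cand="${1%/}" '
    $1 ~ /^d/ {
      path = $NF
      # Normalise a trailing slash so "/" compares as the empty prefix.
      sub(/\/$/, "", path)
      # The candidate itself being listed is not a nesting conflict.
      if (path == cand) next
      # First test: listed directory is an ancestor of the candidate.
      # Second test: listed directory is a descendant of the candidate.
      if (index(cand "/", path "/") == 1 || index(path "/", cand "/") == 1)
        print $NF
    }'
}

# Usage on a live cluster:
#   hdfs lsSnapshottableDir | snapshottable_conflicts /user/test
```

If the helper prints nothing, no listed directory is nested with the candidate, and -allowSnapshot should not be rejected on that ground.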

Creating a snapshot of an HDFS directory

1. Connect to the namenode nn1 and switch to user hdfs.

# ssh root@nn1
# su - hdfs

2. Create a new test directory in HDFS and upload a sample text file from the local filesystem.

$ hdfs dfs -mkdir /user/test

Sample text file on local filesystem:

$ cat /home/hdfs/test_file
This is a test file.

Copy file from local to HDFS:

$ hdfs dfs -put /home/hdfs/test_file /user/test/

Verify that the file exists in HDFS:

$ hdfs dfs -ls /user/test
Found 1 items
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/test_file

3. A snapshot can only be created on a snapshottable directory. Attempting to snapshot /user/test before snapshots are allowed on it fails:

$ hdfs dfs -createSnapshot /user/test snapshot_on_21_july
createSnapshot: Directory is not a snapshottable directory: /user/test

Let's make the /user/test directory snapshottable:

$ hdfs dfsadmin -allowSnapshot /user/test
Allowing snaphot on /user/test succeeded

Create the snapshot again:

$ hdfs dfs -createSnapshot /user/test snapshot_on_21_july
Created snapshot /user/test/.snapshot/snapshot_on_21_july

The process of creating a snapshot is instantaneous as blocks themselves are not copied.

4. The snapshots themselves are stored in a .snapshot directory under the snapshotted directory.

$ hdfs dfs -ls /user/test/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-21 10:16 /user/test/.snapshot/snapshot_on_21_july
$ hdfs dfs -ls /user/test/.snapshot/snapshot_on_21_july
Found 1 items
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/.snapshot/snapshot_on_21_july/test_file
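
The steps above use a hand-written snapshot name. For repeated snapshots, a timestamped name is often more convenient. The sketch below is one possible approach, not an established convention: the helper names and the snapshot_YYYYMMDD_HHMMSS naming scheme are assumptions made for illustration, and the directory is assumed to be snapshottable already.

```shell
# Hypothetical helpers; the names and the naming scheme are illustrative.

# Build a snapshot name from the current local time,
# in the form snapshot_YYYYMMDD_HHMMSS.
dated_snapshot_name() {
  printf 'snapshot_%s\n' "$(date +%Y%m%d_%H%M%S)"
}

# Print the HDFS path under which a snapshot's contents are visible.
# $1 = snapshottable directory, $2 = snapshot name
snapshot_path() {
  printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

# Create a dated snapshot of an already snapshottable directory, then
# list its contents. Requires the hdfs CLI and a running cluster.
create_dated_snapshot() {
  dir="$1"
  name="$(dated_snapshot_name)"
  hdfs dfs -createSnapshot "$dir" "$name" &&
    hdfs dfs -ls "$(snapshot_path "$dir" "$name")"
}

# Usage on a live cluster:
#   create_dated_snapshot /user/test
```

Because snapshot contents are ordinary read-only HDFS paths, the path printed by snapshot_path can be passed to commands such as hdfs dfs -ls or hdfs dfs -cat.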

Comparing snapshots

Snapshots can be compared against one another to track the changes made between them. This is done using the command “hdfs snapshotDiff”.

1. Let's first add another file from the local filesystem to the same HDFS directory.

$ cat /home/hdfs/another_test
This is another test file.
$ hdfs dfs -put /home/hdfs/another_test /user/test/
$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   3 hdfs hdfs         27 2018-07-21 10:34 /user/test/another_test
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/test_file

2. Create a new snapshot of the same directory.

$ hdfs dfs -createSnapshot /user/test snapshot_latest
Created snapshot /user/test/.snapshot/snapshot_latest

3. Compare the two snapshots we created:

$ hdfs snapshotDiff /user/test snapshot_on_21_july snapshot_latest
Difference between snapshot snapshot_on_21_july and snapshot snapshot_latest under directory /user/test:
M .
+ ./another_test

The output shows that the directory itself was modified (M) and that the file “another_test” was added (+) in snapshot_latest.
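
Each line of a snapshotDiff report starts with a one-character marker: + for a created entry, - for a deleted one, M for a modified one, and R for a renamed one. For large reports a summary count can be easier to read. The following is a minimal sketch; summarize_diff is a hypothetical helper name, and it assumes the report format shown above.

```shell
# Hypothetical helper: reads an `hdfs snapshotDiff` report on stdin and
# prints how many entries were created, deleted, modified, and renamed.
# The header line ("Difference between snapshot ...") matches none of the
# markers and is therefore ignored.
summarize_diff() {
  awk '
    $1 == "+" { created++ }
    $1 == "-" { deleted++ }
    $1 == "M" { modified++ }
    $1 == "R" { renamed++ }
    END {
      printf "created=%d deleted=%d modified=%d renamed=%d\n",
             created, deleted, modified, renamed
    }'
}

# Usage on a live cluster:
#   hdfs snapshotDiff /user/test snapshot_on_21_july snapshot_latest | summarize_diff
```

For the report shown above, this would count one created entry and one modified entry.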

Disable snapshot on a directory

Just as snapshots can be allowed on an HDFS directory, they can also be disallowed. All existing snapshots of the directory must be deleted first. To disallow snapshots on the /user/user01 directory:

$ hdfs dfsadmin -disallowSnapshot /user/user01
Disallowing snaphot on /user/user01 succeeded

Verify the current snapshottable directories:

$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2018-07-21 10:36 2 65536 /user/test

Delete a snapshot

A directory can have multiple snapshots. To delete one of them, use the following command:

$ hdfs dfs -deleteSnapshot /user/test snapshot_latest
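
When a directory holds several snapshots, deleting them one by one becomes tedious. The sketch below loops over the listing of the .snapshot directory. The helper names are hypothetical, and it assumes the hdfs dfs -ls output format shown earlier, with snapshot names that contain no whitespace. It removes every snapshot of the directory, so use it with care.

```shell
# Hypothetical helper: reads `hdfs dfs -ls <dir>/.snapshot` output on stdin
# and prints one snapshot name per line (the last path component).
# Assumption: snapshot names contain no whitespace.
list_snapshot_names() {
  awk '$1 ~ /^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Delete every snapshot of the given directory.
# Requires the hdfs CLI and a running cluster.
delete_all_snapshots() {
  dir="$1"
  hdfs dfs -ls "$dir/.snapshot" | list_snapshot_names |
    while read -r name; do
      # Redirect stdin so the command cannot consume the list of names.
      hdfs dfs -deleteSnapshot "$dir" "$name" </dev/null
    done
}

# Usage on a live cluster:
#   delete_all_snapshots /user/test
```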

Rename a snapshot

We can also rename a snapshot using the command “hdfs dfs -renameSnapshot”, which takes the snapshottable directory, the old snapshot name, and the new snapshot name:

$ hdfs dfs -renameSnapshot <path> <oldName> <newName>

For example:

$ hdfs dfs -renameSnapshot /user/test snapshot_on_21_july snapshot_latest

Verify the new name of the snapshot:

$ hdfs dfs -ls /user/test/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-21 10:16 /user/test/.snapshot/snapshot_latest
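
A rename fails if a snapshot with the new name already exists, which is why the example above works only after snapshot_latest has been deleted. The sketch below checks for the target name first. rename_snapshot_safely is a hypothetical helper name, and the check assumes that hdfs dfs -test -d resolves paths under .snapshot like any other HDFS path.

```shell
# Hypothetical helper: rename a snapshot only if the target name is free.
# Assumption: `hdfs dfs -test -d` resolves paths under .snapshot like any
# other HDFS path. Requires the hdfs CLI and a running cluster.
# $1 = snapshottable directory, $2 = old name, $3 = new name
rename_snapshot_safely() {
  dir="$1"
  old="$2"
  new="$3"
  if hdfs dfs -test -d "$dir/.snapshot/$new"; then
    echo "snapshot $new already exists under $dir" >&2
    return 1
  fi
  hdfs dfs -renameSnapshot "$dir" "$old" "$new"
}

# Usage on a live cluster:
#   rename_snapshot_safely /user/test snapshot_on_21_july snapshot_latest
```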