
The Geek Diary

CCA 131 – Create/restore a snapshot of an HDFS directory (Using Cloudera Manager)

by admin

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series

HDFS Snapshot

Directories in HDFS can be snapshotted: HDFS records one or more read-only, point-in-time images (snapshots) of the directory. A snapshot covers all subdirectories and can even cover the entire filesystem, but be careful with that, because every block referenced by a snapshot stays on disk until the snapshot is deleted. Snapshots can be used as backups or for auditing purposes.

As changes to the filesystem are made, any change that would affect a snapshot is treated specially. For example, if a file that exists in the snapshot is deleted, it disappears from the current state of the filesystem, but its metadata remains in the snapshot and its data blocks remain on disk, accessible only through the snapshot.
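
To see which changes a snapshot is preserving, HDFS provides a snapshotDiff subcommand. The sketch below wraps it in a small function. The function name changes_since_snapshot is an illustrative assumption, and the sketch presumes an HDFS client is configured on the node and that the directory already has at least one snapshot.

```shell
# Report what changed under a snapshottable directory since a snapshot was taken.
# $1 = snapshottable directory, $2 = snapshot name (both supplied by the caller)
# The trailing "." tells snapshotDiff to compare against the current state.
changes_since_snapshot() {
  hdfs snapshotDiff "$1" "$2" .
}
```

Using the names from later in this post, changes_since_snapshot /user/test test_snap would list paths marked with +, -, M, or R for created, deleted, modified, and renamed entries.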

Enabling HDFS snapshot from Cloudera Manager

Before you can take HDFS snapshots of a directory, snapshots must be enabled for it. To enable the feature from Cloudera Manager, go to Home > HDFS > File Browser and select the directory for which you want to enable snapshots. Snapshots can be enabled from two places on this page, as shown in the screenshot below.

enabling snapshot from Cloudera Manager CCA 131

Proceed with the directory selected for enabling snapshot.

select path to enable snapshot from Cloudera Manager CCA 131

Wait for Cloudera Manager to confirm that the command to enable snapshots completed successfully.

taking snapshot in HDFS CCA131 exam objective
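
For reference, the same step can be done from the command line. This is a minimal sketch, assuming it is run as an HDFS superuser (for example the hdfs user) on a node with the HDFS client configured. The function names are hypothetical wrappers, not part of Hadoop or Cloudera Manager.

```shell
# Allow snapshots on a directory (requires HDFS superuser privileges).
# $1 = directory to make snapshottable
enable_snapshots() {
  hdfs dfsadmin -allowSnapshot "$1"
}

# List the snapshottable directories on which the current user may take snapshots.
list_snapshottable_dirs() {
  hdfs lsSnapshottableDir
}
```

After enable_snapshots /user/test, the directory should appear in the output of list_snapshottable_dirs.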

Taking snapshot from Cloudera Manager

Once snapshots are enabled for the directory “/user/test”, you can take your first snapshot. Select the option “take snapshot” from the drop-down, as shown in the screenshot below.

taking HDFS snapshot from Cloudera Manager CCA 131

On the next screen, provide the snapshot name (test_snap).

provide snapshot name - CCA 131 HDFS snapshot

You should see a command completion message as shown in the following screenshot:

create snapshot command completion CCA 131 exam

You can verify the newly created snapshot on the file browser page as shown in the screenshot below.

view HDFS snapshots in Cloudera Manager
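
A command-line counterpart might look like the sketch below. It assumes snapshots are already enabled on the directory and that the caller owns the directory or is a superuser. The helper names snapshot_path and take_snapshot are illustrative only.

```shell
# Build the read-only path under which HDFS exposes a snapshot.
# $1 = snapshottable directory, $2 = snapshot name
snapshot_path() {
  printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

# Create a named snapshot, then list its contents to confirm it exists.
take_snapshot() {
  hdfs dfs -createSnapshot "$1" "$2" &&
    hdfs dfs -ls "$(snapshot_path "$1" "$2")"
}
```

With the names used in this post, take_snapshot /user/test test_snap creates the snapshot and lists /user/test/.snapshot/test_snap.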

Restoring Snapshots Using Cloudera Manager

Let’s see how to restore a snapshot from Cloudera Manager to bring an HDFS directory back to an earlier state. We have a snapshot named “test_snap” of the directory “/user/test”. We will delete the file “/user/test/test_file” and then restore it from the snapshot. Follow the steps outlined below:

1. First, delete the file “/user/test/test_file” from the command line.

# su - hdfs 
$ hdfs dfs -rm /user/test/test_file
18/09/02 21:02:33 INFO fs.TrashPolicyDefault: Moved: 'hdfs://master.localdomain:8020/user/test/test_file' to trash at: hdfs://master.localdomain:8020/user/hdfs/.Trash/Current/user/test/test_file

2. To restore the snapshot, go to Cloudera Manager > HDFS > File Browser. Select the directory “/user/test” in the File Browser and choose the option “Restore Directory from Snapshot” from the drop-down, as shown below.

restoring HDFS snapshots CCA 131 exam

3. If you have multiple snapshots, select the one you want to restore from the drop-down. In my case there is only one snapshot, “test_snap”, so I will restore that one. Two restore methods are available:
   a. Use the HDFS “copy” command: uses the regular “hdfs dfs -cp” command to copy the files.
   b. Use DistCp / MapReduce: uses the “DistCp” command, which runs the copy as a MapReduce job. Because DistCp copies files in parallel, it is usually much faster than the plain HDFS copy command for large directories.

select snapshot to restore - CCA 131 restore HDFS snapshot

4. The next screen will show the steps taken to restore the directory “/user/test”.

restore snapshot execution from Cloudera Manager CCA 131

5. On completion, verify that the file “/user/test/test_file” is available again.

verify restoring HDFS snapshot from Cloudera Manager CCA 131

$ hdfs dfs -ls /user/test
Found 1 items
-rw-r--r--   3 hdfs supergroup          0 2018-09-02 21:06 /user/test/test_file
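
If only a single file needs to come back, it can also be copied out of the snapshot by hand. The sketch below illustrates the idea behind the HDFS “copy” method. It is not necessarily what Cloudera Manager runs internally, and the function name restore_from_snapshot is hypothetical.

```shell
# Copy one path out of a snapshot back to its original location.
# $1 = snapshottable directory, $2 = snapshot name, $3 = path relative to $1
# -p preserves timestamps, ownership, and permissions; -f overwrites an existing target.
restore_from_snapshot() {
  hdfs dfs -cp -f -p "${1%/}/.snapshot/$2/$3" "${1%/}/$3"
}
```

For the example above, restore_from_snapshot /user/test test_snap test_file copies /user/test/.snapshot/test_snap/test_file back to /user/test/test_file.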

Restoring Snapshot to a different location

You can also restore a snapshot to a location other than the original directory. Follow the steps outlined below to restore the snapshot “test_snap” of the directory “/user/test” to a different location (/user/snapshot_restore).

1. Go to Cloudera Manager > HDFS > File Browser. Select the directory to restore and choose the option “Restore Directory From Snapshot As”.

restore directory from snapshot as a different directory CCA 131 HDFS snapshot

2. Provide the directory location to restore the snapshot. Also, select the snapshot to restore and the restore method.

Note: If the directory entered above exists, that directory will be overwritten.

restore HDFS directory to a different location CCA 131

3. You should see a command completion message as shown in the following screenshot:

successful restoration of HDFS snapshot CCA 131 exam

4. Verify the restored directory in Cloudera Manager, or from the command line.

verify the restoration of HDFS snapshot CCA 131

$ hdfs dfs -ls /user/snapshot_restore
Found 1 items
-rw-r--r--   3 hdfs supergroup          0 2018-09-02 22:10 /user/snapshot_restore/test_file
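
The DistCp method can be approximated from the command line as sketched below. This assumes a working YARN/MapReduce setup and a target directory that does not exist yet. When the target already exists, the resulting layout depends on DistCp options such as -update, so check the result. The function name restore_snapshot_as is hypothetical.

```shell
# Copy a whole snapshot to a new directory as a MapReduce job.
# $1 = snapshottable directory, $2 = snapshot name, $3 = target directory
restore_snapshot_as() {
  hadoop distcp "${1%/}/.snapshot/$2" "$3"
}
```

For this post's example, the call would be restore_snapshot_as /user/test test_snap /user/snapshot_restore.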

Delete and Disable HDFS snapshot

Delete Snapshot from Cloudera Manager

1. If you no longer need a snapshot, you can delete it as shown in the screenshot below.

delete HDFS snapshot from Cloudera Manager CCA 131

2. You should see a command completion message as shown in the following screenshot:

command completion message - CCA131 delete HDFS snapshot
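
From the command line, a snapshot is deleted by naming both the directory and the snapshot. A minimal sketch, with delete_snapshot as a hypothetical wrapper name:

```shell
# Delete a snapshot by name; the live files in the directory are not touched.
# $1 = snapshottable directory, $2 = snapshot name
delete_snapshot() {
  hdfs dfs -deleteSnapshot "$1" "$2"
}
```

For example, delete_snapshot /user/test test_snap removes the snapshot taken earlier in this post.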

Disable Snapshot from Cloudera Manager

1. Disabling snapshots on a directory prevents any user from creating new snapshots of it. HDFS only allows this once all existing snapshots of the directory have been deleted.

disabling HDFS snapshots from Cloudera Manager CCA 131

Confirm that you want to disable snapshots.

confirm disabling HDFS snapshots CCA 131

2. You should see a command completion message as shown in the following screenshot:

command completion message - CCA 131 disable HDFS snapshots from Cloudera Manager
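
A command-line sketch of the same step follows. It first lists the .snapshot path so that any remaining snapshots are visible, because HDFS rejects the disallow command while snapshots still exist. The function name disable_snapshots is hypothetical, and the command needs HDFS superuser privileges.

```shell
# Show remaining snapshots, then disallow further snapshots on the directory.
# $1 = snapshottable directory
disable_snapshots() {
  # Any entries listed here must be deleted before the next command can succeed.
  hdfs dfs -ls "${1%/}/.snapshot"
  hdfs dfsadmin -disallowSnapshot "$1"
}
```

For the directory used in this post, the call would be disable_snapshots /user/test.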

Creating and restoring HDFS snapshots using command line

You can also perform the snapshot creation and restoration tasks described above from the command line. For more details on command-line usage, refer to the posts below. The examples in the first post were run on a Hortonworks HDP platform, but because the snapshot commands are part of HDFS itself, their usage is the same across the commercial Hadoop distributions.

HDPCA Exam Objective – Create a snapshot of an HDFS directory
CCA131 – Configuring HDFS snapshot Policy

Filed Under: CCA 131, Cloudera, Hadoop

