HDPCA Exam Objective – Recover a snapshot

by admin

Note: This post is part of the HDPCA exam objective series.

We mentioned earlier that HDFS replication alone is not a suitable backup strategy. Hadoop 2 added snapshots to the filesystem, which bring another level of data protection to HDFS. Once a snapshot has been taken, any change to the filesystem that would affect it is handled specially. For example, if a file that exists in the snapshot is deleted, it is removed from the current state of the filesystem, but its metadata remains in the snapshot and the blocks holding its data stay on the filesystem, accessible through no view of the system other than the snapshot.

We can recover a snapshot in HDFS to roll back to a desired system state in case of data loss or corruption. As part of the exam objective, in this post we will create a snapshot and then recover a deleted file from it.

1. Create a snapshot

1. Let’s first create a snapshot on a snapshottable directory. If the directory is not snapshottable, you can enable snapshots on it with:

$ hdfs dfsadmin -allowSnapshot /user/test
Allowing snapshot on test succeeded
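
You can confirm which directories currently allow snapshots with the lsSnapshottableDir subcommand; run as a regular user it lists the snapshottable directories owned by that user, while the HDFS superuser sees all of them:

$ hdfs lsSnapshottableDir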

2. Create a snapshot of the directory “/user/test” with snapshot_latest as the name of the snapshot.

$ hdfs dfs -createSnapshot /user/test snapshot_latest
Created snapshot /user/test/.snapshot/snapshot_latest

3. View the snapshot in the .snapshot directory.

$ hdfs dfs -ls /user/test/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2018-07-21 10:16 /user/test/.snapshot/snapshot_latest
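
Note that .snapshot is a virtual directory: a plain listing of the parent will not show it, so it has to be named explicitly as above. For example, the following shows only the regular files, with no .snapshot entry:

$ hdfs dfs -ls /user/test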

2. Delete a file

Now, delete any file from the /user/test directory in HDFS.

$ hdfs dfs -ls /user/test
Found 2 items
-rw-r--r--   3 hdfs hdfs         27 2018-07-21 10:34 /user/test/another_test
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/test_file
$ hdfs dfs -rm /user/test/test_file
18/07/21 11:06:40 INFO fs.TrashPolicyDefault: Moved: 'hdfs://geeklab/user/test/test_file' to trash at: hdfs://geeklab/user/hdfs/.Trash/Current/user/test/test_file
Note the mention of trash directories: by default, HDFS moves deleted files into a .Trash directory under the deleting user’s home directory, which guards against slips of the finger. Trashed files can be removed immediately with “hdfs dfs -expunge” and are otherwise purged automatically once the retention interval set by fs.trash.interval expires (7 days in this setup).
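
As an aside: since the file only went to the trash, it could also be restored directly from there using the path shown in the log message above, and deletes can bypass the trash entirely with -skipTrash. For this exercise we leave the file deleted and recover it from the snapshot instead.

$ hdfs dfs -mv /user/hdfs/.Trash/Current/user/test/test_file /user/test/   # restore from trash
$ hdfs dfs -rm -skipTrash /user/test/test_file                             # delete permanently, bypassing trash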

Verify that the file is no longer present.

$ hdfs dfs -ls /user/test/test_file
ls: `/user/test/test_file': No such file or directory
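
Before restoring, you can see exactly what changed since the snapshot was taken with the snapshotDiff subcommand, where “.” denotes the current state of the directory; the deleted file is reported in the output with a “-” (deleted) prefix:

$ hdfs snapshotDiff /user/test snapshot_latest .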

3. Recover the snapshot

1. You can restore the deleted file from the /user/test/.snapshot directory, which still has a copy of test_file.

$ hdfs dfs -ls /user/test/.snapshot/snapshot_latest
Found 1 items
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 10:10 /user/test/.snapshot/snapshot_latest/test_file
$ hdfs dfs -cat /user/test/.snapshot/snapshot_latest/test_file
This is a test file.

2. Let’s copy the removed file from the snapshot directory back to its original location.

$ hdfs dfs -cp /user/test/.snapshot/snapshot_latest/test_file /user/test/
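
Note that cp writes a brand-new file, so it gets a fresh modification time (visible in the listing below) and is owned by the copying user. On recent Hadoop 2 releases the -p flag of cp can preserve attributes; for example, -ptopax keeps timestamps, ownership, permissions, ACLs and XAttrs:

$ hdfs dfs -cp -ptopax /user/test/.snapshot/snapshot_latest/test_file /user/test/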

Verify:

$ hdfs dfs -ls /user/test/test_file
-rw-r--r--   3 hdfs hdfs         21 2018-07-21 11:22 /user/test/test_file
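
With the file back in place, the snapshot can be deleted and, once the directory holds no more snapshots, snapshotting can be disallowed on it again:

$ hdfs dfs -deleteSnapshot /user/test snapshot_latest
$ hdfs dfsadmin -disallowSnapshot /user/test

Note that disallowSnapshot fails if any snapshots still exist under the directory, so delete them first.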