The start-all.sh and stop-all.sh scripts in the hadoop/bin directory use SSH to launch some of the Hadoop daemons. If for some reason SSH is not available on the server, follow the steps below to run Hadoop without using SSH. The goal is to replace every call to "hadoop-daemons.sh" with "hadoop-daemon.sh"; "hadoop-daemons.sh" simply runs "hadoop-daemon.sh" through SSH. 1. Modify the start-dfs.sh script: from: ${bin}/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode … [Read more...] about How to run Hadoop without using SSH
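A minimal sketch of the substitution, assuming the classic Hadoop 1.x layout of start-dfs.sh (the exact wording of the lines varies by Hadoop version; the datanode entry below is illustrative):

    # before: datanodes are launched on remote hosts over SSH
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode
    # after: the daemon is launched locally, no SSH required
    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode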
How To Modify Hadoop Log Level
By default, Hadoop's log level is set to INFO. This can be too verbose for most deployments, as it generates huge log files even in an environment with low to moderate traffic. Changing the root logger in Hadoop's log4j.properties file will not change the log level. Follow the steps below to change the log level of Hadoop. 1. Shut down Hadoop if it is still running. 2. Open the [hadoop_home]/bin/hadoop-daemon.sh file. Look for the following line: export … [Read more...] about How To Modify Hadoop Log Level
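A hedged sketch of the kind of change the post describes, assuming the standard HADOOP_ROOT_LOGGER variable set in hadoop-daemon.sh (the appender name after the comma may differ in your version):

    # original line keeps INFO as the root log level
    export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,DRFA"}
    # change INFO to WARN (or ERROR) to cut down log volume
    export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"WARN,DRFA"}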
Understanding the Hadoop MapReduce framework
This post introduces the MapReduce framework, which enables you to write applications that process vast amounts of data, in parallel, on large clusters of commodity hardware, in a reliable and fault-tolerant manner. In addition, this post describes the architectural components of MapReduce and lists the benefits of using MapReduce. MapReduce is a software framework that enables you to write applications that process vast amounts of data, in parallel, on large clusters of commodity hardware … [Read more...] about Understanding the Hadoop MapReduce framework
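As a quick illustration of the framework in action, the word-count example that ships with Hadoop splits the input across map tasks and aggregates the counts in reduce tasks. A sketch, where the jar path and the /user/alice paths are placeholders for your environment:

    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/alice/input /user/alice/output
    hdfs dfs -cat /user/alice/output/part-r-00000 | head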
CCA131 – Configure NameNode HA
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series HDFS High Availability Overview A single NameNode is a single point of failure in a Hadoop cluster: you can experience HDFS downtime from an unexpected NameNode crash or from planned NameNode maintenance. A NameNode high availability setup avoids this single point of failure. HDFS High Availability uses a pair of NameNodes, one Active and one Standby. Clients only contact the Active … [Read more...] about CCA131 – Configure NameNode HA
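Once HA is configured, the Active/Standby roles can be verified from the command line; a sketch assuming the two NameNodes are registered with the service IDs nn1 and nn2:

    hdfs haadmin -getServiceState nn1   # e.g. active
    hdfs haadmin -getServiceState nn2   # e.g. standby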
CCA131 – Configuring HDFS snapshot Policy
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series What is an HDFS Snapshot Policy You can create snapshot policies in Cloudera Manager to take automated snapshots of snapshottable paths on HDFS. Snapshot policies run on the schedule specified by the user (hourly, daily, weekly, etc.). Before we can create a snapshot policy, we must first allow snapshots on the HDFS directory. Configuring a Snapshot Policy The following are the steps to create a snapshot … [Read more...] about CCA131 – Configuring HDFS snapshot Policy
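That prerequisite can also be done from the shell; a minimal sketch, with /data/reports as a placeholder path, run with HDFS superuser privileges:

    hdfs dfsadmin -allowSnapshot /data/reports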
CCA 131 – Create/restore a snapshot of an HDFS directory (Using Cloudera Manager)
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series HDFS Snapshot Directories in HDFS can be snapshotted, which means creating one or more point-in-time images, or snapshots, of the directory. Snapshots include subdirectories, and can even include the entire filesystem (be careful with this for obvious reasons). Snapshots can be used as backups or for auditing purposes. As changes to the filesystem are made, any change that would affect the snapshot is treated … [Read more...] about CCA 131 – Create/restore a snapshot of an HDFS directory (Using Cloudera Manager)
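The equivalent shell commands are a useful reference for what Cloudera Manager is doing; a sketch with a placeholder path, snapshot name, and file:

    # take a point-in-time snapshot of an already snapshottable directory
    hdfs dfs -createSnapshot /data/reports before-cleanup
    # restore a file by copying it back out of the read-only .snapshot directory
    hdfs dfs -cp /data/reports/.snapshot/before-cleanup/important.csv /data/reports/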
CCA131 – Create an HDFS user’s home directory
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series In the exam, you may be asked to create an HDFS home directory for an existing local user. You may further be asked to set specific ownership or permissions on the home directory. The process basically involves: creating a local user if one is not already present; creating a home directory in HDFS for the user; and assigning appropriate ownership and permissions to the home directory. 1. Create a local user Please … [Read more...] about CCA131 – Create an HDFS user’s home directory
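A condensed sketch of those steps from the shell (the user name and permission bits are illustrative; the HDFS commands must run as the hdfs superuser):

    sudo useradd bob                                   # 1. local user, if not already present
    sudo -u hdfs hdfs dfs -mkdir /user/bob             # 2. home directory in HDFS
    sudo -u hdfs hdfs dfs -chown bob:bob /user/bob     # 3. ownership ...
    sudo -u hdfs hdfs dfs -chmod 750 /user/bob         #    ... and permissions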
CCA 131 – Configure HDFS ACLs
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series The basis for Hadoop Access Control Lists is POSIX ACLs, as available on the Linux filesystem. These ACLs let you attach a set of permissions to a file or directory that is not limited to the single user and group who own the file. HDFS ACLs give you a fine-grained file permission model, suitable for a large enterprise where the data stored on the Hadoop cluster should be accessible to some groups … [Read more...] about CCA 131 – Configure HDFS ACLs
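A minimal sketch of granting and inspecting ACL entries (the user, group, and path are placeholders; ACL support must be enabled on the NameNode via dfs.namenode.acls.enabled):

    hdfs dfs -setfacl -m user:alice:r-x /data/reports      # extra user beyond the owner
    hdfs dfs -setfacl -m group:analysts:r-x /data/reports   # extra group beyond the owning group
    hdfs dfs -getfacl /data/reports                          # verify the resulting ACL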
CCA 131 – Rebalance the cluster
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series In a long-running cluster, data can become unevenly distributed across DataNodes, for example after node failures or after new nodes are added to the cluster. To make sure the data is evenly distributed across DataNodes, it is important to use the Hadoop balancer to redistribute blocks. How rebalancing works - The HDFS rebalancer reviews data block placement on nodes and adjusts the blocks to … [Read more...] about CCA 131 – Rebalance the cluster
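From the command line the balancer is typically run with a utilization threshold; a sketch, where 10 means DataNode utilization should end up within 10 percentage points of the cluster average:

    hdfs balancer -threshold 10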
CCA 131 – Commission/decommission a node
Note: This post is part of the CCA Administrator Exam (CCA131) objectives series Cloudera Manager makes it very simple to add and remove hosts in a cluster; all host management operations in Cloudera Manager are done from the Hosts screen. In this post, we will go through the steps of commissioning and decommissioning a host in a CDH cluster. There will always be failures in clusters, such as hardware issues, or a need to upgrade nodes, and removing a node should be done in a graceful manner, without … [Read more...] about CCA 131 – Commission/decommission a node
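The post drives this from the Cloudera Manager Hosts screen, but as a sketch of the underlying HDFS mechanism on a plain Apache cluster (the exclude file path is whatever dfs.hosts.exclude points to; the hostname is a placeholder):

    echo 'worker05.example.com' >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes    # NameNode begins re-replicating the node's blocks before it is removed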