Hadoop

Converting Many Small Files To A Sequence File In HDFS

Having many small files in HDFS is inefficient for processing and also burdens the NameNode's metadata.…

How to run Hadoop without using SSH

The start-all.sh and stop-all.sh scripts in the hadoop/bin directory use SSH to launch some of the Hadoop daemons. If…

How To Modify Hadoop Log Level

By default, Hadoop's log level is set to INFO. This can be too verbose in most cases, as it will…

Understanding the Hadoop MapReduce framework

This post introduces the MapReduce framework that enables you to write applications that process vast amounts of data, in parallel,…

CCA131 – Configure NameNode HA

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. HDFS High Availability Overview: A single NameNode…

CCA131 – Configuring HDFS Snapshot Policy

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. What is HDFS Snapshot Policy? You can…

CCA131 – Create/restore a snapshot of an HDFS directory (Using Cloudera Manager)

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. HDFS Snapshot: Directories in HDFS can be…

CCA131 – Create an HDFS user’s home directory

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. In the exam, you may be asked…

CCA131 – Configure HDFS ACLs

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. The basis for Hadoop Access Control Lists…

CCA131 – Rebalance the cluster

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series. In a long-running cluster, there might be…