Hadoop Beginner's Guide

Time for action – starting Hadoop

Unlike Hadoop's local mode, where all the components run only for the lifetime of the submitted job, the pseudo-distributed and fully distributed modes run the cluster components as long-running processes. Before we can use HDFS or MapReduce, we need to start the required components. Type the following commands; the output should look as shown next, where each command appears on a line prefixed by $:

Type in the first command:

$ start-dfs.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-vm193.out
localhost: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-vm193.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-vm193.out

Type in the second command:

$ jps
9550 DataNode
9687 Jps
9638 SecondaryNameNode
9471 NameNode

Type in the third command:

$ hadoop dfs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-10-26 23:03 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-10-26 23:06 /user

Type in the fourth command:

$ start-mapred.sh 
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-vm193.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-vm193.out

Type in the fifth command:

$ jps
9550 DataNode
9877 TaskTracker
9638 SecondaryNameNode
9471 NameNode
9798 JobTracker
9913 Jps

What just happened?

The start-dfs.sh command, as the name suggests, starts the components necessary for HDFS. These are the NameNode, which manages the filesystem, and a single DataNode, which holds the data. The SecondaryNameNode is an availability aid that we'll discuss in a later chapter.

After starting these components, we use the JDK's jps utility to see which Java processes are running, and, as the output looks good, we then use Hadoop's dfs utility to list the root of the HDFS filesystem.
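Checking the jps listing by eye is fine on a single node, but the same check is easy to script. The following is a minimal bash sketch, not part of Hadoop itself: the function name missing_daemons is hypothetical, and it assumes jps prints one `<pid> <class name>` pair per line, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Hadoop): report which expected
# daemons are absent from jps output read on stdin.
# Usage: jps | missing_daemons NameNode DataNode SecondaryNameNode
# Prints one missing daemon name per line; returns 1 if any are missing.
missing_daemons() {
  local running status=0 daemon
  # jps prints "<pid> <main class name>"; keep only the class names.
  running=$(awk '{print $2}')
  for daemon in "$@"; do
    # -x matches whole lines and -F treats the name literally, so
    # "NameNode" does not match "SecondaryNameNode".
    if ! grep -qxF -- "$daemon" <<< "$running"; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

For example, after start-dfs.sh, `jps | missing_daemons NameNode DataNode SecondaryNameNode` should print nothing and return 0; after start-mapred.sh, JobTracker and TaskTracker can be added to the list.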

After this, we use start-mapred.sh to start the MapReduce components—this time the JobTracker and a single TaskTracker—and then use jps again to verify the result.

There is also a combined start-all.sh script that we'll use at a later stage, but in the early days it's useful to do a two-stage startup so that the cluster configuration is easier to verify.
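The two-stage startup can itself be scripted so that the MapReduce daemons are only started once HDFS has come up cleanly. This is a hedged bash sketch under stated assumptions rather than a script shipped with Hadoop: the function names daemons_running and two_stage_start, the overridable START_DFS, START_MAPRED, JPS and HADOOP variables, and the fixed SETTLE_SECONDS pause are all hypothetical, and it assumes the Hadoop 1.x scripts used in this section are on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical two-stage startup sketch for Hadoop 1.x (not part of Hadoop).
# Assumes start-dfs.sh, start-mapred.sh, jps and hadoop are on PATH.
# The command variables can be overridden, e.g. to test the flow
# without a real cluster.
START_DFS=${START_DFS:-start-dfs.sh}
START_MAPRED=${START_MAPRED:-start-mapred.sh}
JPS=${JPS:-jps}
HADOOP=${HADOOP:-hadoop}
SETTLE_SECONDS=${SETTLE_SECONDS:-5}   # assumed pause before checking jps

# Succeeds only if every named daemon appears as a class name in jps output;
# otherwise prints "missing: <name>" to stderr and returns 1.
daemons_running() {
  local running daemon
  running=$("$JPS" | awk '{print $2}')
  for daemon in "$@"; do
    grep -qxF -- "$daemon" <<< "$running" || { echo "missing: $daemon" >&2; return 1; }
  done
}

two_stage_start() {
  # Stage 1: HDFS daemons, then list the filesystem root as a sanity check.
  "$START_DFS" || return 1
  sleep "$SETTLE_SECONDS"
  daemons_running NameNode DataNode SecondaryNameNode || return 1
  "$HADOOP" dfs -ls / || return 1

  # Stage 2: MapReduce daemons, started only after HDFS looks healthy.
  "$START_MAPRED" || return 1
  sleep "$SETTLE_SECONDS"
  daemons_running JobTracker TaskTracker
}
```

After sourcing the file, `two_stage_start && echo "cluster is up"` runs both stages. A fixed pause is the simplest choice; on a slow machine a larger SETTLE_SECONDS may be needed before every daemon shows up in jps.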
