In this tutorial I will show some common commands for HDFS operations.
If you don't have Hadoop setup in your linux, you can follow Hadoop Setup Guide

Log into Linux, "hduser" is the login used in following examples.

Start Hadoop If it's not running

$ start-dfs.sh
....
$ start-yarn.sh

Create someFile.txt in your home directory

hduser@ubuntu:~$ vi someFile.txt

Paste any text you want in to the file and save it.

Create Home Directory In HDFS (If it doesn't exist)

hduser@ubuntu:~$ hadoop fs -mkdir -p /user/hduser

Copy file someFile.txt from local disk to the user’s directory in HDFS.

hduser@ubuntu:~$ hadoop fs -copyFromLocal someFile.txt someFile.txt

Get a directory listing of the user’s home directory in HDFS

hduser@ubuntu:~$ hadoop fs –ls


Found 1 items
-rw-r--r--   1 hduser supergroup          5 2013-10-27 17:57 someFile.txt

Display the contents of the HDFS file /user/hduser/someFile.txt

hduser@ubuntu:~$ hadoop fs –cat /user/hduser/someFile.txt

Get a directory listing of the HDFS root directory

hduser@ubuntu:~$ hadoop fs –ls /

copy that file to the local disk, named as someFile2.txt

hduser@ubuntu:~$ hadoop fs –copyToLocal /user/hduser/someFile.txt someFile2.txt

Delete the file from hadoop hdfs

hduser@ubuntu:~$ hadoop fs –rm someFile.txt

Deleted someFile.txt

For a full list of commands, Please visit HDFS FileSystem Shell Commands. Please feel free to leave me any comments or suggestions.

There are so many version of WordCount hadoop example flowing around the web. However, a lot of them are using the older version of hadoop api. Following are example of word count using the newest hadoop map reduce api. The new map reduce api reside in org.apache.hadoop.mapreduce package instead of org.apache.hadoop.mapred.

WordMapper.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
 private Text word = new Text();
 private final static IntWritable one = new IntWritable(1);
 
 @Override
 public void map(Object key, Text value,
   Context contex) throws IOException, InterruptedException {
  // Break line into words for processing
  StringTokenizer wordList = new StringTokenizer(value.toString());
  while (wordList.hasMoreTokens()) {
   word.set(wordList.nextToken());
   contex.write(word, one);
  }
 }
}

SumReducer.java

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;



public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
 
 private IntWritable totalWordCount = new IntWritable();
 
 @Override
 public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
  int wordCount = 0;
  Iterator<IntWritable> it=values.iterator();
  while (it.hasNext()) {
   wordCount += it.next().get();
  }
  totalWordCount.set(wordCount);
  context.write(key, totalWordCount);
 }
}

WordCount.java (Driver)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
 public static void main(String[] args) throws Exception {
        if (args.length != 2) {
          System.out.println("usage: [input] [output]");
          System.exit(-1);
        }
  
  
        Job job = Job.getInstance(new Configuration());
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordMapper.class); 
        job.setReducerClass(SumReducer.class);  

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJarByClass(WordCount.class);

        job.submit();
        
        
        
  

  
 }
}

Codes Fusion

Sunday, October 27, 2013

Hadoop FileSystem (HDFS) Tutorial 1

Friday, October 18, 2013

Hadoop WordCount with new map reduce api

WordMapper.java

SumReducer.java

WordCount.java (Driver)

About Me