Hadoop Beginner's Guide
上QQ阅读APP看书,第一时间看更新

Hadoop-specific data types

Up to this point we've glossed over the actual data types used as the input and output of the map and reduce classes. Let's take a look at them now.

The Writable and WritableComparable interfaces

If you browse the Hadoop API for the org.apache.hadoop.io package, you'll see some familiar classes such as Text and IntWritable along with others with the Writable suffix.

This package also contains the Writable interface specified as follows:

import java.io.DataInput ;
import java.io.DataOutput ;
import java.io.IOException ;

public interface Writable
{
void write(DataOutput out) throws IOException ;
void readFields(DataInput in) throws IOException ;
}

The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read and written from the disk. Every data type to be used as a value input or output from a mapper or reducer (that is, V1, V2, or V3) must implement this interface.

Data to be used as keys (K1, K2, K3) has a stricter requirement: in addition to Writable,it must also provide an implementation of the standard Java Comparable interface. This has the following specifications:

public interface Comparable
{
public int compareTO( Object obj) ;
}

The compare method returns -1, 0, or 1 depending on whether the compared object is less than, equal to, or greater than the current object.

As a convenience interface, Hadoop provides the WritableComparable interface in the org.apache.hadoop.io package.

public interface WritableComparable extends Writable, Comparable
{}

Introducing the wrapper classes

Fortunately, you don't have to start from scratch; as you've already seen, Hadoop provides classes that wrap the Java primitive types and implement WritableComparable. They are provided in the org.apache.hadoop.io package.

Primitive wrapper classes

These classes are conceptually similar to the primitive wrapper classes, such as Integer and Long found in java.lang. They hold a single primitive value that can be set either at construction or via a setter method.

  • BooleanWritable
  • ByteWritable
  • DoubleWritable
  • FloatWritable
  • IntWritable
  • LongWritable
  • VIntWritable – a variable length integer type
  • VLongWritable – a variable length long type

Array wrapper classes

These classes provide writable wrappers for arrays of other Writable objects. For example, an instance of either could hold an array of IntWritable or DoubleWritable but not arrays of the raw int or float types. A specific subclass for the required Writable class will be required. They are as follows:

  • ArrayWritable
  • TwoDArrayWritable

Map wrapper classes

These classes allow implementations of the java.util.Map interface to be used as keys or values. Note that they are defined as Map<Writable, Writable> and effectively manage a degree of internal-runtime-type checking. This does mean that compile type checking is weakened, so be careful.

  • AbstractMapWritable: This is a base class for other concrete Writable map implementations
  • MapWritable: This is a general purpose map mapping Writable keys to Writable values
  • SortedMapWritable: This is a specialization of the MapWritable class that also implements the SortedMap interface