
Hadoop-specific data types
Up to this point we've glossed over the actual data types used as the input and output of the map and reduce classes. Let's take a look at them now.
The Writable and WritableComparable interfaces
If you browse the Hadoop API for the org.apache.hadoop.io package, you'll see some familiar classes such as Text and IntWritable, along with others with the Writable suffix.
This package also contains the Writable interface, specified as follows:
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }
The main purpose of this interface is to provide mechanisms for serializing and deserializing data as it is passed across the network or read from and written to disk. Every data type used as a value input or output of a mapper or reducer (that is, V1, V2, or V3) must implement this interface.
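To make the two methods concrete, here is a minimal sketch of a custom value type and a serialization round trip. PointWritable and WritableRoundTrip are hypothetical names, and the sketch declares a local copy of the Writable interface so that it compiles without Hadoop on the classpath; in a real job you would import org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same shape as org.apache.hadoop.io.Writable
// (an assumption of this sketch, so that it runs without Hadoop).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding a pair of coordinates.
class PointWritable implements Writable {
    int x;
    int y;

    // No-arg constructor: an empty instance is created first,
    // then populated by readFields.
    PointWritable() { }

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        x = in.readInt();
        y = in.readInt();
    }
}

class WritableRoundTrip {
    // Serializes any Writable into a byte array.
    static byte[] serialize(Writable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds a PointWritable from previously serialized bytes.
    static PointWritable deserialize(byte[] data) {
        try {
            PointWritable p = new PointWritable();
            p.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable copy = deserialize(serialize(new PointWritable(3, -7)));
        System.out.println(copy.x + "," + copy.y); // prints 3,-7
    }
}
```

Note that the type carries no field names or type tags in its serialized form; the two int fields occupy exactly eight bytes, and the reader relies entirely on readFields mirroring write.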
Data to be used as keys (K1, K2, K3) has a stricter requirement: in addition to Writable, it must also provide an implementation of the standard Java Comparable interface, which is specified as follows:
    public interface Comparable {
        public int compareTo(Object obj);
    }
The compareTo method returns a negative integer, zero, or a positive integer depending on whether the current object is less than, equal to, or greater than the compared object.
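The following sketch illustrates the comparison contract with plain Java. Version and CompareToDemo are hypothetical names used only for illustration, and the example uses the generic form Comparable<T> from java.lang rather than the raw form shown above. It also shows why callers should test only the sign of the result, not a specific value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class: orders by major number, then by minor number.
class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        // Fall back to the minor number only when the major numbers tie.
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class CompareToDemo {
    // Returns a sorted copy; Collections.sort uses compareTo for the ordering.
    static List<Version> sorted(List<Version> input) {
        List<Version> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The result is not limited to -1, 0, or 1; only its sign matters.
        System.out.println("a".compareTo("c"));              // prints -2
        System.out.println(Integer.valueOf(1).compareTo(2)); // prints -1

        List<Version> versions = Arrays.asList(
                new Version(2, 0), new Version(1, 10), new Version(1, 2));
        System.out.println(sorted(versions)); // prints [1.2, 1.10, 2.0]
    }
}
```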
As a convenience, Hadoop provides the WritableComparable interface in the org.apache.hadoop.io package:
    public interface WritableComparable extends Writable, Comparable {}
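Putting the two requirements together, a custom key type might look like the following sketch. YearMonthKey and KeyDemo are hypothetical names, and the Writable and WritableComparable interfaces are declared locally (with a generic type parameter, which is an assumption of this sketch) so that the code runs without Hadoop on the classpath. In a real job you would implement the interface from org.apache.hadoop.io instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-ins mirroring the Hadoop interfaces (assumption of this sketch).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> { }

// Hypothetical composite key that sorts by year, then by month.
class YearMonthKey implements WritableComparable<YearMonthKey> {
    private int year;
    private int month;

    YearMonthKey() { }

    YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    @Override
    public int compareTo(YearMonthKey other) {
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : Integer.compare(month, other.month);
    }

    // Keys are commonly hashed to choose a partition, so equals and hashCode
    // are kept consistent with compareTo.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearMonthKey)) {
            return false;
        }
        YearMonthKey k = (YearMonthKey) o;
        return year == k.year && month == k.month;
    }

    @Override
    public int hashCode() {
        return 31 * year + month;
    }

    @Override
    public String toString() {
        return year + "-" + month;
    }
}

class KeyDemo {
    // Serializes the key and reads it back into a fresh instance.
    static YearMonthKey copyOf(YearMonthKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            YearMonthKey copy = new YearMonthKey();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<YearMonthKey> keys = new ArrayList<>(Arrays.asList(
                new YearMonthKey(2012, 3),
                new YearMonthKey(2011, 12),
                new YearMonthKey(2012, 1)));
        Collections.sort(keys);
        System.out.println(keys); // prints [2011-12, 2012-1, 2012-3]
    }
}
```

The ordering defined by compareTo is what determines the order in which keys reach a reducer, which is why keys carry this extra requirement and values do not.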
Introducing the wrapper classes
Fortunately, you don't have to start from scratch; as you've already seen, Hadoop provides classes that wrap the Java primitive types and implement WritableComparable. They are provided in the org.apache.hadoop.io package.
These classes are conceptually similar to the primitive wrapper classes, such as Integer and Long, found in java.lang. They hold a single primitive value that can be set either at construction or via a setter method. They are as follows:
- BooleanWritable
- ByteWritable
- DoubleWritable
- FloatWritable
- IntWritable
- LongWritable
- VIntWritable – a variable length integer type
- VLongWritable – a variable length long type
The array wrapper classes provide Writable wrappers for arrays of other Writable objects. For example, an instance of either could hold an array of IntWritable or DoubleWritable objects, but not arrays of the raw int or float types. You will generally need a specific subclass for the Writable element class you want to store. They are as follows:
- ArrayWritable
- TwoDArrayWritable
The map wrapper classes allow implementations of the java.util.Map interface to be used as values; they implement Writable but not WritableComparable, so they are not suitable as keys as they stand. Note that they are defined as Map<Writable, Writable> and effectively manage a degree of internal runtime type checking. This does mean that compile-time type checking is weakened, so be careful. They are as follows:
- AbstractMapWritable: This is a base class for other concrete Writable map implementations
- MapWritable: This is a general-purpose map mapping Writable keys to Writable values
- SortedMapWritable: This is a specialization of the MapWritable class that also implements the SortedMap interface
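The warning about weakened compile-time checking can be illustrated with a small sketch. It does not use MapWritable itself; instead it uses a plain java.util.HashMap typed as Map<Writable, Writable>, together with a local stand-in for the Writable interface and two hypothetical value types (IntBox and TextBox), so that it runs without Hadoop on the classpath. The point is only that the compiler accepts any mix of Writable types, so a wrong cast surfaces at runtime.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Local stand-in for org.apache.hadoop.io.Writable (assumption of this sketch).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper around an int.
class IntBox implements Writable {
    int value;

    IntBox(int value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeInt(value); }

    @Override
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Hypothetical wrapper around a String.
class TextBox implements Writable {
    String value;

    TextBox(String value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }

    @Override
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

class MapTypeCheckDemo {
    // Compiles without complaint, but throws ClassCastException at runtime
    // if the stored value is not an IntBox.
    static int intValue(Map<Writable, Writable> map, Writable key) {
        return ((IntBox) map.get(key)).value;
    }

    // Defensive alternative: check the runtime type before casting.
    static Integer intValueOrNull(Map<Writable, Writable> map, Writable key) {
        Writable w = map.get(key);
        return (w instanceof IntBox) ? Integer.valueOf(((IntBox) w).value) : null;
    }

    public static void main(String[] args) {
        Map<Writable, Writable> map = new HashMap<>();
        // The same key instances are reused for lookup, since these
        // hypothetical types do not override equals or hashCode.
        TextBox count = new TextBox("count");
        TextBox name = new TextBox("name");
        map.put(count, new IntBox(42));
        map.put(name, new TextBox("hadoop")); // mixed value types are accepted

        System.out.println(intValueOrNull(map, count)); // prints 42
        System.out.println(intValueOrNull(map, name));  // prints null
        try {
            intValue(map, name);
        } catch (ClassCastException e) {
            System.out.println("wrong value type detected only at runtime");
        }
    }
}
```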