@InterfaceAudience.Public @InterfaceStability.Evolving public class KuduTableInputFormat extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult> implements org.apache.hadoop.conf.Configurable
This input format generates one split per tablet, and the only location for each split is that tablet's leader.
Hadoop has no notion of "closing" an input format, so in order to release
resources (mainly the Kudu client) we assume that once either
getSplits(org.apache.hadoop.mapreduce.JobContext)
or KuduTableInputFormat.TableRecordReader.close()
has been called, the object will not be used again and the AsyncKuduClient is shut down.
To prevent a premature shutdown of the client, the KuduTableInputFormat and the
TableRecordReader each get their own client and do not share it.
By default, Hadoop calls getSplits(org.apache.hadoop.mapreduce.JobContext)
in the MRAppMaster and spawns one Mapper per InputSplit (in our case, per Kudu tablet),
each with a TableRecordReader reading a single tablet.
The total number of Kudu clients opened over the course of an MR application can therefore be
estimated as the number of tablets plus one. To reduce the number of concurrently open clients,
it might be advisable to restrict the resources of the MR application or to implement
org.apache.hadoop.mapred.lib.CombineFileInputFormat on top of this InputFormat.
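A minimal map-only job sketch, to show how the pieces described above fit together. The package names for KuduTableInputFormat (org.apache.kudu.mapreduce) and RowResult (org.apache.kudu.client) and the commented-out configuration keys are assumptions not documented on this page; in practice the master addresses and table name are supplied through the Hadoop Configuration consumed by setConf(org.apache.hadoop.conf.Configuration), typically via Kudu's MapReduce helper utilities.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.kudu.client.RowResult;           // assumed package for RowResult
import org.apache.kudu.mapreduce.KuduTableInputFormat; // assumed package for this class

public class KuduScanJob {

  /** Mapper that receives one RowResult per Kudu row; the key is always NullWritable. */
  public static class RowMapper
      extends Mapper<NullWritable, RowResult, Text, NullWritable> {
    @Override
    protected void map(NullWritable key, RowResult row, Context context)
        throws IOException, InterruptedException {
      // Dump the whole row as text; real jobs would read individual columns instead.
      context.write(new Text(row.toString()), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumption: master addresses and table name are set on the Configuration
    // (Kudu ships helper utilities for this); the keys below are placeholders,
    // not documented property names.
    // conf.set("kudu.mapreduce.master.addresses", "master1:7051");
    // conf.set("kudu.mapreduce.input.table", "my_table");

    Job job = Job.getInstance(conf, "kudu-scan");
    job.setJarByClass(KuduScanJob.class);
    job.setInputFormatClass(KuduTableInputFormat.class); // one split per tablet
    job.setMapperClass(RowMapper.class);
    job.setNumReduceTasks(0);                            // map-only scan
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Since each mapper opens its own Kudu client (via its TableRecordReader), a job over N tablets runs N such mappers plus the one client opened by getSplits in the MRAppMaster, matching the estimate above.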
| Constructor and Description |
| --- |
| KuduTableInputFormat() |
| Modifier and Type | Method and Description |
| --- | --- |
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,RowResult> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) |
| org.apache.hadoop.conf.Configuration | getConf() |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) |
| void | setConf(org.apache.hadoop.conf.Configuration entries) |
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException, InterruptedException
Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult>
Throws:
IOException
InterruptedException
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,RowResult> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult>
Throws:
IOException
InterruptedException
public void setConf(org.apache.hadoop.conf.Configuration entries)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable