@InterfaceAudience.Public @InterfaceStability.Evolving public class KuduTableInputFormat extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult> implements org.apache.hadoop.conf.Configurable
This input format generates one split per tablet, and the only location for each split is that tablet's leader.

 Hadoop doesn't have the concept of "closing" an input format, so in order to release its
 resources (mainly, the Kudu client) we assume that once either
 getSplits(org.apache.hadoop.mapreduce.JobContext)
 or KuduTableInputFormat.TableRecordReader.close()
 has been called, the object won't be used again and the AsyncKuduClient is shut down.
 To prevent a premature shutdown of the client, the KuduTableInputFormat and the
 TableRecordReader each get their own client, which they don't share.
 
 The default behavior of Hadoop is to call getSplits(org.apache.hadoop.mapreduce.JobContext)
 in the MRAppMaster and, for each InputSplit (in our case, one Kudu tablet), to spawn one Mapper
 with a TableRecordReader reading that tablet.
 Therefore, the total number of Kudu clients opened over the course of an MR application can be
 estimated as (#Tablets + 1). To reduce the number of concurrently open clients, it might be
 advisable to restrict the resources of the MR application or to implement
 org.apache.hadoop.mapred.lib.CombineFileInputFormat over this InputFormat.
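 As an informal illustration of the estimate above (not part of the Kudu API): one client is
 opened by the input format itself for getSplits in the MRAppMaster, plus one per
 TableRecordReader, i.e. one per tablet.

 ```java
 public class ClientCountEstimate {
     // Rough estimate of Kudu clients opened over an MR application:
     // one for the KuduTableInputFormat (getSplits in the MRAppMaster)
     // plus one per TableRecordReader, i.e. one per tablet.
     static int estimateClients(int numTablets) {
         return numTablets + 1;
     }

     public static void main(String[] args) {
         // A table with 50 tablets yields roughly 51 clients in total.
         System.out.println(estimateClients(50)); // prints 51
     }
 }
 ```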
 
| Constructor and Description |
|---|
| KuduTableInputFormat() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,RowResult> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) |
| org.apache.hadoop.conf.Configuration | getConf() |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) |
| void | setConf(org.apache.hadoop.conf.Configuration entries) |
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException, InterruptedException

Overrides: getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult>
Throws: IOException, InterruptedException

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,RowResult> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException

Overrides: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,RowResult>
Throws: IOException, InterruptedException

public void setConf(org.apache.hadoop.conf.Configuration entries)

Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()

Specified by: getConf in interface org.apache.hadoop.conf.Configurable

Copyright © 2018 The Apache Software Foundation. All rights reserved.