Kudu C++ client API
kudu::client::KuduScanner Class Reference

This class is a representation of a single scan.

#include <client.h>

Public Types

enum  ReadMode { READ_LATEST, READ_AT_SNAPSHOT }
 The read modes for scanners.
 
enum  OrderMode { UNORDERED, ORDERED }
 
enum  { kScanTimeoutMillis = 30000 }
 

Public Member Functions

 KuduScanner (KuduTable *table)
 
Status SetProjectedColumnNames (const std::vector< std::string > &col_names) WARN_UNUSED_RESULT
 
Status SetProjectedColumnIndexes (const std::vector< int > &col_indexes) WARN_UNUSED_RESULT
 
Status SetProjectedColumns (const std::vector< std::string > &col_names) WARN_UNUSED_RESULT ATTRIBUTE_DEPRECATED("use SetProjectedColumnNames() instead")
 
Status AddConjunctPredicate (KuduPredicate *pred) WARN_UNUSED_RESULT
 
Status AddLowerBound (const KuduPartialRow &key)
 
Status AddLowerBoundRaw (const Slice &key) ATTRIBUTE_DEPRECATED("use AddLowerBound() instead")
 
Status AddExclusiveUpperBound (const KuduPartialRow &key)
 
Status AddExclusiveUpperBoundRaw (const Slice &key) ATTRIBUTE_DEPRECATED("use AddExclusiveUpperBound() instead")
 
Status AddLowerBoundPartitionKeyRaw (const Slice &partition_key)
 
Status AddExclusiveUpperBoundPartitionKeyRaw (const Slice &partition_key)
 
Status SetCacheBlocks (bool cache_blocks)
 
Status Open ()
 
Status KeepAlive ()
 
void Close ()
 
bool HasMoreRows () const
 
Status NextBatch (std::vector< KuduRowResult > *rows) ATTRIBUTE_DEPRECATED("use NextBatch(KuduScanBatch*) instead")
 
Status NextBatch (KuduScanBatch *batch)
 
Status GetCurrentServer (KuduTabletServer **server)
 
const ResourceMetrics & GetResourceMetrics () const
 
Status SetBatchSizeBytes (uint32_t batch_size)
 
Status SetSelection (KuduClient::ReplicaSelection selection) WARN_UNUSED_RESULT
 
Status SetReadMode (ReadMode read_mode) WARN_UNUSED_RESULT
 
Status SetOrderMode (OrderMode order_mode) WARN_UNUSED_RESULT ATTRIBUTE_DEPRECATED("use SetFaultTolerant() instead")
 
Status SetFaultTolerant () WARN_UNUSED_RESULT
 
Status SetSnapshotMicros (uint64_t snapshot_timestamp_micros) WARN_UNUSED_RESULT
 
Status SetSnapshotRaw (uint64_t snapshot_timestamp) WARN_UNUSED_RESULT
 
Status SetTimeoutMillis (int millis)
 
KuduSchema GetProjectionSchema () const
 
std::string ToString () const
 

Friends

class KuduScanToken
 

Advanced/Unstable API

static const uint64_t NO_FLAGS = 0
 
static const uint64_t PAD_UNIXTIME_MICROS_TO_16_BYTES = 1 << 0
 
Status SetRowFormatFlags (uint64_t flags)
 

Detailed Description

This class is a representation of a single scan.

Note
This class is not thread-safe, though different scanners on different threads may share a single KuduTable object.

Member Enumeration Documentation

anonymous enum

Default scanner timeout. This is set to 3x the default RPC timeout returned by KuduClientBuilder::default_rpc_timeout().

enum kudu::client::KuduScanner::OrderMode

Whether the rows should be returned in order.

This affects the fault-tolerance properties of a scanner.

Enumerator
UNORDERED 

Rows will be returned in an arbitrary order determined by the tablet server. This is efficient, but unordered scans are not fault-tolerant and cannot be resumed in the case of tablet server failure.

This is the default mode.

ORDERED 

Rows will be returned ordered by primary key. Sorting the rows imposes additional overhead on the tablet server, but means that scans are fault-tolerant and will be resumed at another tablet server in the case of a failure.

enum kudu::client::KuduScanner::ReadMode

The read modes for scanners.

Enumerator
READ_LATEST 

When READ_LATEST is specified the server will always return committed writes at the time the request was received. This type of read does not return a snapshot timestamp and is not repeatable.

In ACID terms this corresponds to Isolation mode: "Read Committed"

This is the default mode.

READ_AT_SNAPSHOT 

When READ_AT_SNAPSHOT is specified the server will attempt to perform a read at the provided timestamp. If no timestamp is provided the server will take the current time as the snapshot timestamp. In this mode reads are repeatable, i.e. all future reads at the same timestamp will yield the same data. This is performed at the expense of waiting for in-flight transactions whose timestamp is lower than the snapshot's timestamp to complete, so it might incur a latency penalty. See KuduScanner::SetSnapshotMicros() and KuduScanner::SetSnapshotRaw() for details.

In ACID terms this, by itself, corresponds to Isolation mode "Repeatable Read". If all writes to the scanned tablet are made externally consistent, then this corresponds to Isolation mode "Strict-Serializable".

Note
There are currently "holes", which happen in rare edge conditions, by which writes are sometimes not externally consistent even when action was taken to make them so. In these cases Isolation may degenerate to mode "Read Committed". See KUDU-430.

Constructor & Destructor Documentation

kudu::client::KuduScanner::KuduScanner ( KuduTable *  table)
explicit

Constructor for KuduScanner.

Parameters
[in] table  The table to scan. The given object must remain valid for the lifetime of this scanner object.

Member Function Documentation

Status kudu::client::KuduScanner::AddConjunctPredicate ( KuduPredicate *  pred)

Add a predicate for the scan.

Parameters
[in] pred  Predicate to set. The scanner takes ownership of the parameter even if a bad Status is returned. Multiple calls of this method make the specified set of predicates work in conjunction, i.e. all predicates must be true for a row to be returned.
Returns
Operation result status.
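
The conjunction semantics can be illustrated with a small standalone sketch. It does not use the Kudu client: Row, RowPredicate, MatchesAll and Filter are hypothetical names introduced only for illustration, and in a real scan the predicates are KuduPredicate objects rather than local callbacks.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a scanned row; not a Kudu type.
struct Row {
  int key;
  std::string name;
};

// Hypothetical stand-in for a predicate added with AddConjunctPredicate().
using RowPredicate = std::function<bool(const Row&)>;

// A row is returned only if every added predicate holds for it.
bool MatchesAll(const std::vector<RowPredicate>& predicates, const Row& row) {
  for (const RowPredicate& pred : predicates) {
    if (!pred(row)) return false;
  }
  return true;
}

// Applies the conjunction of all predicates to a set of rows.
std::vector<Row> Filter(const std::vector<RowPredicate>& predicates,
                        const std::vector<Row>& rows) {
  std::vector<Row> out;
  for (const Row& row : rows) {
    if (MatchesAll(predicates, row)) out.push_back(row);
  }
  return out;
}
```

With no predicates added, every row matches; each additional predicate can only shrink the result set.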
Status kudu::client::KuduScanner::AddExclusiveUpperBound ( const KuduPartialRow &  key)

Add an upper bound (exclusive) primary key for the scan.

If any bound is already added, this bound is intersected with that one.

Parameters
[in] key  The key to set up the upper bound. The scanner makes a copy of the parameter; the caller may free it afterward.
Returns
Operation result status.
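
Intersection means the effective key range can only narrow as bounds are added: the largest lower bound and the smallest exclusive upper bound win. A minimal standalone sketch of that rule, assuming integer keys and a hypothetical KeyRange type instead of Kudu's encoded primary keys:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical model of a scan's key range [lower, upper) over integer keys.
// The extreme int64_t values stand for "no bound added yet" in this sketch.
struct KeyRange {
  int64_t lower = std::numeric_limits<int64_t>::min();  // inclusive
  int64_t upper = std::numeric_limits<int64_t>::max();  // exclusive

  // Mirrors AddLowerBound(): keep the more restrictive (larger) lower bound.
  void AddLowerBound(int64_t key) { lower = std::max(lower, key); }

  // Mirrors AddExclusiveUpperBound(): keep the more restrictive (smaller)
  // upper bound.
  void AddExclusiveUpperBound(int64_t key) { upper = std::min(upper, key); }

  // True if the key falls inside the intersected range.
  bool Contains(int64_t key) const { return key >= lower && key < upper; }
};
```

Adding a looser bound after a tighter one has no effect, so the order of the calls does not matter.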
Status kudu::client::KuduScanner::AddExclusiveUpperBoundPartitionKeyRaw ( const Slice &  partition_key)

Add an upper bound (exclusive) partition key for the scan.

Note
This method is unstable, and for internal use only.
Parameters
[in] partition_key  The scanner makes a copy of the parameter; the caller may invalidate it afterward.
Returns
Operation result status.
Status kudu::client::KuduScanner::AddExclusiveUpperBoundRaw ( const Slice &  key)

Add an upper bound (exclusive) primary key for the scan.

Deprecated:
Use AddExclusiveUpperBound() instead.
Parameters
[in] key  The encoded primary key is an opaque slice of data.
Returns
Operation result status.
Status kudu::client::KuduScanner::AddLowerBound ( const KuduPartialRow &  key)

Add a lower bound (inclusive) primary key for the scan.

If any bound is already added, this bound is intersected with that one.

Parameters
[in] key  Lower bound primary key to add. The scanner does not take ownership of the parameter.
Returns
Operation result status.
Status kudu::client::KuduScanner::AddLowerBoundPartitionKeyRaw ( const Slice &  partition_key)

Add a lower bound (inclusive) partition key for the scan.

Note
This method is unstable, and for internal use only.
Parameters
[in] partition_key  The scanner makes a copy of the parameter; the caller may invalidate it afterward.
Returns
Operation result status.
Status kudu::client::KuduScanner::AddLowerBoundRaw ( const Slice &  key)

Add a lower bound (inclusive) primary key for the scan.

Deprecated:
Use AddLowerBound() instead.
Parameters
[in] key  The primary key to use as an opaque slice of data.
Returns
Operation result status.
void kudu::client::KuduScanner::Close ( )

Close the scanner.

Closing the scanner releases resources on the server. This call does not block, and will not ever fail, even if the server cannot be contacted.

Note
The scanner is reset to its initial state by this function. You'll have to re-add any projection, predicates, etc. if you want to reuse this object.
Status kudu::client::KuduScanner::GetCurrentServer ( KuduTabletServer **  server)

Get the KuduTabletServer that is currently handling the scan.

More concretely, this is the server that handled the most recent Open() or NextBatch() RPC made by this scanner.

Parameters
[out] server  Placeholder for the result.
Returns
Operation result status.
KuduSchema kudu::client::KuduScanner::GetProjectionSchema ( ) const
Returns
Schema of the projection being scanned.
const ResourceMetrics& kudu::client::KuduScanner::GetResourceMetrics ( ) const
Returns
Cumulative resource metrics since the scan was started.
bool kudu::client::KuduScanner::HasMoreRows ( ) const

Check if there may be rows to be fetched from this scanner.

Returns
true if there may be rows to be fetched from this scanner. The method returns true provided there's at least one more tablet left to scan, even if that tablet has no data (we'll only know once we scan it). It will also be true after initially opening the scanner, before NextBatch() is called for the first time.
Status kudu::client::KuduScanner::KeepAlive ( )

Keep the current remote scanner alive.

Keep the current remote scanner alive on the tablet server for an additional time-to-live (set by a configuration flag on the tablet server). This is useful if the interval between NextBatch() calls is long enough that the remote scanner might be garbage collected (the default TTL is 60 seconds). This does not invalidate any previously fetched results.

Returns
Operation result status. In particular, this method returns a non-OK status if the scanner was already garbage collected or if the TabletServer was unreachable, for any reason. Note that a non-OK status returned by this method should not be taken as indication that the scan has failed. Subsequent calls to NextBatch() might still be successful, particularly if SetFaultTolerant() has been called.
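
One way to use KeepAlive() is to call it only when the scanner has been idle for a sizeable fraction of the TTL. The sketch below is an illustration under stated assumptions, not part of the Kudu API: ShouldKeepAlive and KeepAliveIfIdle are hypothetical helpers, the one-half safety margin is an arbitrary choice, and the caller is assumed to know the TTL configured on the tablet server. The helper is a template so that it compiles without the Kudu headers; it accepts any type that exposes KeepAlive().

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Returns true once the time since the last scanner RPC reaches half of the
// server-side scanner TTL. The one-half margin is an arbitrary choice.
inline bool ShouldKeepAlive(Clock::time_point last_rpc, Clock::time_point now,
                            std::chrono::seconds ttl) {
  return (now - last_rpc) >= ttl / 2;
}

// Calls scanner->KeepAlive() when the scanner has been idle long enough.
// The returned status is deliberately ignored here, because a non-OK result
// does not necessarily mean that the scan has failed.
// Returns true if a keep-alive was attempted.
template <typename Scanner>
bool KeepAliveIfIdle(Scanner* scanner, Clock::time_point last_rpc,
                     Clock::time_point now, std::chrono::seconds ttl) {
  if (!ShouldKeepAlive(last_rpc, now, ttl)) return false;
  auto status = scanner->KeepAlive();
  (void)status;
  return true;
}
```

A caller would record the time of each Open() or NextBatch() call and invoke the helper while processing a slow batch.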
Status kudu::client::KuduScanner::NextBatch ( std::vector< KuduRowResult > *  rows)

Get next batch of rows.

Clears 'rows' and populates it with the next batch of rows from the tablet server. A call to NextBatch() invalidates all previously fetched results, which might now be pointing to garbage memory.

Deprecated:
Use NextBatch(KuduScanBatch*) instead.
Parameters
[out] rows  Placeholder for the result.
Returns
Operation result status.
Status kudu::client::KuduScanner::NextBatch ( KuduScanBatch *  batch)

Fetch the next batch of results for this scanner.

A single KuduScanBatch object may be reused. Each subsequent call replaces the data from the previous call, and invalidates any KuduScanBatch::RowPtr objects previously obtained from the batch.

Parameters
[out] batch  Placeholder for the result.
Returns
Operation result status.
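
The batch-reuse pattern can be sketched as a scan loop that counts rows. CountRows is a hypothetical helper, written as a template so that the sketch compiles without the Kudu headers: Scanner is assumed to provide Open(), HasMoreRows(), NextBatch() and Close() as documented on this page, and Batch is assumed to provide NumRows(), as KuduScanBatch does.

```cpp
#include <cstdint>

// Drains a scanner and counts the rows it returns. Returns the first non-OK
// status encountered, or the status of the last successful call.
template <typename Batch, typename Scanner>
auto CountRows(Scanner* scanner, int64_t* total_rows)
    -> decltype(scanner->Open()) {
  *total_rows = 0;
  auto status = scanner->Open();
  if (!status.ok()) return status;

  Batch batch;  // One batch object, reused for every NextBatch() call.
  while (scanner->HasMoreRows()) {
    status = scanner->NextBatch(&batch);
    if (!status.ok()) break;
    // Anything obtained from the previous batch contents is invalid now.
    *total_rows += batch.NumRows();
  }
  scanner->Close();
  return status;
}
```

With the real client this would presumably be instantiated as CountRows<kudu::client::KuduScanBatch>(&scanner, &count), after the projection and predicates have been set on the scanner.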
Status kudu::client::KuduScanner::Open ( )
Returns
Result status of the operation (begin scanning).
Status kudu::client::KuduScanner::SetBatchSizeBytes ( uint32_t  batch_size)

Set the hint for the size of the next batch in bytes.

Parameters
[in] batch_size  The batch size hint to set, in bytes. If set to 0 before calling Open(), the first call to the tablet server won't return data.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetCacheBlocks ( bool  cache_blocks)

Set the block caching policy.

Parameters
[in] cache_blocks  If true, scanned data blocks will be cached in memory and made available for future scans. Default is true.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetFaultTolerant ( )

Make scans resumable at another tablet server if current server fails.

Scans are by default non fault-tolerant, and scans will fail if scanning an individual tablet fails (for example, if a tablet server crashes in the middle of a tablet scan). If this method is called, scans will be resumed at another tablet server in the case of failure.

Fault-tolerant scans typically have lower throughput than non fault-tolerant scans. Fault-tolerant scans use READ_AT_SNAPSHOT mode; if no snapshot timestamp is provided, the server will pick one.

Returns
Operation result status.
Status kudu::client::KuduScanner::SetOrderMode ( OrderMode  order_mode)
Deprecated:
Use SetFaultTolerant() instead.
Parameters
[in] order_mode  Result record ordering mode to set.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetProjectedColumnIndexes ( const std::vector< int > &  col_indexes)

Set the column projection by passing the column indexes to read.

Set the column projection used for this scanner by passing the column indices to read. A call to this method overrides any previous call to SetProjectedColumnNames() or SetProjectedColumnIndexes().

Parameters
[in] col_indexes  Column indices for the projection.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetProjectedColumnNames ( const std::vector< std::string > &  col_names)

Set the projection for the scanner using column names.

Set the projection used for the scanner by passing column names to read. This overrides any previous call to SetProjectedColumnNames() or SetProjectedColumnIndexes().

Parameters
[in] col_names  Column names to use for the projection.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetProjectedColumns ( const std::vector< std::string > &  col_names)
Deprecated:
Use SetProjectedColumnNames() instead.
Parameters
[in] col_names  Column names to use for the projection.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetReadMode ( ReadMode  read_mode)

Set the ReadMode. Default is READ_LATEST.

Parameters
[in] read_mode  Read mode to set.
Returns
Operation result status.
Status kudu::client::KuduScanner::SetRowFormatFlags ( uint64_t  flags)

Optionally set row format modifier flags.

If flags is KuduScanner::NO_FLAGS, then no modifications will be made to the row format and the default will be used.

Some flags require server-side support, thus the caller should be prepared to handle a NotSupported status in Open() and NextBatch().

Example usage (without error handling, for brevity):

    KuduScanner scanner(...);
    uint64_t row_format_flags = KuduScanner::NO_FLAGS;
    scanner.SetRowFormatFlags(row_format_flags);
    scanner.Open();
    while (scanner.HasMoreRows()) {
      KuduScanBatch batch;
      scanner.NextBatch(&batch);
      Slice direct_data = batch.direct_data();
      Slice indirect_data = batch.indirect_data();
      ... // Row data decoding and handling.
    }
Status kudu::client::KuduScanner::SetSelection ( KuduClient::ReplicaSelection  selection)

Set the replica selection policy while scanning.

Parameters
[in] selection  The policy to set.
Returns
Operation result status.
Todo:
Kill this method in favor of a consistency-level-based API.
Status kudu::client::KuduScanner::SetSnapshotMicros ( uint64_t  snapshot_timestamp_micros)

Set snapshot timestamp for scans in READ_AT_SNAPSHOT mode.

Parameters
[in] snapshot_timestamp_micros  Timestamp to set, in microseconds since the Epoch.
Returns
Operation result status.
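
A wall-clock time can be converted to the expected unit with std::chrono. The helper below is a minimal sketch: ToSnapshotMicros is a hypothetical name, and the code assumes that std::chrono::system_clock counts from the Unix epoch, which is guaranteed only since C++20 although common in practice before that.

```cpp
#include <chrono>
#include <cstdint>

// Converts a wall-clock time point to microseconds since the epoch of
// std::chrono::system_clock (assumed here to be the Unix epoch).
inline uint64_t ToSnapshotMicros(std::chrono::system_clock::time_point tp) {
  auto since_epoch = tp.time_since_epoch();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::microseconds>(since_epoch)
          .count());
}
```

The result would then be passed to SetSnapshotMicros() on a scanner whose read mode has been set to READ_AT_SNAPSHOT.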
Status kudu::client::KuduScanner::SetSnapshotRaw ( uint64_t  snapshot_timestamp)

Set snapshot timestamp for scans in READ_AT_SNAPSHOT mode (raw).

See KuduClient::GetLatestObservedTimestamp() for details on how to use this method to achieve Read-Your-Writes behavior.

Note
This method is experimental and will either disappear or change in a future release.
Parameters
[in] snapshot_timestamp  Timestamp to set in raw encoded form (i.e. as returned by a previous call to a server).
Returns
Operation result status.
Status kudu::client::KuduScanner::SetTimeoutMillis ( int  millis)

Set the maximum time that Open() and NextBatch() are allowed to take.

Parameters
[in] millis  Timeout to set (in milliseconds). Must be greater than 0.
Returns
Operation result status.
std::string kudu::client::KuduScanner::ToString ( ) const
Returns
String representation of this scan.

Member Data Documentation

const uint64_t kudu::client::KuduScanner::NO_FLAGS = 0
static

Modifier flags for the row format returned from the server.

Note
Each flag corresponds to a bit that gets set on a bitset that is sent to the server. See SetRowFormatFlags() for example usage.
const uint64_t kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES = 1 << 0
static

Makes the server pad UNIXTIME_MICROS slots to 16 bytes.

Note
This flag actually wastes throughput by making messages larger than they need to be. It exists merely for compatibility reasons and requires the user to know the row format in order to decode the data. That is, if this flag is enabled, the user must use KuduScanBatch::direct_data() and KuduScanBatch::indirect_data() to obtain the row data for further decoding. Using KuduScanBatch::Row() might yield incorrect/corrupt results and might even cause the client to crash.
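
Because each flag occupies one bit, flags are combined with bitwise OR and tested with bitwise AND. The sketch below mirrors the two documented values in local constants so that it compiles without the Kudu headers. kNoFlags, kPadUnixtimeMicrosTo16Bytes, HasFlag and MustDecodeRawData are hypothetical names; real code would use KuduScanner::NO_FLAGS and KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES.

```cpp
#include <cstdint>

// Local mirrors of the documented flag values (0 and 1 << 0).
const uint64_t kNoFlags = 0;
const uint64_t kPadUnixtimeMicrosTo16Bytes = 1 << 0;

// Returns true if `flag` is set in the bitset `flags`.
inline bool HasFlag(uint64_t flags, uint64_t flag) {
  return (flags & flag) != 0;
}

// When padding is requested, rows must be decoded from direct_data() and
// indirect_data() rather than through KuduScanBatch::Row().
inline bool MustDecodeRawData(uint64_t flags) {
  return HasFlag(flags, kPadUnixtimeMicrosTo16Bytes);
}
```

A caller could consult such a check before choosing how to read rows from a batch.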
