Kudu C++ client API
KuduScanner Class Reference

This class is a representation of a single scan.

#include <client.h>

Public Types

enum  ReadMode { READ_LATEST , READ_AT_SNAPSHOT , READ_YOUR_WRITES }
 The read modes for scanners.
enum  OrderMode { UNORDERED , ORDERED }
enum  { kScanTimeoutMillis = 30000 }

Public Member Functions

 KuduScanner (KuduTable *table)
Status SetProjectedColumnNames (const std::vector< std::string > &col_names) WARN_UNUSED_RESULT
Status SetProjectedColumnIndexes (const std::vector< int > &col_indexes) WARN_UNUSED_RESULT
Status SetProjectedColumns (const std::vector< std::string > &col_names) WARN_UNUSED_RESULT
Status SetQueryId (const std::string &query_id)
Status AddConjunctPredicate (KuduPredicate *pred) WARN_UNUSED_RESULT
Status AddLowerBound (const KuduPartialRow &key)
Status AddLowerBoundRaw (const Slice &key)
Status AddExclusiveUpperBound (const KuduPartialRow &key)
Status AddExclusiveUpperBoundRaw (const Slice &key)
Status AddLowerBoundPartitionKeyRaw (const Slice &partition_key)
Status AddExclusiveUpperBoundPartitionKeyRaw (const Slice &partition_key)
Status SetCacheBlocks (bool cache_blocks)
Status Open ()
Status KeepAlive ()
Status StartKeepAlivePeriodically (uint64_t keep_alive_interval_ms=30000)
void StopKeepAlivePeriodically ()
void Close ()
bool HasMoreRows () const
Status NextBatch (std::vector< KuduRowResult > *rows)
Status NextBatch (KuduScanBatch *batch)
Status NextBatch (KuduColumnarScanBatch *batch)
Status GetCurrentServer (KuduTabletServer **server)
const ResourceMetrics & GetResourceMetrics () const
Status SetBatchSizeBytes (uint32_t batch_size)
Status SetSelection (KuduClient::ReplicaSelection selection) WARN_UNUSED_RESULT
Status SetReadMode (ReadMode read_mode) WARN_UNUSED_RESULT
Status SetOrderMode (OrderMode order_mode) WARN_UNUSED_RESULT
Status SetFaultTolerant () WARN_UNUSED_RESULT
Status SetSnapshotMicros (uint64_t snapshot_timestamp_micros) WARN_UNUSED_RESULT
Status SetSnapshotRaw (uint64_t snapshot_timestamp) WARN_UNUSED_RESULT
Status SetTimeoutMillis (int millis)
KuduSchema GetProjectionSchema () const
sp::shared_ptr< KuduTable > GetKuduTable ()
Status SetLimit (int64_t limit) WARN_UNUSED_RESULT
std::string ToString () const

Advanced/Unstable API

Modifier flags for the row format returned from the server.

Note
Each flag corresponds to a bit that gets set on a bitset that is sent to the server. See SetRowFormatFlags() for example usage.
static const uint64_t NO_FLAGS = 0
 No flags set.
static const uint64_t PAD_UNIXTIME_MICROS_TO_16_BYTES = 1 << 0
static const uint64_t COLUMNAR_LAYOUT = 1 << 1
Status SetRowFormatFlags (uint64_t flags)
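
Because each flag is a single bit, flags can be combined with bitwise OR and tested with bitwise AND. The following is a minimal standalone sketch of that arithmetic. It uses local constants that mirror the values listed above rather than the real KuduScanner members, and the row_format_sketch namespace and both helper functions are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstdint>

namespace row_format_sketch {

// Local stand-ins mirroring the documented values; not the real Kudu symbols.
constexpr uint64_t NO_FLAGS = 0;
constexpr uint64_t PAD_UNIXTIME_MICROS_TO_16_BYTES = 1ULL << 0;
constexpr uint64_t COLUMNAR_LAYOUT = 1ULL << 1;

// Hypothetical helper: true if every bit of 'flag' is set in 'flags'.
inline bool HasFlag(uint64_t flags, uint64_t flag) {
  assert(flag != 0 && "NO_FLAGS has no bit to test");
  return (flags & flag) == flag;
}

// Hypothetical helper: build a bitset that requests columnar transfer only.
inline uint64_t ColumnarOnly() {
  uint64_t flags = NO_FLAGS;
  flags |= COLUMNAR_LAYOUT;  // set bit 1, leave bit 0 clear
  return flags;
}

}  // namespace row_format_sketch
```

With the real API, a value built this way would be what is passed to SetRowFormatFlags().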

Detailed Description

This class is a representation of a single scan.

Note
This class is not thread-safe, though different scanners on different threads may share a single KuduTable object.

Member Enumeration Documentation

◆ anonymous enum

anonymous enum

Default scanner timeout. This is set to 3x the default RPC timeout returned by KuduClientBuilder::default_rpc_timeout().

◆ OrderMode

Whether the rows should be returned in order.

This affects the fault-tolerance properties of a scanner.

Enumerator
UNORDERED 

Rows will be returned in an arbitrary order determined by the tablet server. This is efficient, but unordered scans are not fault-tolerant and cannot be resumed in the case of tablet server failure.

This is the default mode.

ORDERED 

Rows will be returned ordered by primary key. Sorting the rows imposes additional overhead on the tablet server, but means that scans are fault-tolerant and will be resumed at another tablet server in the case of a failure.

◆ ReadMode

The read modes for scanners.

Enumerator
READ_LATEST 

When READ_LATEST is specified the server will always return committed writes at the time the request was received. This type of read does not return a snapshot timestamp and is not repeatable.

In ACID terms this corresponds to the isolation mode "Read Committed".

This is the default mode.

READ_AT_SNAPSHOT 

When READ_AT_SNAPSHOT is specified the server will attempt to perform a read at the provided timestamp. If no timestamp is provided the server will take the current time as the snapshot timestamp. In this mode reads are repeatable, i.e. all future reads at the same timestamp will yield the same data. This is performed at the expense of waiting for in-flight ops whose timestamp is lower than the snapshot's timestamp to complete, so it might incur a latency penalty. See KuduScanner::SetSnapshotMicros() and KuduScanner::SetSnapshotRaw() for details.

In ACID terms this, by itself, corresponds to Isolation mode "Repeatable Read". If all writes to the scanned tablet are made externally consistent, then this corresponds to Isolation mode "Strict-Serializable".

Note
There are currently "holes", which happen in rare edge conditions, by which writes are sometimes not externally consistent even when action was taken to make them so. In these cases Isolation may degenerate to mode "Read Committed". See KUDU-430.
READ_YOUR_WRITES 

When READ_YOUR_WRITES is specified, the client will perform a read such that it follows all previously known writes and reads from this client. Specifically this mode: (1) ensures read-your-writes and read-your-reads session guarantees, (2) minimizes latency caused by waiting for outstanding write ops to complete.

Reads in this mode are not repeatable: two READ_YOUR_WRITES reads, even if they provide the same propagated timestamp bound, can execute at different timestamps and thus return different results.
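
The difference between a repeatable snapshot read and a non-repeatable latest read can be illustrated with a toy multi-version cell. This is only a conceptual sketch: the VersionedCell class and everything in the read_mode_sketch namespace are hypothetical and say nothing about how a tablet server actually stores or serves data.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

namespace read_mode_sketch {

// Hypothetical single-cell multi-version store: timestamp -> committed value.
class VersionedCell {
 public:
  // Record a committed write at the given timestamp.
  void Write(uint64_t timestamp, const std::string& value) {
    versions_[timestamp] = value;
  }

  // READ_LATEST analogue: newest committed value at call time.
  std::string ReadLatest() const {
    return versions_.empty() ? std::string() : versions_.rbegin()->second;
  }

  // READ_AT_SNAPSHOT analogue: newest value with timestamp <= snapshot.
  std::string ReadAtSnapshot(uint64_t snapshot) const {
    auto it = versions_.upper_bound(snapshot);  // first version > snapshot
    if (it == versions_.begin()) return std::string();
    return std::prev(it)->second;
  }

 private:
  std::map<uint64_t, std::string> versions_;
};

// Returns true if a later write leaves the snapshot read unchanged
// while changing the result of the latest read.
inline bool SnapshotIsRepeatable() {
  VersionedCell cell;
  cell.Write(10, "a");
  const std::string snap_before = cell.ReadAtSnapshot(15);
  const std::string latest_before = cell.ReadLatest();
  cell.Write(20, "b");  // committed after the snapshot timestamp
  return snap_before == cell.ReadAtSnapshot(15) &&
         latest_before != cell.ReadLatest();
}

}  // namespace read_mode_sketch
```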

Constructor & Destructor Documentation

◆ KuduScanner()

kudu::client::KuduScanner::KuduScanner ( KuduTable * table)
explicit

Constructor for KuduScanner.

Parameters
[in] table The table to scan. The given object must remain valid for the lifetime of this scanner object.

Member Function Documentation

◆ AddConjunctPredicate()

Status kudu::client::KuduScanner::AddConjunctPredicate ( KuduPredicate * pred)

Add a predicate for the scan.

Parameters
[in] pred Predicate to set. The scanner takes ownership of the parameter even if a bad Status is returned. Multiple calls of this method make the specified predicates work in conjunction, i.e. all predicates must be true for a row to be returned.
Returns
Operation result status.
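
The conjunction semantics can be sketched without the client library: a row is returned only if every added predicate accepts it. The Row, Predicate, and PredicateSet types below are hypothetical stand-ins, not KuduPredicate or any other Kudu type.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

namespace conjunct_sketch {

// Hypothetical row type used only for this illustration.
struct Row {
  int id;
  std::string city;
};

using Predicate = std::function<bool(const Row&)>;

class PredicateSet {
 public:
  // Each call adds one more condition that a row must satisfy.
  void AddConjunct(Predicate pred) {
    assert(pred && "predicate must be callable");
    preds_.push_back(std::move(pred));
  }

  // True only if all added predicates accept the row.
  bool Matches(const Row& row) const {
    for (const auto& pred : preds_) {
      if (!pred(row)) return false;
    }
    return true;
  }

 private:
  std::vector<Predicate> preds_;
};

// Keep the rows that satisfy every predicate in 'preds'.
inline std::vector<Row> Filter(const std::vector<Row>& rows,
                               const PredicateSet& preds) {
  std::vector<Row> out;
  for (const auto& row : rows) {
    if (preds.Matches(row)) out.push_back(row);
  }
  return out;
}

}  // namespace conjunct_sketch
```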

◆ AddExclusiveUpperBound()

Status kudu::client::KuduScanner::AddExclusiveUpperBound ( const KuduPartialRow & key)

Add an upper bound (exclusive) primary key for the scan.

If any bound is already added, this bound is intersected with that one.

Parameters
[in] key The key to set up the upper bound. The scanner makes a copy of the parameter, so the caller may free it afterward.
Returns
Operation result status.

◆ AddExclusiveUpperBoundPartitionKeyRaw()

Status kudu::client::KuduScanner::AddExclusiveUpperBoundPartitionKeyRaw ( const Slice & partition_key)

Add an upper bound (exclusive) partition key for the scan.

Note
This method is unstable, and for internal use only.
Parameters
[in] partition_key The scanner makes a copy of the parameter, so the caller may invalidate it afterward.
Returns
Operation result status.

◆ AddExclusiveUpperBoundRaw()

Status kudu::client::KuduScanner::AddExclusiveUpperBoundRaw ( const Slice & key)

Add an upper bound (exclusive) primary key for the scan.

Deprecated
Use AddExclusiveUpperBound() instead.
Parameters
[in] key The encoded primary key is an opaque slice of data.
Returns
Operation result status.

◆ AddLowerBound()

Status kudu::client::KuduScanner::AddLowerBound ( const KuduPartialRow & key)

Add a lower bound (inclusive) primary key for the scan.

If any bound is already added, this bound is intersected with that one.

Parameters
[in] key Lower bound primary key to add. The scanner does not take ownership of the parameter.
Returns
Operation result status.
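
Intersecting bounds means that repeated calls can only narrow the scanned range: the effective lower bound is the largest one added, and the effective exclusive upper bound is the smallest. A minimal sketch over integer keys follows. The KeyRange struct is a hypothetical model and does not use KuduPartialRow.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

namespace bound_sketch {

// Hypothetical key range [lower, upper) over integer keys.
struct KeyRange {
  // Sentinels meaning "no bound added yet".
  int64_t lower = std::numeric_limits<int64_t>::min();  // inclusive
  int64_t upper = std::numeric_limits<int64_t>::max();  // exclusive

  // Intersect with [key, +inf): keep the larger lower bound.
  void AddLowerBound(int64_t key) { lower = std::max(lower, key); }

  // Intersect with (-inf, key): keep the smaller upper bound.
  void AddExclusiveUpperBound(int64_t key) { upper = std::min(upper, key); }

  // True if 'key' falls inside the current range.
  bool Contains(int64_t key) const { return key >= lower && key < upper; }
};

}  // namespace bound_sketch
```

Adding a looser bound after a tighter one has no effect in this model, which is the behavior that "intersected" implies.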

◆ AddLowerBoundPartitionKeyRaw()

Status kudu::client::KuduScanner::AddLowerBoundPartitionKeyRaw ( const Slice & partition_key)

Add a lower bound (inclusive) partition key for the scan.

Note
This method is unstable, and for internal use only.
Parameters
[in] partition_key The scanner makes a copy of the parameter, so the caller may invalidate it afterward.
Returns
Operation result status.

◆ AddLowerBoundRaw()

Status kudu::client::KuduScanner::AddLowerBoundRaw ( const Slice & key)

Add a lower bound (inclusive) primary key for the scan.

Deprecated
Use AddLowerBound() instead.
Parameters
[in] key The primary key to use as an opaque slice of data.
Returns
Operation result status.

◆ Close()

void kudu::client::KuduScanner::Close ( )

Close the scanner.

Closing the scanner releases resources on the server. This call does not block, and will not ever fail, even if the server cannot be contacted.

Note
The scanner is reset to its initial state by this function. You'll have to re-add any projection, predicates, etc. if you want to reuse this object.

◆ GetCurrentServer()

Status kudu::client::KuduScanner::GetCurrentServer ( KuduTabletServer ** server)

Get the KuduTabletServer that is currently handling the scan.

More concretely, this is the server that handled the most recent Open() or NextBatch() RPC made by the scanner.

Parameters
[out] server Placeholder for the result.
Returns
Operation result status.

◆ GetKuduTable()

sp::shared_ptr< KuduTable > kudu::client::KuduScanner::GetKuduTable ( )
Returns
KuduTable being scanned.

◆ GetProjectionSchema()

KuduSchema kudu::client::KuduScanner::GetProjectionSchema ( ) const
Returns
Schema of the projection being scanned.

◆ GetResourceMetrics()

const ResourceMetrics & kudu::client::KuduScanner::GetResourceMetrics ( ) const
Returns
Cumulative resource metrics since the scan was started.

◆ HasMoreRows()

bool kudu::client::KuduScanner::HasMoreRows ( ) const

Check if there may be rows to be fetched from this scanner.

Returns
true if there may be rows to be fetched from this scanner. The method returns true provided there's at least one more tablet left to scan, even if that tablet has no data (we'll only know once we scan it). It will also be true after initially opening the scanner, before NextBatch() is called for the first time.
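
One consequence is that a scan loop must tolerate empty batches, since HasMoreRows() can return true for a tablet that turns out to hold no rows. The sketch below models this with a hypothetical FakeScanner that serves one batch per tablet. It is a simplified stand-in, not the KuduScanner implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace scan_loop_sketch {

// Hypothetical stand-in for a scanner that visits tablets one at a time.
class FakeScanner {
 public:
  explicit FakeScanner(std::vector<std::vector<int>> tablets)
      : tablets_(std::move(tablets)) {}

  // True while at least one tablet is left, even if that tablet is empty.
  bool HasMoreRows() const { return next_ < tablets_.size(); }

  // Replace the contents of 'batch' with the next tablet's rows.
  void NextBatch(std::vector<int>* batch) {
    assert(batch != nullptr && HasMoreRows());
    *batch = tablets_[next_++];
  }

 private:
  std::vector<std::vector<int>> tablets_;
  std::size_t next_ = 0;
};

struct ScanStats {
  std::size_t rows = 0;
  std::size_t empty_batches = 0;
};

// Drain the scanner, counting rows and the empty batches seen on the way.
inline ScanStats Drain(FakeScanner* scanner) {
  ScanStats stats;
  std::vector<int> batch;  // reused across calls, as with a real batch object
  while (scanner->HasMoreRows()) {
    scanner->NextBatch(&batch);
    if (batch.empty()) {
      ++stats.empty_batches;  // not an error: this tablet had no rows
      continue;
    }
    stats.rows += batch.size();
  }
  return stats;
}

}  // namespace scan_loop_sketch
```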

◆ KeepAlive()

Status kudu::client::KuduScanner::KeepAlive ( )

Keep the current remote scanner alive.

Keep the current remote scanner alive on the tablet server for an additional time-to-live. This is useful if the interval between NextBatch() calls is long enough that the remote scanner might be garbage collected. The scanner time-to-live can be configured on the tablet server via the --scanner_ttl_ms configuration flag and has a default of 60 seconds.

This does not invalidate any previously fetched results.

Returns
Operation result status. In particular, this method returns a non-OK status if the scanner was already garbage collected or if the TabletServer was unreachable, for any reason. Note that a non-OK status returned by this method should not be taken as indication that the scan has failed. Subsequent calls to NextBatch() might still be successful, particularly if the scanner is configured to be fault tolerant.

◆ NextBatch() [1/3]

Status kudu::client::KuduScanner::NextBatch ( KuduColumnarScanBatch * batch)

Fetch the next batch of columnar results for this scanner.

This variant may only be used when the scan is configured with the COLUMNAR_LAYOUT RowFormatFlag.

A single KuduColumnarScanBatch object may be reused. Each subsequent call replaces the data from the previous call, and invalidates any Slice objects previously obtained from the batch.

Parameters
[out] batch Placeholder for the result.
Returns
Operation result status.

◆ NextBatch() [2/3]

Status kudu::client::KuduScanner::NextBatch ( KuduScanBatch * batch)

Fetch the next batch of results for this scanner.

This variant may not be used when the scan is configured with the COLUMNAR_LAYOUT RowFormatFlag.

A single KuduScanBatch object may be reused. Each subsequent call replaces the data from the previous call, and invalidates any KuduScanBatch::RowPtr objects previously obtained from the batch.

Parameters
[out] batch Placeholder for the result.
Returns
Operation result status.

◆ NextBatch() [3/3]

Status kudu::client::KuduScanner::NextBatch ( std::vector< KuduRowResult > * rows)

Get next batch of rows.

Clears 'rows' and populates it with the next batch of rows from the tablet server. A call to NextBatch() invalidates all previously fetched results which might now be pointing to garbage memory.

Deprecated
Use NextBatch(KuduScanBatch*) instead.
Parameters
[out] rows Placeholder for the result.
Returns
Operation result status.

◆ Open()

Status kudu::client::KuduScanner::Open ( )
Returns
Result status of the operation (begin scanning).

◆ SetBatchSizeBytes()

Status kudu::client::KuduScanner::SetBatchSizeBytes ( uint32_t batch_size)

Set the hint for the size of the next batch in bytes.

Parameters
[in] batch_size The batch size hint to set, in bytes. If this is set to 0 before Open() is called, the first call to the tablet server won't return data.
Returns
Operation result status.

◆ SetCacheBlocks()

Status kudu::client::KuduScanner::SetCacheBlocks ( bool cache_blocks)

Set the block caching policy.

Parameters
[in] cache_blocks If true, scanned data blocks will be cached in memory and made available for future scans. Default is true.
Returns
Operation result status.

◆ SetFaultTolerant()

Status kudu::client::KuduScanner::SetFaultTolerant ( )

Make scans resumable at another tablet server if the current server fails.

Scans are by default non fault-tolerant, and scans will fail if scanning an individual tablet fails (for example, if a tablet server crashes in the middle of a tablet scan). If this method is called, scans will be resumed at another tablet server in the case of failure.

Fault-tolerant scans typically have lower throughput than non fault-tolerant scans. Fault tolerant scans use READ_AT_SNAPSHOT mode: if no snapshot timestamp is provided, the server will pick one.

Returns
Operation result status.

◆ SetLimit()

Status kudu::client::KuduScanner::SetLimit ( int64_t limit)

Set the maximum number of rows the scanner should return.

Parameters
[in] limit Limit on the number of rows to return.
Returns
Operation result status.

◆ SetOrderMode()

Status kudu::client::KuduScanner::SetOrderMode ( OrderMode order_mode)
Deprecated
Use SetFaultTolerant() instead.
Parameters
[in] order_mode Result record ordering mode to set.
Returns
Operation result status.

◆ SetProjectedColumnIndexes()

Status kudu::client::KuduScanner::SetProjectedColumnIndexes ( const std::vector< int > & col_indexes)

Set the column projection by passing the column indexes to read.

Set the column projection used for this scanner by passing the column indices to read. A call to this method overrides any previous call to SetProjectedColumnNames() or SetProjectedColumnIndexes().

Parameters
[in] col_indexes Column indices for the projection.
Returns
Operation result status.

◆ SetProjectedColumnNames()

Status kudu::client::KuduScanner::SetProjectedColumnNames ( const std::vector< std::string > & col_names)

Set the projection for the scanner using column names.

Set the projection used for the scanner by passing column names to read. This overrides any previous call to SetProjectedColumnNames() or SetProjectedColumnIndexes().

Parameters
[in] col_names Column names to use for the projection.
Returns
Operation result status.
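
The relationship between a name-based and an index-based projection, and the "last call wins" rule, can be sketched as follows. ResolveNames is a hypothetical helper over a plain list of column names. It is not part of the Kudu API and does not reflect how the client resolves projections internally.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace projection_sketch {

// Hypothetical helper: resolve column names to their positions in 'schema'.
// On success, replaces any projection previously stored in 'indexes'.
// Returns false and leaves 'indexes' untouched if any name is unknown.
inline bool ResolveNames(const std::vector<std::string>& schema,
                         const std::vector<std::string>& names,
                         std::vector<int>* indexes) {
  assert(indexes != nullptr);
  std::vector<int> resolved;
  for (const auto& name : names) {
    auto it = std::find(schema.begin(), schema.end(), name);
    if (it == schema.end()) return false;  // unknown column name
    resolved.push_back(static_cast<int>(it - schema.begin()));
  }
  *indexes = std::move(resolved);  // overrides the earlier projection
  return true;
}

}  // namespace projection_sketch
```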

◆ SetProjectedColumns()

Status kudu::client::KuduScanner::SetProjectedColumns ( const std::vector< std::string > & col_names)
Deprecated
Use SetProjectedColumnNames() instead.
Parameters
[in] col_names Column names to use for the projection.
Returns
Operation result status.

◆ SetQueryId()

Status kudu::client::KuduScanner::SetQueryId ( const std::string & query_id)

Set a query ID for the scan. The query ID is either supplied by the user or generated automatically by the client library, and is used to trace the whole scanning process for debugging.

Example usage:

KuduScanner scanner(...);
scanner.SetQueryId(query_id);
scanner.Open();
while (scanner.HasMoreRows()) {
  KuduScanBatch batch;
  scanner.NextBatch(&batch);
}
Parameters
[in] query_id A query ID to identify a query.
Returns
Operation result status.

◆ SetReadMode()

Status kudu::client::KuduScanner::SetReadMode ( ReadMode read_mode)

Set the ReadMode. Default is READ_LATEST.

Parameters
[in] read_mode Read mode to set.
Returns
Operation result status.

◆ SetRowFormatFlags()

Status kudu::client::KuduScanner::SetRowFormatFlags ( uint64_t flags)

Optionally set row format modifier flags.

If flags is KuduScanner::NO_FLAGS, then no modifications will be made to the row format and the default will be used.

Some flags require server-side support, thus the caller should be prepared to handle a NotSupported status in Open() and NextBatch().

Example usage (without error handling, for brevity):

KuduScanner scanner(...);
uint64_t row_format_flags = KuduScanner::NO_FLAGS;
row_format_flags |= KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES;
scanner.SetRowFormatFlags(row_format_flags);
scanner.Open();
while (scanner.HasMoreRows()) {
  scanner.NextBatch(&batch);
  Slice direct_data = batch.direct_data();
  Slice indirect_data = batch.indirect_data();
  ... // Row data decoding and handling.
}
Parameters
[in] flags Row format modifier flags to set.
Returns
Operation result status.

◆ SetSelection()

Status kudu::client::KuduScanner::SetSelection ( KuduClient::ReplicaSelection selection)

Set the replica selection policy while scanning.

Parameters
[in] selection The policy to set.
Returns
Operation result status.
Todo
Kill this method in favor of a consistency-level-based API.

◆ SetSnapshotMicros()

Status kudu::client::KuduScanner::SetSnapshotMicros ( uint64_t snapshot_timestamp_micros)

Set snapshot timestamp for scans in READ_AT_SNAPSHOT mode.

Parameters
[in] snapshot_timestamp_micros Timestamp to set, in microseconds since the Epoch.
Returns
Operation result status.
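
A value in microseconds since the Epoch can be derived from a std::chrono time point. The helpers below are a hypothetical sketch, not part of the Kudu API. They assume that std::chrono::system_clock counts from the Unix epoch, which is guaranteed from C++20 onward and holds in practice on common platforms before that.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

namespace snapshot_sketch {

// Convert a system_clock time point to microseconds since the Unix epoch.
// Assumes system_clock's epoch is the Unix epoch (see the note above).
inline uint64_t ToEpochMicros(std::chrono::system_clock::time_point tp) {
  const auto micros = std::chrono::duration_cast<std::chrono::microseconds>(
                          tp.time_since_epoch())
                          .count();
  assert(micros >= 0 && "time point precedes the epoch");
  return static_cast<uint64_t>(micros);
}

// Current wall-clock time in microseconds since the Unix epoch.
inline uint64_t NowEpochMicros() {
  return ToEpochMicros(std::chrono::system_clock::now());
}

}  // namespace snapshot_sketch
```

A value computed this way has the unit that the snapshot_timestamp_micros parameter expects.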

◆ SetSnapshotRaw()

Status kudu::client::KuduScanner::SetSnapshotRaw ( uint64_t snapshot_timestamp)

Set snapshot timestamp for scans in READ_AT_SNAPSHOT mode (raw).

Note
This method is experimental and will either disappear or change in a future release.
Parameters
[in] snapshot_timestamp Timestamp to set in raw encoded form (i.e. as returned by a previous call to a server).
Returns
Operation result status.

◆ SetTimeoutMillis()

Status kudu::client::KuduScanner::SetTimeoutMillis ( int millis)

Set the maximum time that Open() and NextBatch() are allowed to take.

Parameters
[in] millis Timeout to set (in milliseconds). Must be greater than 0.
Returns
Operation result status.

◆ StartKeepAlivePeriodically()

Status kudu::client::KuduScanner::StartKeepAlivePeriodically ( uint64_t keep_alive_interval_ms = 30000)

Keep the current remote scanner alive by sending keep-alive requests periodically.

This function starts a timer that calls KeepAlive() every keep_alive_interval_ms milliseconds, sending the keep-alive requests to the server from a separate thread. This is useful if the client takes a long time to process the fetched data and would otherwise not get a chance to call KeepAlive() in time. It can be called once the scanner is open, and the timer can be stopped by calling StopKeepAlivePeriodically().

Note
This method isn't thread-safe.
Parameters
[in] keep_alive_interval_ms The interval at which to send keep-alive requests, in milliseconds. The default value is 30000 ms, which is half of the default setting of the --scanner_ttl_ms flag.
Returns
A non-OK status if the scanner has not been opened.
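
The general pattern of invoking a callback at a fixed interval from a separate thread can be sketched with the standard library alone. The PeriodicTimer class below is hypothetical and is not how the Kudu client implements its keep-alive timer. Like the method documented here, it is not thread-safe: Start() and Stop() are assumed to be called from a single thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

namespace keepalive_sketch {

// Hypothetical timer that runs a task periodically on a background thread.
class PeriodicTimer {
 public:
  PeriodicTimer() = default;
  ~PeriodicTimer() { Stop(); }
  PeriodicTimer(const PeriodicTimer&) = delete;
  PeriodicTimer& operator=(const PeriodicTimer&) = delete;

  // Start calling 'task' once per 'interval' until Stop() is called.
  void Start(std::chrono::milliseconds interval, std::function<void()> task) {
    assert(!worker_.joinable() && "timer already started");
    stop_ = false;  // safe without the lock: no worker thread exists yet
    worker_ = std::thread([this, interval, task]() {
      std::unique_lock<std::mutex> lock(mu_);
      // wait_for() returns true once stop_ is set, and false when the
      // interval elapsed without a stop request.
      while (!cv_.wait_for(lock, interval, [this] { return stop_; })) {
        task();  // runs with mu_ held, so Stop() waits for it to finish
      }
    });
  }

  // Request a stop and wait for the worker thread to exit. Idempotent.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread worker_;
};

}  // namespace keepalive_sketch
```

In this model the task would be a call to KeepAlive(), and an interval of half the scanner time-to-live leaves room for one missed tick before the remote scanner expires.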

◆ StopKeepAlivePeriodically()

void kudu::client::KuduScanner::StopKeepAlivePeriodically ( )

Stop keeping the current remote scanner alive periodically.

This function stops the periodic keep-alive requests to the server. After StartKeepAlivePeriodically() has been called, it can be used to stop the keep-alive timer at any time. The timer also stops automatically once the scan finishes.

◆ ToString()

std::string kudu::client::KuduScanner::ToString ( ) const
Returns
String representation of this scan.

Member Data Documentation

◆ COLUMNAR_LAYOUT

const uint64_t kudu::client::KuduScanner::COLUMNAR_LAYOUT = 1 << 1
static

Enable column-oriented data transfer. The server will transfer data to the client in a columnar format rather than a row-wise format. The KuduColumnarScanBatch API must be used to fetch results from this scan.

NOTE: older versions of the Kudu server do not support this feature. Clients aiming to support compatibility with previous versions should have a fallback code path.
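
A fallback code path of the kind mentioned in the note can be sketched as "try columnar first, retry row-wise only when the server reports that the feature is unsupported". All names below (Code, Layout, OpenResult, OpenPreferColumnar) are hypothetical. The open callback stands in for application code that creates and opens a fresh scanner with the requested layout.

```cpp
#include <cassert>
#include <functional>

namespace fallback_sketch {

// Hypothetical status codes and layouts used only for this illustration.
enum class Code { kOk, kNotSupported, kOtherError };
enum class Layout { kColumnar, kRowWise };

struct OpenResult {
  Code code;      // outcome of the last open attempt
  Layout layout;  // layout that attempt was made with
};

// Try the columnar layout first. Fall back to row-wise only if the
// columnar attempt fails with kNotSupported.
inline OpenResult OpenPreferColumnar(
    const std::function<Code(Layout)>& open) {
  assert(open && "open callback must be callable");
  const Code first = open(Layout::kColumnar);
  if (first != Code::kNotSupported) {
    // Either success, or an error unrelated to the layout.
    return {first, Layout::kColumnar};
  }
  return {open(Layout::kRowWise), Layout::kRowWise};
}

}  // namespace fallback_sketch
```

The caller then picks the batch type that matches the layout actually in use.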

◆ PAD_UNIXTIME_MICROS_TO_16_BYTES

const uint64_t kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES = 1 << 0
static
Note
This flag actually wastes throughput by making messages larger than they need to be. It exists merely for compatibility reasons and requires the user to know the row format in order to decode the data. That is, if this flag is enabled, the user must use KuduScanBatch::direct_data() and KuduScanBatch::indirect_data() to obtain the row data for further decoding. Using KuduScanBatch::Row() might yield incorrect/corrupt results and might even cause the client to crash.

The documentation for this class was generated from the following file: