Kudu 1.3 adds support for strong authentication based on Kerberos. This optional feature allows users to authenticate themselves using Kerberos tickets, and also provides mutual authentication of servers using Kerberos credentials stored in keytabs. This feature is optional, but recommended for deployments requiring security.
Kudu 1.3 adds support for encryption of data on the network using Transport Layer Security (TLS). Kudu will now use TLS to encrypt all network traffic between clients and servers as well as any internal traffic among servers, with the exception of traffic determined to be within a localhost network connection. Encryption is enabled by default whenever it can be determined that both the client and server support the feature.
Kudu 1.3 adds coarse-grained service-level authorization of access to the cluster. The operator may set up lists of permitted users who may act as administrators and as clients of the cluster. Combined with the strong authentication feature described above, this can enable a secure environment for some use cases. Note that fine-grained access control (e.g. table-level or column-level) is not yet supported.
Kudu 1.3 adds a background task to tablet servers which removes historical versions of data which have fallen behind the configured data retention time. This reduces disk space usage in all workloads, but particularly in those with a higher volume of updates or upserts.
Kudu now incorporates Google Breakpad, a library which writes crash reports in the case of a server crash. These reports can be found within the configured log directory, and can be useful during bug diagnosis.
Kudu servers will now change the file permissions of data directories and contained
data files based on a new configuration flag
--umask. As a result, after upgrading,
permissions on disk may be more restrictive than in previous versions. The new default
configuration improves data security.
Kudu’s web UI will now redact strings which may include sensitive user data. For example, the monitoring page which shows in-progress scans no longer includes the scanner predicate values. The tracing and RPC diagnostics endpoints no longer include contents of RPCs which may include table data.
By default, Kudu now reserves 1% of each configured data volume as free space. If a volume is seen to have less than 1% of disk space free, Kudu will stop writing to that volume to avoid completely filling up the disk.
The default encoding for numeric columns (int, float, and double) has been changed
BIT_SHUFFLE. The default encoding for binary and string columns has been
DICT_ENCODING. Dictionary encoding automatically falls back to the old
PLAIN) when cardinality is too high to be effectively encoded.
These new defaults match the default behavior of other storage mechanisms such as Apache Parquet and are likely to perform better out of the box.
Kudu now uses
LZ4 compression when writing its Write Ahead Log (WAL). This improves
write performance and stability for many use cases.
Kudu now uses
LZ4 compression when writing delta files. This can improve both
read and write performance as well as save substantial disk usage, especially
for workloads involving a high number of updates or upserts containing compressible
The Kudu API now supports the ability to express
IS NULL and
IS NOT NULL predicates
on scanners. The Spark DataSource integration will take advantage of these new
predicates when possible.
Both C++ and Java clients have been optimized to prune partitions more effectively
when performing scans using the
IN (…) predicate.
The exception messages produced by the Java client are now truncated to a maximum length of 32KB.
Fixed a critical bug in which wrong results would be returned when evaluating
predicates applied to columns added using the
ALTER TABLE operation.
KUDU-1905 Fixed a crash after inserting a row sharing a primary key with a recently-deleted row in tables where the primary key is comprised of all of the columns.
KUDU-1899 Fixed a crash after inserting a row with an empty string as the single-column primary key.
KUDU-1904 Fixed a potential crash when performing random reads against a column using RLE encoding and containing long runs of NULL values.
KUDU-1853 Fixed an issue where disk space could be leaked on servers which experienced an error during the process of copying tablet data from another server.
KUDU-1856 Fixed an issue in which disk space could be leaked by Kudu servers storing data on partitions using the XFS file system. Any leaked disk space will be automatically recovered upon upgrade.
Kudu 1.3.0 is wire-compatible with previous versions of Kudu:
Kudu 1.3 clients may connect to servers running Kudu 1.0. If the client uses features that are not available on the target server, an error will be returned.
Kudu 1.0 clients may connect to servers running Kudu 1.3 with the exception of the below-mentioned restrictions regarding secure clusters.
Rolling upgrade between Kudu 1.2 and Kudu 1.3 servers is believed to be possible though has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
The authentication features newly introduced in Kudu 1.3 place the following limitations on wire compatibility with older versions:
If a Kudu 1.3 cluster is configured with authentication or encryption set to "required", older clients will be unable to connect.
If a Kudu 1.3 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
Due to storage format changes in Kudu 1.3, downgrade from Kudu 1.3 to earlier versions is not supported. After upgrading to Kudu 1.3, attempting to restart with an earlier version will result in an error.
In order to support running MapReduce and Spark jobs on secure clusters, these frameworks now connect to the cluster at job submission time to retrieve authentication credentials which can later be used by the tasks to be spawned. This means that the process submitting jobs to Kudu clusters must have direct access to that cluster.
The embedded web servers in Kudu processes now specify the
X-Frame-Options: DENY HTTP
header which prevents embedding Kudu web pages in HTML
The Kudu 1.3 Java client library is API- and ABI-compatible with Kudu 1.2. Applications written against Kudu 1.2 will compile and run against the Kudu 1.3 client library and vice-versa, unless one of the following newly added APIs is used:
[Async]KuduClient.exportAuthenticationCredentials(…) (unstable API)
[Async]KuduClient.importAuthenticationCredentials(…) (unstable API)
The Kudu 1.3 C++ client is API- and ABI-forward-compatible with Kudu 1.2. Applications written and compiled against the Kudu 1.2 client library will run without modification against the Kudu 1.3 client library. Applications written and compiled against the Kudu 1.3 client library will run without modification against the Kudu 1.2 client library unless they use one of the following new APIs:
The Kudu 1.3 Python client is API-compatible with Kudu 1.2. Applications written against Kudu 1.2 will continue to run against the Kudu 1.3 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.