Apache Kudu 1.3.1 Release Notes

Apache Kudu 1.3.1 is a bug-fix release which fixes critical issues in Kudu 1.3.0.

  • KUDU-1962 Fixed a NullPointerException in the Java client which occurred when the Kudu master was overloaded at the time the client requested location information. This could cause client applications to hang indefinitely regardless of configured timeouts.

  • KUDU-1963 Fixed cases in which the Java client could log NullPointerException or SSLException stack traces if it was closed while connecting to a server.

  • KUDU-1607 Fixed a case in which a tablet replica on a tablet server could retain blocks of data which prevented it from being fully deleted.

  • KUDU-1933 Fixed an issue in which a tablet server would crash and fail to restart after a single tablet received more than two billion write operations. Upgrading to Kudu 1.3.1 or later will allow a server which previously experienced this issue to restart.

  • KUDU-1968 Fixed an issue in which the tablet server would delete an incorrect set of data blocks after an aborted attempt to copy a tablet from another server. This would produce data loss in unrelated tablets.

Apache Kudu 1.3.0 Release Notes

New features

  • Kudu 1.3 adds support for strong authentication based on Kerberos. This feature allows users to authenticate themselves using Kerberos tickets, and also provides mutual authentication of servers using Kerberos credentials stored in keytabs. It is optional, but recommended for deployments requiring security.

  • Kudu 1.3 adds support for encryption of data on the network using Transport Layer Security (TLS). Kudu will now use TLS to encrypt all network traffic between clients and servers as well as any internal traffic among servers, with the exception of traffic determined to be within a localhost network connection. Encryption is enabled by default whenever it can be determined that both the client and server support the feature.

  • Kudu 1.3 adds coarse-grained service-level authorization of access to the cluster. The operator may set up lists of permitted users who may act as administrators and as clients of the cluster. Combined with the strong authentication feature described above, this can enable a secure environment for some use cases. Note that fine-grained access control (e.g. table-level or column-level) is not yet supported.

  • Kudu 1.3 adds a background task to tablet servers which removes historical versions of data which have fallen behind the configured data retention time. This reduces disk space usage in all workloads, but particularly in those with a higher volume of updates or upserts.

  • Kudu now incorporates Google Breakpad, a library which writes a crash report when a server crashes. These reports can be found within the configured log directory, and can be useful during bug diagnosis.
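
To make the history-cleanup task described in this list more concrete, the following is a minimal sketch of retention-based pruning on a single row's version history. It is a simplified model, not Kudu's implementation: the function name prune_history, the (timestamp, value) tuples, and the choice to keep the newest version from before the cutoff are all assumptions made for illustration.

```python
def prune_history(versions, now, retention_seconds):
    """Drop row versions that have fallen behind the retention window.

    versions: list of (timestamp, value) tuples sorted by ascending timestamp.
    This is a hypothetical model; Kudu's on-disk delta handling is more involved.
    """
    cutoff = now - retention_seconds
    older = [v for v in versions if v[0] < cutoff]
    recent = [v for v in versions if v[0] >= cutoff]
    # Keep the newest pre-cutoff version so that a read at the cutoff
    # timestamp still observes a value (an assumption of this sketch).
    return older[-1:] + recent


history = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]
pruned = prune_history(history, now=50, retention_seconds=15)
```

In this model, a row that is updated often accumulates many versions older than the cutoff, which is why workloads with many updates or upserts would see the largest space savings.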

Optimizations and improvements

  • Kudu servers will now change the file permissions of data directories and contained data files based on a new configuration flag --umask. As a result, after upgrading, permissions on disk may be more restrictive than in previous versions. The new default configuration improves data security.

  • Kudu’s web UI will now redact strings which may include sensitive user data. For example, the monitoring page which shows in-progress scans no longer includes the scanner predicate values. The tracing and RPC diagnostics endpoints no longer include contents of RPCs which may include table data.

  • By default, Kudu now reserves 1% of each configured data volume as free space. If a volume is seen to have less than 1% of disk space free, Kudu will stop writing to that volume to avoid completely filling up the disk.

  • The default encoding for numeric columns (int, float, and double) has been changed to BIT_SHUFFLE. The default encoding for binary and string columns has been changed to DICT_ENCODING. Dictionary encoding automatically falls back to the old default (PLAIN) when cardinality is too high to be effectively encoded.

    These new defaults match the default behavior of other storage mechanisms such as Apache Parquet and are likely to perform better out of the box.

  • Kudu now uses LZ4 compression when writing its Write Ahead Log (WAL). This improves write performance and stability for many use cases.

  • Kudu now uses LZ4 compression when writing delta files. This can improve both read and write performance as well as save substantial disk usage, especially for workloads involving a high number of updates or upserts containing compressible data.

  • The Kudu API now supports the ability to express IS NULL and IS NOT NULL predicates on scanners. The Spark DataSource integration will take advantage of these new predicates when possible.

  • Both C++ and Java clients have been optimized to prune partitions more effectively when performing scans using the IN (…​) predicate.

  • The exception messages produced by the Java client are now truncated to a maximum length of 32KB.
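
As an illustration of the IN-list partition pruning mentioned in this list, the sketch below shows one way a client could narrow a scan on a hash-partitioned table to the partitions that can contain matching rows. It is a sketch under stated assumptions: bucket_for and partitions_to_scan are hypothetical names, the table is assumed to be hash-partitioned on a single string column, and CRC32 stands in for Kudu's real hash function and key encoding, which differ.

```python
import zlib


def bucket_for(key, num_buckets):
    """Map a key to a hash bucket (stand-in hash, not Kudu's)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partitions_to_scan(in_list_values, num_buckets):
    """Return the sorted bucket indexes that an IN (...) predicate can match.

    Buckets that no IN-list value hashes to cannot contain a matching row,
    so a scan can skip them entirely.
    """
    return sorted({bucket_for(value, num_buckets) for value in in_list_values})


values = ["alice", "bob", "carol"]
to_scan = partitions_to_scan(values, num_buckets=16)
```

With three values and sixteen buckets, at most three partitions need to be contacted instead of all sixteen.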

Fixed Issues

  • KUDU-1893 Fixed a critical bug in which wrong results would be returned when evaluating predicates applied to columns added using the ALTER TABLE operation.

  • KUDU-1905 Fixed a crash after inserting a row sharing a primary key with a recently-deleted row in tables where the primary key comprises all of the columns.

  • KUDU-1899 Fixed a crash after inserting a row with an empty string as the single-column primary key.

  • KUDU-1904 Fixed a potential crash when performing random reads against a column using RLE encoding and containing long runs of NULL values.

  • KUDU-1853 Fixed an issue where disk space could be leaked on servers which experienced an error during the process of copying tablet data from another server.

    The fix for KUDU-1853 resulted in a regression and was reverted in Kudu 1.3.1.

  • KUDU-1856 Fixed an issue in which disk space could be leaked by Kudu servers storing data on partitions using the XFS file system. Any leaked disk space will be automatically recovered upon upgrade.

  • KUDU-1888, KUDU-1906 Fixed multiple issues in the Java client where operation callbacks would never be triggered, causing the client to hang.

Wire Protocol compatibility

Kudu 1.3.0 is wire-compatible with previous versions of Kudu:

  • Kudu 1.3 clients may connect to servers running Kudu 1.0. If the client uses features that are not available on the target server, an error will be returned.

  • Kudu 1.0 clients may connect to servers running Kudu 1.3 with the exception of the below-mentioned restrictions regarding secure clusters.

  • Rolling upgrade between Kudu 1.2 and Kudu 1.3 servers is believed to be possible but has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.

The authentication features newly introduced in Kudu 1.3 place the following limitations on wire compatibility with older versions:

  • If a Kudu 1.3 cluster is configured with authentication or encryption set to "required", older clients will be unable to connect.

  • If a Kudu 1.3 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
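
The two rules above amount to a small decision table, sketched below. The function name older_client_can_connect and the plain string arguments are assumptions for illustration only; they are not Kudu configuration flags or client APIs.

```python
ALLOWED_SETTINGS = {"required", "optional", "disabled"}


def older_client_can_connect(authentication, encryption):
    """Return True if a pre-1.3 client can connect to a Kudu 1.3 cluster.

    Encodes the rules stated above: setting either authentication or
    encryption to "required" locks out older clients.
    """
    for setting in (authentication, encryption):
        if setting not in ALLOWED_SETTINGS:
            raise ValueError("unknown setting: %r" % (setting,))
    return authentication != "required" and encryption != "required"


can_connect = older_client_can_connect("optional", "disabled")
```

This can serve as a checklist before tightening security settings on a cluster that still has older clients attached.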

Incompatible Changes in Kudu 1.3.0

  • Due to storage format changes in Kudu 1.3, downgrade from Kudu 1.3 to earlier versions is not supported. After upgrading to Kudu 1.3, attempting to restart with an earlier version will result in an error.

  • In order to support running MapReduce and Spark jobs on secure clusters, these frameworks now connect to the cluster at job submission time to retrieve authentication credentials which can later be used by the tasks to be spawned. This means that the process submitting jobs to Kudu clusters must have direct access to that cluster.

  • The embedded web servers in Kudu processes now specify the X-Frame-Options: DENY HTTP header which prevents embedding Kudu web pages in HTML iframe elements.

Client Library Compatibility

  • The Kudu 1.3 Java client library is API- and ABI-compatible with Kudu 1.2. Applications written against Kudu 1.2 will compile and run against the Kudu 1.3 client library and vice-versa, unless one of the following newly added APIs is used:

    • [Async]KuduClient.exportAuthenticationCredentials(…​) (unstable API)

    • [Async]KuduClient.importAuthenticationCredentials(…​) (unstable API)

    • [Async]KuduClient.getMasterAddressesAsString()

    • KuduPredicate.newIsNotNullPredicate()

    • KuduPredicate.newIsNullPredicate()

  • The Kudu 1.3 C++ client is API- and ABI-forward-compatible with Kudu 1.2. Applications written and compiled against the Kudu 1.2 client library will run without modification against the Kudu 1.3 client library. Applications written and compiled against the Kudu 1.3 client library will run without modification against the Kudu 1.2 client library unless they use one of the following new APIs:

    • kudu::DisableOpenSSLInitialization()

    • KuduClientBuilder::import_authentication_credentials(…​)

    • KuduClient::ExportAuthenticationCredentials(…​)

    • KuduTable::NewIsNotNullPredicate(…)

    • KuduTable::NewIsNullPredicate(…)

  • The Kudu 1.3 Python client is API-compatible with Kudu 1.2. Applications written against Kudu 1.2 will continue to run against the Kudu 1.3 client and vice-versa.

Known Issues and Limitations

Please refer to the Known Issues and Limitations section of the documentation.

Installation Options

For full installation details, see Kudu Installation.