Apache Kudu 1.10.0 Release Notes

Upgrade Notes

  • The default tablet history retention time has been raised from 15 minutes to 7 days to better support touchless incremental backups (see KUDU-2677).
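
As a rough illustration of why the longer retention matters: an incremental backup has to read the changes made since the previous backup, so that history must still be retained when the job runs. The sketch below is only a back-of-the-envelope check under that assumption. The helper name and the daily backup interval are hypothetical and not part of Kudu.

```python
# Retention windows named in the upgrade note above, in seconds.
OLD_DEFAULT_RETENTION_SEC = 15 * 60           # 15 minutes
NEW_DEFAULT_RETENTION_SEC = 7 * 24 * 60 * 60  # 7 days


def interval_fits_retention(backup_interval_sec: int, retention_sec: int) -> bool:
    """Hypothetical helper: True if the history accumulated since the
    previous backup would still be within the retention window."""
    return backup_interval_sec <= retention_sec


# Example: a backup job that runs once a day (an assumed schedule).
daily_interval_sec = 24 * 60 * 60
fits_old_default = interval_fits_retention(daily_interval_sec, OLD_DEFAULT_RETENTION_SEC)
fits_new_default = interval_fits_retention(daily_interval_sec, NEW_DEFAULT_RETENTION_SEC)
```

Under these assumptions, a daily schedule does not fit in the old 15-minute window but does fit in the new 7-day one.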

New features

  • Kudu now supports both full and incremental table backups via a job implemented using Apache Spark. Additionally, it supports restoring tables from full and incremental backups via a restore job implemented using Apache Spark. See the backup documentation for more details.

  • Kudu can now synchronize its internal catalog with the Apache Hive Metastore, automatically updating Hive Metastore table entries upon table creation, deletion, and alterations in Kudu. See the HMS synchronization documentation for more details.

  • Kudu now supports native fine-grained authorization via integration with Apache Sentry. Kudu may now enforce access control policies defined for Kudu tables and columns, as well as policies defined on Hive servers and databases that may store Kudu tables. See the authorization documentation for more details.

  • Kudu’s web UI now supports SPNEGO, a protocol for securing HTTP requests with Kerberos by passing negotiation through HTTP headers. To enable, set the --webserver_require_spnego command line flag.

  • Column comments can now be stored in Kudu tables, and can be updated using the AlterTable API (see KUDU-1711).

  • The Java scan token builder can now create multiple tokens per tablet. To use this functionality, call setSplitSizeBytes() to specify how many bytes of data each token should scan. The same API is also available in Kudu’s Spark integration, where it can be used to spawn multiple Spark tasks per scanned tablet (see KUDU-2670).

  • Experimental Kudu Docker images are now published on Docker Hub.

  • Kudu now has an experimental Kubernetes StatefulSet manifest and Helm chart, which can be used to define and provision Kudu clusters using Kubernetes (see KUDU-2398).

  • The Kudu CLI now has rudimentary YAML-based configuration file support, which can be used to provide cluster connection information via cluster name instead of keying in comma-separated lists of master addresses. See the cluster name documentation for more details.

  • kudu perf table_scan scans a table and displays a table’s row count as well as the time it took to run the scan.

  • kudu table copy copies data from one table to another, within the same cluster or across clusters. Note that this implementation uses a single client, so it may not be suitable for large tables.

  • Tablet history retention time can now be configured on a table-by-table basis (see KUDU-2514).
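
To picture what the split size passed to setSplitSizeBytes() means for the number of scan tokens, the sketch below estimates a token count per tablet by dividing the tablet's data size by the split size. This is a mental model only and does not reproduce Kudu's actual splitting logic. The function name, the treatment of non-positive split sizes, and the example sizes are all assumptions made for illustration.

```python
import math


def estimate_token_count(tablet_size_bytes: int, split_size_bytes: int) -> int:
    """Hypothetical estimate of how many scan tokens one tablet might yield.

    Assumption of this sketch: a non-positive split size means
    "do not split", i.e. one token per tablet.
    """
    if split_size_bytes <= 0:
        return 1
    # Each token is meant to cover roughly split_size_bytes of data,
    # and every tablet gets at least one token.
    return max(1, math.ceil(tablet_size_bytes / split_size_bytes))


# Example: a 1 GiB tablet scanned with a 256 MiB split size.
tokens = estimate_token_count(1024 ** 3, 256 * 1024 ** 2)
```

In the Spark integration, more tokens per tablet would translate into more Spark tasks per scanned tablet, which is the point of the feature.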

Optimizations and improvements

  • The performance of mutations (i.e. UPDATE, DELETE, and re-INSERT) to not-yet-flushed Kudu data has been significantly optimized (see KUDU-2826 and f9f9526d3).

  • Predicate performance for primitive columns has been optimized (see KUDU-2846).

  • IS NULL and IS NOT NULL predicate performance has been optimized (see KUDU-2846).

  • Optimized the performance of fetching tablet locations from the master for tables with large numbers of partitions. This can improve the performance of short-running Spark or Impala queries as well as user applications which make use of short-lived client instances (see KUDU-2711).

  • The tableExists() (Java) and TableExists() (C++) APIs are now more performant (see KUDU-2802).

  • Fault tolerant scans are now much more performant and consume far less memory (see KUDU-2466).

  • kudu cluster ksck now sends more requests in parallel, which should result in a speed-up when running against clusters with many tables or when there’s high latency between the node running the CLI and the cluster nodes.

  • Kudu’s block manager now deletes spent block containers when needed instead of just at server startup. This should reduce server startup times somewhat (see KUDU-2636).

  • DNS resolutions are now cached by Kudu masters, tablet servers, and Kudu C++ clients. The TTL for a resolved DNS entry in the cache is 15 seconds by default (see KUDU-2791).

  • Tables created in Kudu 1.10.0 or later will show their creation time as well as their last alteration time in the web UI (see KUDU-2750).

  • The Kudu CLI and C++ client now support overriding the local username using the ‘KUDU_USER_NAME’ environment variable. This allows operating against a Kudu cluster using an identity which differs from the local Unix user on the client. Note that this has no effect on secure clusters, where client identity is determined by Kerberos authentication (see KUDU-2717).

  • The Kudu C++ client now performs stricter verification of the input data of INSERT and UPSERT operations with respect to table schema constraints. This helps spot schema violations before the data is sent to a tablet server.

  • The KuduScanner in the Java client is now iterable. Additionally, the KuduScannerIterator will automatically make scanner keep-alive calls to ensure scanners do not time out while iterating.

  • A KuduPartitioner API was added to the Java client. The KuduPartitioner API allows a client to determine which partition a row falls into without actually writing that row. For example, the KuduPartitioner is used in the Spark integration to optionally repartition and pre-sort the data before writing to Kudu (see KUDU-2674 and KUDU-2672).

  • The PartialRow and RowResult Java APIs have new methods that accept and return Java Objects. These methods are useful when you don’t care about autoboxing and your existing type handling logic is based on Java types. See the javadoc for more details.

  • The Kudu Java client now logs RPC trace summaries instead of full RPC traces when the log level is INFO or higher. This reduces log noise and makes RPC issues more visible in a more compact format (see KUDU-2830).

  • Kudu servers now display the time at which they were started in their web UIs.

  • Kudu tablet servers now display a table’s total column count in the web UI.

  • The /metrics web UI endpoint now supports filtering on entity types, entity IDs, entity attributes, and metric names. This can be used to more efficiently collect important metrics when there is a large number of tablets on a tablet server.

  • The Kudu rebalancer now accepts the --ignored_tservers command line argument, which can be used to ignore the health status of specific tablet servers (e.g. if they are down) when deciding whether or not it’s safe to rebalance the cluster.

  • kudu master list now displays the Raft consensus role of each master in the cluster (i.e. LEADER or FOLLOWER) (see KUDU-2825).

  • kudu table scan no longer interleaves its output, and now projects all columns without having to manually list the column names.

  • kudu perf loadgen now supports creating empty tables. The semantics of the special value of 0 for the --num_rows_per_thread flag have changed. A value of 0 now indicates that no rows should be generated, and a value of -1 indicates there should be no limit to the number of rows generated.

  • Running make install after building Kudu from source will now install the Kudu binaries into appropriate locations (see KUDU-1344).
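
The new meaning of --num_rows_per_thread for kudu perf loadgen can be summarized with a small sketch. It only models the two special values described above (0 and -1). The function name is hypothetical, and rejecting values below -1 is an assumption of this sketch, not documented behavior of the tool.

```python
from typing import Optional


def total_rows_to_generate(num_rows_per_thread: int, num_threads: int) -> Optional[int]:
    """Hypothetical model of the flag's semantics as of Kudu 1.10.

    Returns the total number of rows across all threads, or None when
    generation is unlimited.
    """
    if num_rows_per_thread == -1:
        return None  # -1: no limit on the number of rows generated
    if num_rows_per_thread < -1:
        # Assumption of this sketch: other negative values are invalid.
        raise ValueError("num_rows_per_thread must be -1 or non-negative")
    # 0 now means "generate no rows", which leaves an empty table.
    return num_rows_per_thread * num_threads


empty_table_rows = total_rows_to_generate(0, 4)
unlimited_rows = total_rows_to_generate(-1, 4)
```

Scripts that relied on the earlier meaning of 0 may need to be revisited after upgrading.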

Fixed Issues

  • Fixed an issue where the Java client would fail scans that took a very long time to return a single block of rows, such as highly selective scans over a large amount of data (see KUDU-1868).

  • Fixed the handling of SERVICE_UNAVAILABLE errors that caused the Java client to do unnecessary master lookups.

  • Kudu scan tokens now work correctly when the target table is renamed between when the scan token is created and when it is rehydrated into a scanner.

  • Kudu’s “NTP synchronization wait” behavior at startup now works properly when Kudu is run in a containerized environment.

  • Fixed a crash when a flush or compaction overlapped with another compaction (see KUDU-2807).

  • Fixed a rare race at startup where the leader master would fruitlessly try to tablet copy to a healthy follower master, causing the cluster to operate as if it had two masters until master leadership changed (see KUDU-2748).

  • Under rare circumstances, it was possible for Kudu to crash in libkrb5 when negotiating multiple TLS connections concurrently. This crash has been fixed (see KUDU-2706).

  • Kudu no longer crashes at startup on machines with disabled CPUs (see KUDU-2721).

Wire Protocol compatibility

Kudu 1.10.0 is wire-compatible with previous versions of Kudu:

  • Kudu 1.10 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.

  • Rolling upgrade between Kudu 1.9 and Kudu 1.10 servers is believed to be possible, though it has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.

  • Kudu 1.0 clients may connect to servers running Kudu 1.10 with the exception of the below-mentioned restrictions regarding secure clusters.

The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.10 and versions earlier than 1.3:

  • If a Kudu 1.10 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.

  • If a Kudu 1.10 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
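
The two rules above can be expressed as a small decision function. This is a sketch of the wire-compatibility statement only: it does not check whether a client actually holds valid credentials, and the function and parameter names are hypothetical.

```python
def parse_major_minor(version: str) -> tuple:
    """Turn a version string such as "1.2.0" into (1, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def client_is_wire_compatible(client_version: str,
                              authentication: str,
                              encryption: str) -> bool:
    """Hypothetical check of whether a client can connect to a Kudu 1.10
    cluster, given the cluster's authentication and encryption settings
    ("required", "optional", or "disabled")."""
    if parse_major_minor(client_version) >= (1, 3):
        # The limitations only concern clients older than Kudu 1.3.
        return True
    # Pre-1.3 clients are locked out if either setting is "required".
    return "required" not in (authentication, encryption)


old_client_secure = client_is_wire_compatible("1.2.0", "required", "optional")
old_client_open = client_is_wire_compatible("1.2.0", "optional", "disabled")
```

Here old_client_secure is False and old_client_open is True, matching the two bullets above.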

Incompatible Changes in Kudu 1.10.0

  • Support for building and running with Java 7 has been dropped in this release. It had been deprecated since Kudu 1.5.0 (see KUDU-2099).

Client Library Compatibility

  • The Kudu 1.10 Java client library is API- and ABI-compatible with Kudu 1.9. Applications written against Kudu 1.9 will compile and run against the Kudu 1.10 client library and vice-versa.

  • The Kudu 1.10 C++ client is API- and ABI-forward-compatible with Kudu 1.9. Applications written and compiled against the Kudu 1.9 client library will run without modification against the Kudu 1.10 client library. Applications written and compiled against the Kudu 1.10 client library will run without modification against the Kudu 1.9 client library.

  • The Kudu 1.10 Python client is API-compatible with Kudu 1.9. Applications written against Kudu 1.9 will continue to run against the Kudu 1.10 client and vice-versa.

Known Issues and Limitations

Please refer to the Known Issues and Limitations section of the documentation.

Contributors

Kudu 1.10 includes contributions from 27 people, including 6 first-time contributors:

  • Csaba Fulop

  • Florentino Sainz

  • Guangchao Deng

  • Jia Hongchao

  • Ye Yuqiang

  • Yifan Zhang

Thank you for your help in making Kudu even better!

Installation Options

For full installation details, see Kudu Installation.