The default tablet history retention time has been raised from 15 minutes to 7 days to better support touchless incremental backups (see KUDU-2677).
Kudu now supports both full and incremental table backups via a job implemented using Apache Spark. It also supports restoring tables from full and incremental backups via a restore job implemented using Apache Spark. See the backup documentation for more details.
Kudu can now synchronize its internal catalog with the Apache Hive Metastore, automatically updating Hive Metastore table entries upon table creation, deletion, and alterations in Kudu. See the HMS synchronization documentation for more details.
Kudu now supports native fine-grained authorization via integration with Apache Sentry. Kudu may now enforce access control policies defined for Kudu tables and columns, as well as policies defined on Hive servers and databases that may store Kudu tables. See the authorization documentation for more details.
Kudu’s web UI now supports SPNEGO, a protocol for securing HTTP requests with
Kerberos by passing negotiation through HTTP headers. To enable, set the
--webserver_require_spnego command line flag.
Column comments can now be stored in Kudu tables, and can be updated using the AlterTable API (see KUDU-1711).
The Java scan token builder can now create multiple tokens per tablet.
To use this functionality, call
setSplitSizeBytes() to specify how many bytes
of data each token should scan. The same API is also available in Kudu’s
Spark integration, where it can be used to spawn multiple Spark tasks per tablet.
Experimental Kudu Docker images are now published on Docker Hub.
Kudu now has an experimental Kubernetes StatefulSet manifest and Helm chart, which can be used to define and provision Kudu clusters using Kubernetes (see KUDU-2398).
The Kudu CLI now has rudimentary YAML-based configuration file support, which can be used to provide cluster connection information via cluster name instead of keying in comma-separated lists of master addresses. See the cluster name documentation for more details.
kudu perf table_scan scans a table and displays a table’s row count as well
as the time it took to run the scan.
kudu table copy copies data from one table to another, within the same
cluster or across clusters. Note that this implementation uses a single client,
so it may not be suitable for large tables.
Tablet history retention time can now be configured on a table-by-table basis (see KUDU-2514).
Predicate performance for primitive columns has been optimized (see KUDU-2846).
IS NULL and IS NOT NULL predicate performance has been optimized (see KUDU-2846).
Optimized the performance of fetching tablet locations from the master for tables with large numbers of partitions. This can improve the performance of short-running Spark or Impala queries as well as user applications which make use of short-lived client instances (see KUDU-2711).
The tableExists() (Java) and
TableExists() (C++) APIs are now more performant.
Fault tolerant scans are now much more performant and consume far less memory (see KUDU-2466).
kudu cluster ksck now sends more requests in parallel, which should result
in a speed-up when running against clusters with many tables or when there’s
high latency between the node running the CLI and the cluster nodes.
Kudu’s block manager now deletes spent block containers when needed instead of just at server startup. This should reduce server startup times somewhat (see KUDU-2636).
DNS resolutions are now cached by Kudu masters, tablet servers, and Kudu C++ clients. The TTL for a resolved DNS entry in the cache is 15 seconds by default (see KUDU-2791).
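The caching behavior described above can be illustrated with a minimal time-to-live cache. This is only a sketch of the general technique, not Kudu's implementation: the class name `TtlDnsCache`, the injectable `resolver` and `clock` parameters, and the `system_resolver` helper are all hypothetical, and only the 15-second default comes from the release note.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple


class TtlDnsCache:
    """Caches hostname lookups for a fixed time-to-live (hypothetical sketch)."""

    def __init__(self,
                 resolver: Callable[[str], List[str]],
                 ttl_seconds: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps hostname -> (expiry time, resolved addresses).
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, host: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[0]:
            # Entry is still fresh: skip the resolver entirely.
            return cached[1]
        # Missing or expired entry: resolve again and restart the TTL.
        addresses = self._resolver(host)
        self._entries[host] = (now + self._ttl, addresses)
        return addresses


def system_resolver(host: str) -> List[str]:
    """Resolves a hostname using the operating system's resolver."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})
```

A cache built as `TtlDnsCache(system_resolver)` would call the operating system's resolver at most once per hostname in any 15-second window. The trade-off is that an address change can go unnoticed for up to one TTL.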
Tables created in Kudu 1.10.0 or later will show their creation time as well as their last alteration time in the web UI (see KUDU-2750).
The Kudu CLI and C++ client now support overriding the local username using the ‘KUDU_USER_NAME’ environment variable. This allows operating against a Kudu cluster using an identity which differs from the local Unix user on the client. Note that this has no effect on secure clusters, where client identity is determined by Kerberos authentication (see KUDU-2717).
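The precedence described above can be sketched as a small resolution function. This is an illustration under stated assumptions rather than the client's actual logic: the function name `effective_username` and its parameters are hypothetical, and the Kerberos identity is passed in as a plain string instead of being derived from a real ticket.

```python
import getpass
import os
from typing import Mapping, Optional


def effective_username(env: Optional[Mapping[str, str]] = None,
                       kerberos_identity: Optional[str] = None,
                       local_user: Optional[str] = None) -> str:
    """Picks the identity a client would present (hypothetical sketch)."""
    env = os.environ if env is None else env
    # On a secure cluster the Kerberos identity wins; the override is ignored.
    if kerberos_identity:
        return kerberos_identity
    # Otherwise a non-empty KUDU_USER_NAME replaces the local Unix user.
    override = env.get("KUDU_USER_NAME")
    if override:
        return override
    # Fall back to the local user, looked up lazily if not supplied.
    return local_user if local_user is not None else getpass.getuser()
```

For example, `effective_username({"KUDU_USER_NAME": "etl"}, local_user="alice")` returns `"etl"`, while adding `kerberos_identity="alice"` makes the function return `"alice"` regardless of the environment variable.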
The Kudu C++ client now performs stricter verification of the input data of INSERT and UPSERT operations with respect to table schema constraints. This helps spot schema violations before the data is sent to a tablet server.
KuduScanner in the Java client is now iterable. Additionally, the
KuduScannerIterator will automatically make scanner keep-alive calls to
ensure scanners do not time out while iterating.
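The idea of an iterator that keeps its scanner alive can be sketched as follows. The sketch is written in Python purely for illustration and does not reproduce the Java client's code: `iterate_with_keep_alive`, its parameters, and the time-based trigger are all assumptions.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

Row = TypeVar("Row")


def iterate_with_keep_alive(batches: Iterable[Iterable[Row]],
                            keep_alive: Callable[[], None],
                            period_seconds: float,
                            clock: Callable[[], float] = time.monotonic
                            ) -> Iterator[Row]:
    """Yields rows one by one, pinging the scanner if the caller is slow."""
    last_ping = clock()
    for batch in batches:
        for row in batch:
            # If the consumer has spent a long time processing earlier rows,
            # tell the server the scanner is still in use before continuing.
            if clock() - last_ping >= period_seconds:
                keep_alive()
                last_ping = clock()
            yield row
```

Placing the check inside the row loop matters because the time is usually lost in the caller's per-row processing, not in fetching the next batch.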
A KuduPartitioner API was added to the Java client. The
API allows a client to determine which partition a row falls into without
actually writing that row. For example, the
KuduPartitioner is used in the
Spark integration to optionally repartition and pre-sort the data before
writing to Kudu
(see KUDU-2674 and
RowResult Java API have new methods that accept and return
Java Objects. These methods are useful when you don’t care about autoboxing
and your existing type handling logic is based on Java types. See the javadoc
for more details.
The Kudu Java client now logs RPC trace summaries instead of full RPC traces when
the log level is
INFO or higher. This reduces log noise and makes RPC issues
more visible in a more compact format.
Kudu servers now display the time at which they were started in their web UIs.
Kudu tablet servers now display a table’s total column count in the web UI.
The /metrics web UI endpoint now supports filtering on entity types,
entity IDs, entity attributes, and metric names. This can be used to more
efficiently collect important metrics when there is a large number of tablets
on a tablet server.
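A filtered request to the endpoint could be assembled as in the sketch below. The query parameter names (`types`, `ids`, `attributes`, `metrics`), the comma-separated value format, the host name, and the metric names in the usage note are assumptions for illustration; consult the Kudu documentation for the exact syntax.

```python
from typing import Sequence
from urllib.parse import urlencode


def metrics_url(base_url: str,
                types: Sequence[str] = (),
                ids: Sequence[str] = (),
                attributes: Sequence[str] = (),
                metrics: Sequence[str] = ()) -> str:
    """Builds a /metrics URL with optional filters (assumed parameter names)."""
    params = {}
    for name, values in (("types", types), ("ids", ids),
                         ("attributes", attributes), ("metrics", metrics)):
        if values:
            # Each filter is sent as one comma-separated list.
            params[name] = ",".join(values)
    # Keep commas literal so the lists stay readable in the URL.
    query = urlencode(params, safe=",")
    return f"{base_url}/metrics" + (f"?{query}" if query else "")
```

For example, `metrics_url("http://tserver.example.com:8050", types=["tablet"], metrics=["rows_inserted", "rows_updated"])` produces `http://tserver.example.com:8050/metrics?types=tablet&metrics=rows_inserted,rows_updated`.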
The Kudu rebalancer now accepts the
--ignored_tservers command line
argument, which can be used to ignore the health status of specific tablet
servers (for example, if they are down) when deciding whether or not it’s safe to
rebalance the cluster.
kudu master list now displays the Raft consensus role of each master in the
cluster (i.e. LEADER or FOLLOWER).
kudu table scan no longer interleaves its output, and now projects all
columns without having to manually list the column names.
kudu perf loadgen now supports creating empty tables. The semantics of the
special value of 0 for the
--num_rows_per_thread flag have changed. A value of 0
now indicates that no rows should be generated, and a value of -1 indicates
that there should be no limit on the number of rows generated.
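The new flag semantics can be summarized in a short sketch. The function `rows_for_thread` is hypothetical and only models how the three cases differ. Rejecting values below -1 is an assumption, not documented behavior.

```python
import itertools
from typing import Iterator


def rows_for_thread(num_rows_per_thread: int) -> Iterator[int]:
    """Returns the row indexes one generator thread would produce."""
    if num_rows_per_thread == 0:
        # New meaning of 0: generate nothing, leaving the table empty.
        return iter(())
    if num_rows_per_thread == -1:
        # -1 now means "no limit": an endless stream of row indexes.
        return itertools.count()
    if num_rows_per_thread < -1:
        # Assumption: other negative values are treated as invalid here.
        raise ValueError("num_rows_per_thread must be -1 or non-negative")
    return iter(range(num_rows_per_thread))
```

Scripts that previously passed 0 to request unlimited rows would need to pass -1 instead to keep the old behavior.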
Running make install after building Kudu from source will now install the
Kudu binaries into appropriate locations.
Fixed an issue where the Java client would fail scans that took a very long time to return a single block of rows, such as highly selective scans over a large amount of data (see KUDU-1868).
Fixed the handling of SERVICE_UNAVAILABLE errors that caused the Java client to do unnecessary master lookups.
Kudu scan tokens now work correctly when the target table is renamed between when the scan token is created and when it is rehydrated into a scanner.
Kudu’s “NTP synchronization wait” behavior at startup now works properly when Kudu is run in a containerized environment.
Fixed a crash when a flush or compaction overlapped with another compaction (see KUDU-2807).
Fixed a rare race at startup where the leader master would fruitlessly try to tablet copy to a healthy follower master, causing the cluster to operate as if it had two masters until master leadership changed (see KUDU-2748).
Under rare circumstances, it was possible for Kudu to crash in libkrb5 when negotiating multiple TLS connections concurrently. This crash has been fixed (see KUDU-2706).
Kudu no longer crashes at startup on machines with disabled CPUs (see KUDU-2721).
Kudu 1.10.0 is wire-compatible with previous versions of Kudu:
Kudu 1.10 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
Rolling upgrade between Kudu 1.9 and Kudu 1.10 servers is believed to be possible, though it has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
Kudu 1.0 clients may connect to servers running Kudu 1.10 with the exception of the below-mentioned restrictions regarding secure clusters.
The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.10 and versions earlier than 1.3:
If a Kudu 1.10 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
If a Kudu 1.10 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
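The two rules above reduce to a simple check, sketched below. The function names and the version-string parsing are hypothetical. The setting values "required", "optional", and "disabled" are taken from the text.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extracts (major, minor) from a version string such as '1.2.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def client_can_connect(client_version: str,
                       authentication: str,
                       encryption: str) -> bool:
    """Applies the pre-1.3 client restriction for a Kudu 1.10 cluster."""
    if parse_version(client_version) >= (1, 3):
        # Clients from 1.3 onward support the newer security features.
        return True
    # Older clients connect only if neither setting is "required".
    return authentication != "required" and encryption != "required"
```

For instance, `client_can_connect("1.2.0", "optional", "required")` is `False`, whereas `client_can_connect("1.2.0", "optional", "disabled")` is `True`.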
Support for building and running with Java 7 has been dropped in this release. It had been deprecated since Kudu 1.5.0 (see KUDU-2099).
The Kudu 1.10 Java client library is API- and ABI-compatible with Kudu 1.9. Applications written against Kudu 1.9 will compile and run against the Kudu 1.10 client library and vice-versa.
The Kudu 1.10 C++ client is API- and ABI-forward-compatible with Kudu 1.9. Applications written and compiled against the Kudu 1.9 client library will run without modification against the Kudu 1.10 client library. Applications written and compiled against the Kudu 1.10 client library will run without modification against the Kudu 1.9 client library.
The Kudu 1.10 Python client is API-compatible with Kudu 1.9. Applications written against Kudu 1.9 will continue to run against the Kudu 1.10 client and vice-versa.
Please refer to the Known Issues and Limitations section of the documentation.
Kudu 1.10 includes contributions from 27 people, including 6 first-time contributors:
Thank you for your help in making Kudu even better!