Apache Kudu 1.9.0 Release Notes

Upgrade Notes

Flume 1.8+ requires Java 8 at runtime even though the Kudu Flume integration is Java 7 compatible. Flume 1.9 is the default dependency version as of Kudu 1.9.0.
Hadoop 3.0+ requires Java 8 at runtime even though the Kudu Hadoop integration is Java 7 compatible. Hadoop 3.2 is the default dependency version as of Kudu 1.9.0.

Obsoletions

Deprecations

Support for Java 7 has been deprecated since Kudu 1.5.0 and may be removed in the next major release.

New features

Kudu now supports location awareness. When configured, Kudu will make a best effort to avoid placing a majority of replicas for a given tablet at the same location. The kudu cluster rebalance tool has been updated to act in accordance with the placement policy of a location-aware Kudu. The administrative documentation has been updated to detail the usage of this feature.
Docker scripts have been introduced to build and run Kudu on various operating systems. See the /docker subdirectory of the source repository for more details. An official repository has been created for Apache Kudu Docker artifacts.
Developers integrating with Kudu can now write Java tests that start a Kudu mini cluster without first building and installing Kudu locally. The Kudu team publishes platform-specific binaries that Gradle or Maven can download and install at test time. More information on this feature is available in the Kudu documentation. This binary test artifact is currently considered experimental.
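The placement goal of the location awareness feature described above can be illustrated with a small check. This is a minimal sketch and not Kudu's placement code: the function name has_majority_at_one_location and the location strings are hypothetical, and the function only tests whether any single location holds a strict majority of a tablet's replicas.

```python
from collections import Counter


def has_majority_at_one_location(replica_locations):
    """Return True if one location holds a strict majority of the replicas.

    `replica_locations` has one location string per replica of a tablet.
    Hypothetical helper for illustration only; Kudu's real placement
    logic runs in the master and is best effort.
    """
    total = len(replica_locations)
    if total == 0:
        return False
    most_common_count = Counter(replica_locations).most_common(1)[0][1]
    # A strict majority means more than half of all replicas.
    return most_common_count * 2 > total


# Three replicas spread over three locations: no location has a majority.
print(has_majority_at_one_location(["/rack1", "/rack2", "/rack3"]))  # False
# Two of three replicas share a location: that location has a majority.
print(has_majority_at_one_location(["/rack1", "/rack1", "/rack2"]))  # True
```

With three replicas and only two locations, some location must hold a majority, which is one reason the policy is described as best effort.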

Optimizations and improvements

When creating a table, the master now enforces a restriction on the total number of replicas rather than the total number of partitions. If --max_create_tablets_per_ts is overridden manually, the maximum size of a new table is effectively reduced by a factor equal to its replication factor. Note that partitions can still be added after table creation.
The compaction policy has been updated to favor reducing the number of rowsets. This can lead to faster scans and lower bootup times, particularly in the face of a “trickling inserts” workload, where rows are inserted slowly in primary key order (see KUDU-1400).
A tablet-level metric average_diskrowset_height has been added to indicate how much a replica needs to be compacted, as indicated by the average number of rowsets per unit of keyspace.
Scans which read multiple columns of tables undergoing a heavy UPDATE workload are now more CPU efficient. In some cases, scan performance of such tables may be several times faster upon upgrading to this release.
Kudu-Spark users can now provide the short “kudu” format alias to Spark. This enables using .format("kudu") where you previously had to provide the fully qualified name, such as .format("org.apache.kudu.spark.kudu"), or import org.apache.kudu.spark.kudu._ and use the implicit .kudu functions. The Spark integration documentation has been updated to reflect this improvement.
The KuduSink class has been added to the Spark integration as a StreamSinkProvider, allowing structured streaming writes into Kudu (see KUDU-2640).
The amount of server-side logging has been greatly reduced for Kudu’s consensus implementation and background processes. This logging was determined to be not useful and unnecessarily verbose.
The web UI now more obviously depicts which columns are a part of the primary key (see KUDU-2477).
The kudu table describe tool has been added to support describing table attributes, including schema, partitioning, replication factor, column encodings, compressions, and default values.
The kudu table scan tool has been added to scan rows from a table, supporting comparison, in-list, and is-null predicates.
The kudu locate_row tool has been added to allow users to determine what tablet a given primary key belongs to, and whether a row exists for that primary key.
The kudu diagnose dump_mem_trackers tool has been added to allow users to output the contents of the /mem-trackers web UI page in CSV format.
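The change to the table-creation limit noted at the start of this section can be made concrete with a little arithmetic. The sketch below assumes the master caps the total number of replicas at --max_create_tablets_per_ts multiplied by the number of live tablet servers; that formula, the function name, and the sample values are assumptions for illustration and should be checked against the Kudu source or documentation.

```python
def max_partitions_at_creation(max_create_tablets_per_ts,
                               live_tablet_servers,
                               replication_factor):
    """Estimate how many partitions a new table may have at creation time.

    Assumed model (not taken from these notes): the master allows at most
    max_create_tablets_per_ts * live_tablet_servers replicas in total.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    max_replicas = max_create_tablets_per_ts * live_tablet_servers
    # Each partition consumes `replication_factor` replicas of the budget.
    return max_replicas // replication_factor


# Hypothetical cluster: flag value 20, five live tablet servers.
print(max_partitions_at_creation(20, 5, 1))  # 100
print(max_partitions_at_creation(20, 5, 3))  # 33
```

Under this model, raising the replication factor from 1 to 3 cuts the largest table that can be created in one step by roughly a factor of three, which matches the behavior described in the notes.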

Fixed Issues

To avoid glitches and undefined behavior, the Kudu Python client now detects and reports on conflicting/incorrect initialization of the OpenSSL library.
Fixed a crash caused by a race between altering tablet schemas and deleting tablet replicas (see KUDU-1678).
Fixed an issue that would prevent the kudu fs update_dirs tool from removing directories in the presence of tablet tombstones (see KUDU-2680).
The --cmeta_force_fsync flag may be used to fsync Kudu’s consensus metadata more aggressively. Setting this to true may decrease Kudu’s performance, but improve its durability in the face of power failures and forced shutdowns (see KUDU-2195).
Fixed an issue that would cause an excessive amount of RPC traffic from Kudu masters if the tablet servers were configured with duplicated master addresses (see KUDU-2684).
Fixed an issue that would cause the kudu cluster rebalance tool to run indefinitely in the case of tables with a replication factor of 2 (see KUDU-2688).
Fixed an issue that could lead to a failure to bootstrap tablet replicas that were a part of workloads with many alter table operations (see KUDU-2690).
Fixed an issue with the Java scanner’s keepAlive that could lead to a permanent hang in the scanner (see KUDU-2710).
Fixed an issue that would cause undefined behavior upon connecting to a secure cluster concurrently from multiple C++ clients (see KUDU-2706).

Wire Protocol compatibility

Kudu 1.9.0 is wire-compatible with previous versions of Kudu:

Kudu 1.9 clients may connect to servers running Kudu 1.0 or later. If the client uses features that are not available on the target server, an error will be returned.
Rolling upgrade between Kudu 1.8 and Kudu 1.9 servers is believed to be possible but has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
Kudu 1.0 clients may connect to servers running Kudu 1.9, subject to the restrictions on secure clusters described below.

The authentication features introduced in Kudu 1.3 place the following limitations on wire compatibility between Kudu 1.9 and versions earlier than 1.3:

If a Kudu 1.9 cluster is configured with authentication or encryption set to "required", clients older than Kudu 1.3 will be unable to connect.
If a Kudu 1.9 cluster is configured with authentication and encryption set to "optional" or "disabled", older clients will still be able to connect.
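The two rules above can be read as a single decision. The following minimal sketch encodes only the security restriction stated in these notes; the function name client_can_connect and the tuple representation of versions are hypothetical, and the function assumes a client at version 1.0 or later that is otherwise able to authenticate.

```python
def client_can_connect(client_version, authentication, encryption):
    """Decide whether a client can connect to a Kudu 1.9 cluster.

    client_version: (major, minor) tuple such as (1, 2).
    authentication, encryption: "required", "optional", or "disabled".
    Models only the pre-1.3 security restriction from the release notes.
    """
    allowed = {"required", "optional", "disabled"}
    if authentication not in allowed or encryption not in allowed:
        raise ValueError("settings must be required, optional, or disabled")
    if client_version >= (1, 3):
        # Clients from 1.3 onward understand the security features.
        return True
    # Older clients are locked out if either setting is "required".
    return authentication != "required" and encryption != "required"


print(client_can_connect((1, 2), "required", "optional"))  # False
print(client_can_connect((1, 2), "optional", "disabled"))  # True
print(client_can_connect((1, 3), "required", "required"))  # True
```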

Incompatible Changes in Kudu 1.9.0

Client Library Compatibility

The Kudu 1.9 Java client library is API- and ABI-compatible with Kudu 1.8. Applications written against Kudu 1.8 will compile and run against the Kudu 1.9 client library and vice versa.
The Kudu 1.9 C++ client is API- and ABI-forward-compatible with Kudu 1.8. Applications written and compiled against the Kudu 1.8 client library will run without modification against the Kudu 1.9 client library. Applications written and compiled against the Kudu 1.9 client library will run without modification against the Kudu 1.8 client library.
The Kudu 1.9 Python client is API-compatible with Kudu 1.8. Applications written against Kudu 1.8 will continue to run against the Kudu 1.9 client and vice versa.

Known Issues and Limitations

Please refer to the Known Issues and Limitations section of the documentation.

Contributors

Kudu 1.9 includes contributions from 24 people, including 5 first-time contributors:

Bankim Bhavsar
Mike Parker
Mitch Barnett
Tim Armstrong
Yingchun Lai