Apache Kudu 1.14.0 Released

Posted 28 Jan 2021 by Grant Henke

The Apache Kudu team is happy to announce the release of Kudu 1.14.0!

The new release adds several new features and improvements, including the following:

  • Full support for the INSERT_IGNORE, UPDATE_IGNORE, and DELETE_IGNORE operations was added. INSERT_IGNORE inserts a row if no row matching the key exists and is ignored if one already exists. UPDATE_IGNORE updates the row if one matching the key exists and is ignored if one does not. DELETE_IGNORE deletes the row if one matching the key exists and is ignored if one does not. These operations are particularly useful when retries or duplicate operations can occur and you do not want to handle the resulting errors manually, or when you want to avoid the unnecessary writes and compaction work that using the UPSERT operation would cause. The Java client can check whether the cluster it is communicating with supports these operations by calling the supportsIgnoreOperations() method on the KuduClient.

  • Spark 3 compatible JARs compiled for Scala 2.12 are now published for the Kudu Spark integration. See KUDU-3202 (https://issues.apache.org/jira/browse/KUDU-3202) for more details.

  • Every Kudu cluster now has an automatically generated cluster Id that can be used to uniquely identify a cluster. The cluster Id is shown in the master's web UI, in the output of the kudu master list tool, and in master server logs.

  • Downloading the WAL data and data blocks when copying tablets to another tablet server is now parallelized, resulting in much faster tablet copy operations. These operations occur when recovering from a down tablet server or when running the cluster rebalancer.

  • The HMS integration now supports multiple Kudu clusters associated with a single HMS, including Kudu clusters that do not have HMS synchronization enabled. This is possible because the Kudu master now uses the cluster Id to ignore notifications from tables in a different cluster. Additionally, the HMS plugin checks whether the Kudu cluster associated with a table has HMS synchronization enabled.

  • DeltaMemStores will now be flushed as long as any DMS in a tablet is older than the point defined by --flush_threshold_secs, rather than flushing once every --flush_threshold_secs period. This can reduce memory pressure under update- or delete-heavy workloads, and lower tablet server restart times following such workloads.
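To make the semantics of the IGNORE operations described above concrete, here is a minimal toy model in Python. It is not the Kudu client API and not how Kudu implements these operations: the ToyTable class, its method names, and the writes counter are all hypothetical, and a plain dictionary stands in for a table keyed by primary key. It only illustrates which operations change data and which are silently ignored.

```python
class ToyTable:
    """In-memory stand-in for a table keyed by primary key (hypothetical, not Kudu)."""

    def __init__(self):
        self.rows = {}    # primary key -> row value
        self.writes = 0   # number of operations that actually wrote data

    def insert_ignore(self, key, value):
        # Insert only when no row with this key exists; otherwise do nothing.
        if key in self.rows:
            return False
        self.rows[key] = value
        self.writes += 1
        return True

    def update_ignore(self, key, value):
        # Update only when a row with this key exists; otherwise do nothing.
        if key not in self.rows:
            return False
        self.rows[key] = value
        self.writes += 1
        return True

    def delete_ignore(self, key):
        # Delete only when a row with this key exists; otherwise do nothing.
        if key not in self.rows:
            return False
        del self.rows[key]
        self.writes += 1
        return True

    def upsert(self, key, value):
        # Always writes, whether or not the row already exists.
        self.rows[key] = value
        self.writes += 1
        return True


# A retried insert: the second attempt is ignored and the row keeps its value.
table = ToyTable()
first = table.insert_ignore(1, "a")
retry = table.insert_ignore(1, "b")

# Operations on a key that does not exist are ignored without raising an error.
missing_update = table.update_ignore(2, "x")
missing_delete = table.delete_ignore(2)
writes_with_ignore = table.writes

# The same retry expressed as UPSERT writes the row twice.
other = ToyTable()
other.upsert(1, "a")
other.upsert(1, "a")
writes_with_upsert = other.writes
```

In this model the retried INSERT_IGNORE produces one write while the retried UPSERT produces two, which is the extra write work the IGNORE variants are meant to avoid.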

The above is just a list of the highlights. For a more complete list of new features, improvements, and fixes, please refer to the release notes.

The Apache Kudu project only publishes source code releases. To build Kudu 1.14.0, download the Kudu 1.14.0 source release and follow the instructions in the documentation to build it from source.

For your convenience, binary JAR files for the Kudu Java client library, Spark DataSource, Flume sink, and other Java integrations are published to the ASF Maven repository and are now available.

The Python client source is also available on PyPI.

Additionally, experimental Docker images are published to Docker Hub, including for AArch64-based architectures (ARM).