Apache Kudu Weekly Update August 16th, 2016

Posted 16 Aug 2016 by Todd Lipcon

Welcome to the twentieth edition of the Kudu Weekly Update. This weekly blog post covers ongoing development and news in the Apache Kudu project.

Project news

  • The first release candidate for the 0.10.0 release is now available.

    Community developers and users are encouraged to download the source tarball and vote on the release.

    For information on what’s new, check out the release notes. Note: some links from these in-progress release notes will not be live until the release itself is published.

Development discussions and code in progress

  • Will Berkeley spent some time working on the Spark integration this week to add support for UPSERT as well as other operations. Dan Burkert pitched in with some suggestions, which Will then integrated into his patch.

    After some reviews by Dan, Chris George, and Ram Mettu, the patch was committed in time for the upcoming 0.10.0 release.

  • Dan Burkert also completed work on the new manual partitioning APIs in the Java client. After finishing the basic implementation, Dan made some cleanups to the related APIs in both the Java and C++ clients.

    Dan and Misty Stanley-Jones also collaborated to finish the documentation for this new feature.

  • Adar Dembo worked on some tooling to allow users to migrate their Kudu clusters from a single-master configuration to a multi-master one. Along the way, he started building some common infrastructure for command-line tooling.

    Since its initial release, Kudu has included separate binaries for different administrative and operational tools (e.g. kudu-ts-cli, kudu-ksck, kudu-fs_dump, and log-dump). Despite having similar usage, these tools don’t share much code, and the separate statically linked binaries make the Kudu packages take more disk space than strictly necessary.

    Adar’s work has introduced a new top-level kudu binary which exposes a set of subcommands, much like the git and docker binaries with which readers may be familiar. For example, a new tool he has built for dumping peer identifiers from a tablet’s consensus metadata is triggered using kudu tablet cmeta print_replica_uuids.

    This new tool will be available in the upcoming 0.10.0 release; however, migration of the existing tools to the new infrastructure has not yet been completed. We expect that by Kudu 1.0, the old tools will be removed in favor of more subcommands of the kudu tool.

  • Todd Lipcon picked up the work started by David Alves in July to provide “exactly-once” semantics for write operations. Todd carried the patch series through review and also completed integration of the feature into the Kudu server processes.

    After testing the feature for several days on a large cluster under load, the team decided to enable this new feature by default in Kudu 0.10.0.

  • Mike Percy resumed working on garbage collection of past versions of updated and deleted rows. His main patch for the feature went through several rounds of review and testing, but unfortunately missed the cut-off for 0.10.0.

  • Alexey Serbin’s work to add doxygen-based documentation for the C++ Client API was committed this week. These docs will be published as part of the 0.10.0 release.

  • Alexey also continued work on implementing the AUTO_FLUSH_BACKGROUND write mode for the C++ client. This feature makes it easier to implement high-throughput ingest using the C++ API by automatically handling the batching and flushing of writes based on a configurable buffer size.

    Alexey’s patch has received several rounds of review and looks likely to be committed soon. Detailed performance testing will follow.

  • Congratulations to Ram Mettu for committing his first patch to Kudu this week! Ram fixed a bug in handling Alter Table with TIMESTAMP columns.
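
To make the subcommand idea from Adar's tooling work more concrete, here is a minimal sketch of a git-style nested command tree. It is written in Python with argparse purely for illustration and is not how the kudu binary is implemented; the program name `tool`, the positional `tablet_id` argument, and the handler body are all hypothetical. Only the path tablet cmeta print_replica_uuids comes from the example above.

```python
import argparse


def print_replica_uuids(args):
    # Hypothetical handler: a real tool would read the tablet's consensus
    # metadata from disk. Here we only echo what would be done.
    return f"would print replica UUIDs for tablet {args.tablet_id}"


def build_parser():
    # Top level: one binary, many modes (like `git` or `docker`).
    parser = argparse.ArgumentParser(prog="tool")
    modes = parser.add_subparsers(dest="mode", required=True)

    # Second level: `tool tablet ...`
    tablet = modes.add_parser("tablet", help="operate on a tablet")
    tablet_modes = tablet.add_subparsers(dest="tablet_mode", required=True)

    # Third level: `tool tablet cmeta ...`
    cmeta = tablet_modes.add_parser("cmeta", help="operate on consensus metadata")
    actions = cmeta.add_subparsers(dest="action", required=True)

    # Leaf action: `tool tablet cmeta print_replica_uuids <tablet_id>`
    action = actions.add_parser("print_replica_uuids")
    action.add_argument("tablet_id")  # assumed argument, for illustration only
    action.set_defaults(handler=print_replica_uuids)
    return parser


def run(argv):
    # Parse the argument list and dispatch to the handler of the chosen leaf.
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Because every tool hangs off one parser tree, shared concerns such as argument parsing and help output live in one place instead of being duplicated across separate binaries.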

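The batching behavior behind a mode like AUTO_FLUSH_BACKGROUND can also be sketched in a few lines. The following is a toy model in Python, not the C++ client API: the class name, the `send_batch` callback, and the `max_buffer_bytes` parameter are assumptions made for illustration. Writes accumulate in a buffer, and once the buffered size reaches the configured limit the batch is handed to a background thread, so the caller does not have to flush manually.

```python
import queue
import threading


class BackgroundFlushingWriter:
    """Toy writer: buffers rows and flushes full batches on a background thread.

    Assumes a single calling thread; a real client would need more care.
    """

    def __init__(self, send_batch, max_buffer_bytes=1024):
        self._send_batch = send_batch            # hypothetical callable taking a list of rows
        self._max_buffer_bytes = max_buffer_bytes
        self._buffer = []
        self._buffered_bytes = 0
        self._batches = queue.Queue()            # FIFO hand-off to the worker
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def apply(self, row: bytes):
        # Buffer the row; hand the batch off once the size limit is reached.
        self._buffer.append(row)
        self._buffered_bytes += len(row)
        if self._buffered_bytes >= self._max_buffer_bytes:
            self._hand_off()

    def _hand_off(self):
        # Move the current buffer onto the queue and start a fresh one.
        if self._buffer:
            self._batches.put(self._buffer)
            self._buffer = []
            self._buffered_bytes = 0

    def _drain(self):
        # Worker loop: send batches in order until the None sentinel arrives.
        while True:
            batch = self._batches.get()
            if batch is None:
                break
            self._send_batch(batch)

    def close(self):
        # Flush whatever is left, then stop the worker and wait for it.
        self._hand_off()
        self._batches.put(None)
        self._worker.join()
```

A caller would construct the writer with its own send function, call `apply` for each row, and call `close` at the end to flush the final partial batch.
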
Upcoming talks

Want to learn more about a specific topic from this blog post? Shoot an email to the kudu-user mailing list or tweet at @ApacheKudu. Similarly, if you’re aware of some Kudu news we missed, let us know so we can cover it in a future post.