Transparent Hierarchical Storage Management with Apache Kudu and Impala

Posted 05 Mar 2019 by Grant Henke

Note: This is a cross-post from the Cloudera Engineering Blog: Transparent Hierarchical Storage Management with Apache Kudu and Impala.

When choosing storage for an application, it is common to pick the single option whose features best fit your use case. For mutability and real-time analytics workloads you may want to use Apache Kudu, but for massive scalability at a low cost you may want to use HDFS. Because no single option covers both needs, there is a need for a solution that lets you leverage the best features of multiple storage options. This post describes the sliding window pattern using Apache Impala with data stored in Apache Kudu and Apache HDFS. With this pattern you get the benefits of multiple storage layers in a way that is transparent to users.

Call for Posts

Posted 11 Dec 2018 by Attila Bukor

Most of the posts on the Kudu blog have been written by the project’s committers and are either technical or news-like in nature. We’d like to hear how you’re using Kudu in production, in testing, or in your hobby project, and we’d like to share it with the world!

Apache Kudu 1.8.0 Released

Posted 26 Oct 2018 by Attila Bukor

The Apache Kudu team is happy to announce the release of Kudu 1.8.0!

The new release adds several new features and improvements, including the following:

Index Skip Scan Optimization in Kudu

Posted 26 Sep 2018 by Anupama Gupta

This summer I got the opportunity to intern with the Apache Kudu team at Cloudera. My project was to optimize the Kudu scan path by implementing a technique called index skip scan (a.k.a. scan-to-seek, see section 4.1 in [1]). I wanted to share my experience and the progress we’ve made so far on the approach.

Simplified Data Pipelines with Kudu

Posted 11 Sep 2018 by Mac Noland

I’ve been working with Hadoop for over seven years now and, fortunately or unfortunately, have run across a lot of structured data use cases. What we at phData have found is that end users are typically comfortable with tabular data and prefer to access their data in a structured manner using tables.