Benchmarking and Improving Kudu Insert Performance with YCSB

Posted 26 Apr 2016 by Todd Lipcon

Recently, I wanted to stress-test and benchmark some changes to the Kudu RPC server, and decided to use YCSB as a way to generate reasonable load. While running YCSB, I noticed interesting results, and what started as an unrelated testing exercise eventually yielded some new insights into Kudu’s behavior. These insights will motivate changes to default Kudu settings and code in upcoming versions. This post details the benchmark setup, analysis, and conclusions.

Ingesting JSON Data Into Apache Kudu with StreamSets Data Collector

Posted 14 Apr 2016 by Pat Patterson

At the Hadoop Summit in Dublin this week, Ted Malaska, Principal Solutions Architect at Cloudera, and I presented "Ingest and Stream Processing - What Will You Choose?", looking at the big data streaming landscape with a focus on ingest. The session closed with a demo of StreamSets Data Collector, the open source graphical IDE for building ingest pipelines.

In the demo, I built a pipeline that read JSON data from Apache Kafka, augmented the data in JavaScript, and wrote the resulting records to both Apache Kudu (incubating) for analysis and Apache Kafka for visualization.