Kudu provides C++ and Java client APIs, as well as reference examples to illustrate their use. A Python API is included, but it is currently considered experimental, unstable, and subject to change at any time.
Use of server-side or private interfaces is not supported, and interfaces which are not part of public APIs have no stability guarantees.
The documentation for the C++ client APIs is included in the header files in
/usr/include/kudu/ if you installed Kudu using packages, or in subdirectories of
src/kudu/client/ if you built Kudu from source. If you installed Kudu using parcels,
no headers are included in your installation, and you will need to build
Kudu from source in order to have access to the headers and shared libraries.
The following command is a naive approach to finding relevant header files. Use of any APIs other than the client APIs is unsupported.
$ find /usr/include/kudu -type f -name '*.h'
Several example applications are provided in the
kudu-examples GitHub repository. Each example includes a
README that shows how to compile and run
it. These examples illustrate correct usage of the Kudu APIs, as well as how to
set up a virtual machine to run Kudu. The following list includes some of the
examples that are available today. Check the repository itself in case this list goes
out of date.
A simple Java application which connects to a Kudu instance, creates a table, writes data to it, then drops the table.
A small Java application which listens on a TCP socket for time series data corresponding to the Collectl wire protocol. The commonly-available collectl tool can be used to send example data to the server.
An experimental Python client for Kudu.
Scripts to download and run a VirtualBox virtual machine with Kudu already installed. See Quickstart for more information.
These examples should serve as helpful starting points for your own Kudu applications and integrations.
The following Maven <dependency> element is valid for the Kudu public beta:
<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-client</artifactId>
  <version>0.5.0</version>
</dependency>
Because the Maven artifacts are not in Maven Central, use the following <repository> element:
<repository>
  <id>cdh.repo</id>
  <name>Cloudera Repositories</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
See subdirectories of https://github.com/cloudera/kudu-examples/tree/master/java for example Maven pom.xml files.
See Using Impala With Kudu for guidance on installing
and using Impala with Kudu, including several
Kudu integrates with Spark through the Data Source API as of version 0.9. Include the kudu-spark jar using the --jars option:
spark-shell --jars kudu-spark-0.9.0.jar
then import kudu-spark and create a dataframe:
import org.apache.kudu.client._
import org.apache.kudu.spark.kudu._

// Read a table from Kudu
val df = sqlContext.read.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table")).kudu

// Query using the Spark API...
df.select("id").filter("id >= 5").show()

// ...or register a temporary table and use SQL
df.registerTempTable("kudu_table")
val filteredDF = sqlContext.sql("select id from kudu_table where id >= 5")
filteredDF.show()

// Use KuduContext to create, delete, or write to Kudu tables
val kuduContext = new KuduContext("kudu.master:7051")

// Create a new Kudu table from a dataframe schema
// NB: No rows from the dataframe are inserted into the table
kuduContext.createTable("test_table", df.schema, Seq("key"), new CreateTableOptions().setNumReplicas(1))

// Insert data
kuduContext.insertRows(df, "test_table")

// Delete data
kuduContext.deleteRows(filteredDF, "test_table")

// Upsert data
kuduContext.upsertRows(df, "test_table")

// Update data
val alteredDF = df.select($"id", ($"count" + 1).as("count"))
kuduContext.updateRows(alteredDF, "test_table")

// Data can also be inserted into the Kudu table using the data source, though the methods on KuduContext are preferred
// NB: The default is to upsert rows; to perform standard inserts instead, set operation = insert in the options map
// NB: Only mode Append is supported
df.write.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "test_table")).mode("append").kudu

// Check for the existence of a Kudu table
kuduContext.tableExists("another_table")

// Delete a Kudu table
kuduContext.deleteTable("unwanted_table")
The Kudu Spark integration is tested and developed against Spark 1.6 and Scala 2.10.
Kudu tables with a name containing uppercase or non-ASCII characters must be assigned an alternate name when registered as a temporary table.
Kudu tables with a column name containing uppercase or non-ASCII characters may not be used with SparkSQL. Non-primary key columns may be renamed in Kudu to work around this issue.
IN predicates are not pushed to
Kudu, and instead will be evaluated by the Spark task.
Kudu does not support every type supported by Spark SQL. For example,
Decimal and complex types are not supported.
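The naming restrictions above can be checked before a table is registered with Spark. The following is a minimal sketch in plain Java, not part of the Kudu or Spark APIs: the class KuduSparkNames and its two methods are hypothetical, and the aliasing rule (lowercase the name, replace each non-ASCII character with an underscore) is only an assumption about what a reasonable alternate name might look like.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Kudu or Spark APIs. */
final class KuduSparkNames {
    private KuduSparkNames() {}

    /** Returns true if the name contains an uppercase letter or any non-ASCII character. */
    static boolean needsAlternateName(String name) {
        return name.codePoints().anyMatch(cp -> cp > 0x7F || Character.isUpperCase(cp));
    }

    /** Assumed aliasing rule: lowercase the name, then replace each non-ASCII code point with '_'. */
    static String alternateName(String name) {
        StringBuilder sb = new StringBuilder();
        name.toLowerCase(Locale.ROOT).codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append('_');
            } else {
                sb.append((char) cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example names are made up; "donn\u00e9es" contains a non-ASCII character.
        for (String n : new String[] {"kudu_table", "Metrics_2016", "donn\u00e9es"}) {
            String alias = needsAlternateName(n) ? alternateName(n) : n;
            System.out.println(n + " -> " + alias);
        }
    }
}
```

A table name flagged by needsAlternateName could be registered under the generated alias when calling registerTempTable. A flagged column name cannot be fixed on the Spark side, so the column would have to be renamed in Kudu itself, which is possible only for non-primary key columns.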