Kudu clients and servers now redact user data such as cell values
from log messages, Java exception messages, and
User metadata such as table names, column names, and partition
bounds are not redacted.
Redaction is enabled by default, but may be disabled by setting the new
log_redact_user_data flag to
Kudu’s ability to provide consistency guarantees has been substantially improved:
Replicas now correctly track their "safe timestamp". This timestamp is the maximum timestamp at which reads are guaranteed to be repeatable.
A scan created using the
SCAN_AT_SNAPSHOT mode will now
either wait for the requested snapshot to be "safe" at the replica
being scanned, or be re-routed to a replica where the requested
snapshot is "safe". This ensures that all such scans are repeatable.
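The wait-or-re-route decision described above can be sketched roughly as follows. This is a toy model, not Kudu's implementation: the `Replica` class and the `choose_replica` function are hypothetical names, and timestamps are plain integers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Toy replica (hypothetical). safe_timestamp is the highest timestamp
    at which reads on this replica are repeatable."""
    name: str
    safe_timestamp: int


def choose_replica(preferred: Replica, others: List[Replica],
                   snapshot_ts: int) -> Optional[Replica]:
    """Return a replica on which snapshot_ts is already safe.

    Returns None when no replica qualifies yet, in which case the scan
    would have to wait for a safe timestamp to advance.
    """
    if snapshot_ts <= preferred.safe_timestamp:
        return preferred
    for replica in others:
        if snapshot_ts <= replica.safe_timestamp:
            return replica  # re-route the scan to this replica
    return None
```

In this sketch a scan at timestamp 120 against a replica whose safe timestamp is 100 is re-routed to a replica at 150, and a scan at 200 waits.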
Kudu Tablet Servers now properly retain historical data when a row with a given primary key is inserted and deleted, followed by the insertion of a new row with the same key. Previous versions of Kudu would not retain history in such situations. This allows the server to return correct results for snapshot scans with a timestamp in the past, even in the presence of such "reinsertion" scenarios.
The Kudu clients now automatically retain the timestamp of their latest
successful read or write operation. Scans using the
without a client-provided timestamp automatically assign a timestamp
higher than the timestamp of their most recent write. Writes also propagate
the timestamp, ensuring that sequences of operations with causal dependencies
between them are assigned increasing timestamps. Together, these changes
allow clients to achieve read-your-writes consistency, and also ensure
that snapshot scans performed by other clients return causally-consistent results.
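The timestamp propagation described above can be illustrated with a small model. `ToyServer` and `ToyClient` are hypothetical classes written for this sketch only. They are not part of any Kudu client API, and real Kudu timestamps are not simple counters.

```python
class ToyServer:
    """Hypothetical server with its own logical clock."""

    def __init__(self, clock: int = 0):
        self.clock = clock

    def write(self, propagated_ts: int) -> int:
        # Assign a timestamp above both the local clock and the
        # timestamp propagated by the client.
        self.clock = max(self.clock, propagated_ts) + 1
        return self.clock


class ToyClient:
    """Hypothetical client that remembers its latest observed timestamp."""

    def __init__(self):
        self.last_observed_ts = 0

    def write(self, server: ToyServer) -> int:
        ts = server.write(self.last_observed_ts)
        self.last_observed_ts = max(self.last_observed_ts, ts)
        return ts

    def snapshot_scan_timestamp(self, requested_ts=None) -> int:
        # Without a client-provided timestamp, pick one higher than
        # the most recent write seen by this client.
        if requested_ts is not None:
            return requested_ts
        return self.last_observed_ts + 1
```

Because the second server sees the propagated timestamp, a write sent to a server with a lagging clock still receives a higher timestamp than the client's earlier write, and a following scan is placed after both.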
Kudu servers now automatically limit the number of log files.
The number of log files retained can be configured using the
max_log_files flag. By default, 10 log files will be retained
at each severity level.
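As a rough illustration of per-severity retention, the sketch below picks which files would be removed. The function name `files_to_delete` is hypothetical, and the assumption that the oldest files are removed first is ours; the notes above state only the limit and its default.

```python
from collections import defaultdict
from typing import List, Tuple


def files_to_delete(log_files: List[Tuple[str, int]],
                    max_log_files: int = 10) -> List[Tuple[str, int]]:
    """Given (severity, sequence_number) pairs, return the files that
    exceed the per-severity limit. Assumes lower sequence numbers are
    older and are removed first."""
    by_severity = defaultdict(list)
    for severity, seq in log_files:
        by_severity[severity].append(seq)

    doomed = []
    for severity, seqs in by_severity.items():
        seqs.sort()
        excess = max(len(seqs) - max_log_files, 0)
        doomed.extend((severity, seq) for seq in seqs[:excess])
    return sorted(doomed)
```

With twelve INFO files and three ERROR files, only the two oldest INFO files are selected, since each severity level is counted separately.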
The logging in the Java and C++ clients has been substantially quieted. Clients no longer log messages in normal operation unless there is some kind of error.
The C++ client now includes a
API which can limit the amount of memory used to buffer
errors from asynchronous operations.
The Java client now fetches tablet locations from the Kudu Master in batches of 1000, increased from batches of 10 in prior versions. This can substantially improve the performance of Spark and Impala queries running against Kudu tables with large numbers of tablets.
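A back-of-the-envelope calculation shows why the batch size matters. It assumes, for illustration only, that a client needs the locations of every tablet in a table and issues one request per batch; caching and parallelism are ignored.

```python
def lookup_round_trips(num_tablets: int, batch_size: int) -> int:
    """Number of location requests needed to cover num_tablets tablets
    when each request returns up to batch_size locations."""
    return (num_tablets + batch_size - 1) // batch_size


# A hypothetical table with 5000 tablets:
old_requests = lookup_round_trips(5000, 10)    # batches of 10
new_requests = lookup_round_trips(5000, 1000)  # batches of 1000
```

Under these assumptions the request count for the example table drops from 500 to 5.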
Table metadata lock contention in the Kudu Master was substantially reduced. This improves the performance of tablet location lookups on large clusters with a high degree of concurrency.
Lock contention in the Kudu Tablet Server during high-concurrency write workloads was also reduced. This can reduce CPU consumption and improve performance when a large number of concurrent clients are writing to a smaller number of servers.
Lock contention when writing log messages has been substantially reduced. This source of contention could cause high tail latencies on requests, and when under high load could contribute to cluster instability such as election storms and request timeouts.
BITSHUFFLE column encoding has been optimized to use the
instruction set present on processors including Intel® Sandy Bridge
and later. Scans on
BITSHUFFLE-encoded columns are now up to 30% faster.
The kudu tool now accepts hyphens as an alternative to underscores
when specifying actions. For example,
kudu local-replica copy-from-remote
may be used as an alternative to
kudu local_replica copy_from_remote.
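One simple way to accept both spellings is to normalize action names before looking them up. The sketch below shows the idea only; `normalize_action` is a hypothetical helper, not how the kudu tool is implemented.

```python
from typing import List


def normalize_action(words: List[str]) -> List[str]:
    """Map hyphenated action words to their underscore form."""
    return [word.replace("-", "_") for word in words]


hyphenated = normalize_action(["local-replica", "copy-from-remote"])
underscored = normalize_action(["local_replica", "copy_from_remote"])
```

Both spellings resolve to the same action path, so a single lookup table keyed by the underscore form would serve either one.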
Fixed a long-standing issue in which running Kudu on
ext4 file systems
could cause file system corruption.
Implemented an LRU cache for open files, which prevents running out of
file descriptors on long-lived Kudu clusters. By default, Kudu will
limit its file descriptor usage to half of its configured
Fixed an issue which caused data corruption and crashes in the case that
a table had a non-composite (single-column) primary key, and that column
was specified to use
BITSHUFFLE encodings. If a
table with an affected schema was written in previous versions of Kudu,
the corruption will not be automatically repaired; users are encouraged
to re-insert such tables after upgrading to Kudu 1.2 or later.
Fixed a bug in the Spark
KuduRDD implementation which could cause
rows in the result set to be silently skipped in some cases.
KUDU-1551 Fixed an issue in which the tablet server would crash on restart in the case that it had previously crashed during the process of allocating a new WAL segment.
KUDU-1764 Fixed an issue where Kudu servers would leak approximately 16-32MB of disk space for every 10GB of data written to disk. After upgrading to Kudu 1.2 or later, any disk space leaked in previous versions will be automatically recovered on startup.
KUDU-1750 Fixed an issue where the API to drop a range partition would drop any partition with a matching lower or upper bound, rather than only the partition whose lower and upper bounds both match.
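The difference between the old and new matching rules in KUDU-1750 can be shown with a small example. Partitions are modelled here as hypothetical `(lower, upper)` integer pairs; the real bounds are encoded partition keys.

```python
from typing import List, Tuple

Partition = Tuple[int, int]  # (lower bound, upper bound), illustrative only


def dropped_before_fix(parts: List[Partition], lower: int, upper: int) -> List[Partition]:
    # Old behaviour: a match on either bound was enough.
    return [p for p in parts if p[0] == lower or p[1] == upper]


def dropped_after_fix(parts: List[Partition], lower: int, upper: int) -> List[Partition]:
    # Fixed behaviour: both bounds must match.
    return [p for p in parts if p[0] == lower and p[1] == upper]


partitions = [(0, 100), (100, 200), (200, 300)]
```

A request to drop the range `(100, 300)`, which does not exist, removes two partitions under the old rule and none under the fixed rule.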
Fixed an issue in the Java client where equality predicates which compared
an integer column to its maximum possible value (e.g.
would return incorrect results.
Fixed the kudu-client Java artifact to properly shade classes in the
com.google.thirdparty namespace. The lack of proper shading in prior
releases could cause conflicts with certain versions of Google Guava.
Fixed shading issues in the
kudu-flume-sink Java artifact. The sink
now expects that Hadoop dependencies are provided by Flume, and properly
shades the Kudu client’s dependencies.
Fixed a few issues using the Python client library from Python 3.
Kudu 1.2.0 is wire-compatible with previous versions of Kudu:
Kudu 1.2 clients may connect to servers running Kudu 1.0. If the client uses features that are not available on the target server, an error will be returned.
Kudu 1.0 clients may connect to servers running Kudu 1.2 without limitations.
Rolling upgrade between Kudu 1.1 and Kudu 1.2 servers is believed to be possible, though it has not been sufficiently tested. Users are encouraged to shut down all nodes in the cluster, upgrade the software, and then restart the daemons on the new version.
The replication factor of tables is now limited to a maximum of 7. In addition, creating a table with an even replication factor is no longer allowed.
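Applications that create tables may want to check the replication factor before sending the request. A minimal sketch of such a check follows; `replication_factor_error` is a hypothetical helper, and the lower bound of 1 is an assumption not stated above.

```python
from typing import Optional

MAX_REPLICATION_FACTOR = 7


def replication_factor_error(n: int) -> Optional[str]:
    """Return a description of the problem, or None if n is acceptable."""
    if n < 1:  # assumed lower bound
        return "replication factor must be at least 1"
    if n > MAX_REPLICATION_FACTOR:
        return "replication factor must not exceed 7"
    if n % 2 == 0:
        return "replication factor must be odd"
    return None
```

Under these rules the accepted values are 1, 3, 5, and 7.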
GROUP_VARINT encoding is now deprecated. Kudu servers have never supported
this encoding, and now the client-side constant has been deprecated to match the
Kudu 1.2.0 introduces several new restrictions on schemas, cell size, and identifiers:
By default, Kudu will not permit the creation of tables with more than 300 columns. We recommend schema designs that use fewer columns for best performance.
No individual cell may be larger than 64KB. The cells making up a composite key are limited to a total of 16KB after the internal composite-key encoding done by Kudu. Inserting rows not conforming to these limitations will result in errors being returned to the client.
Identifiers such as column and table names are now restricted to be valid UTF-8 strings. Additionally, a maximum length of 256 characters is enforced.
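A client-side pre-check for some of these limits might look like the sketch below. The helper names are hypothetical, and 64KB is taken to mean 65,536 bytes. The 16KB composite-key limit is not checked, because it applies after Kudu's internal key encoding, which is not reproduced here.

```python
from typing import List

MAX_COLUMNS = 300               # default limit
MAX_CELL_BYTES = 64 * 1024      # assumes 64KB = 65,536 bytes
MAX_IDENTIFIER_CHARS = 256


def identifier_problems(raw: bytes) -> List[str]:
    """Check one table or column name supplied as raw bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["identifier is not valid UTF-8"]
    if len(text) > MAX_IDENTIFIER_CHARS:
        return ["identifier is longer than 256 characters"]
    return []


def schema_problems(table_name: bytes, column_names: List[bytes]) -> List[str]:
    """Collect limit violations for a proposed table schema."""
    problems = identifier_problems(table_name)
    if len(column_names) > MAX_COLUMNS:
        problems.append("table has more than 300 columns")
    for name in column_names:
        problems.extend(identifier_problems(name))
    return problems


def cell_too_large(value: bytes) -> bool:
    return len(value) > MAX_CELL_BYTES
```

The server remains the authority on these limits; a check like this only lets an application fail early with a clearer message.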
The Kudu 1.2 Java client is API- and ABI-compatible with Kudu 1.1. Applications written against Kudu 1.1 will compile and run against the Kudu 1.2 client and vice versa.
The Kudu 1.2 C++ client is API- and ABI-forward-compatible with Kudu 1.1. Applications written and compiled against the Kudu 1.1 client will run without modification against the Kudu 1.2 client. Applications written and compiled against the Kudu 1.2 client will run without modification against the Kudu 1.1 client unless they use one of the following new APIs:
The Kudu 1.2 Python client is API-compatible with Kudu 1.1. Applications written against Kudu 1.1 will continue to run against the Kudu 1.2 client and vice versa.
Please refer to the Known Issues and Limitations section of the documentation.