Apache Kudu - Apache Kudu Troubleshooting

OS	Command
Debian/Ubuntu	`sudo apt-get install ntp`
RHEL/CentOS	`sudo yum install ntp`

OS	Command
Debian/Ubuntu	`sudo service ntp restart`
RHEL/CentOS	`sudo /etc/init.d/ntpd restart`

Kudu Tracing

The kudu-master and kudu-tserver daemons include built-in tracing support based on the open source Chromium Tracing framework. You can use tracing to help diagnose latency issues or other problems on Kudu servers.

Accessing the tracing interface

The tracing interface is accessed via a web browser as part of the embedded web server in each of the Kudu daemons.

Table 1. Tracing Interface URLs
Daemon	URL
Tablet Server	http://tablet-server-1.example.com:8050/tracing.html
Master	http://master-1.example.com:8051/tracing.html

The tracing interface is known to work in recent versions of Google Chrome. Other browsers may not work as expected.

Collecting a trace

After navigating to the tracing interface, click the Record button on the top left corner of the screen. When beginning to diagnose a problem, start by selecting all categories. Click Record to begin recording a trace.

During the trace collection, events are collected into an in-memory ring buffer. This ring buffer is fixed in size, so it will eventually fill up to 100%. However, new events are still being collected while older events are being removed. While recording the trace, trigger the behavior or workload you are interested in exploring.

After collecting for several seconds, click Stop. The collected trace will be downloaded and displayed. Use the ? key to display help text about using the tracing interface to explore the trace.

Saving a trace

You can save collected traces as JSON files for later analysis by clicking Save after collecting the trace. To load and analyze a saved JSON file, click Load and choose the file.

RPC Timeout Traces

If client applications are experiencing RPC timeouts, the Kudu tablet server WARNING level logs should contain a log entry which includes an RPC-level trace. For example:

W0922 00:56:52.313848 10858 inbound_call.cc:193] Call kudu.consensus.ConsensusService.UpdateConsensus
from 192.168.1.102:43499 (request call id 3555909) took 1464ms (client timeout 1000).
W0922 00:56:52.314888 10858 inbound_call.cc:197] Trace:
0922 00:56:50.849505 (+     0us) service_pool.cc:97] Inserting onto call queue
0922 00:56:50.849527 (+    22us) service_pool.cc:158] Handling call
0922 00:56:50.849574 (+    47us) raft_consensus.cc:1008] Updating replica for 2 ops
0922 00:56:50.849628 (+    54us) raft_consensus.cc:1050] Early marking committed up to term: 8 index: 880241
0922 00:56:50.849968 (+   340us) raft_consensus.cc:1056] Triggering prepare for 2 ops
0922 00:56:50.850119 (+   151us) log.cc:420] Serialized 1555 byte log entry
0922 00:56:50.850213 (+    94us) raft_consensus.cc:1131] Marking committed up to term: 8 index: 880241
0922 00:56:50.850218 (+     5us) raft_consensus.cc:1148] Updating last received op as term: 8 index: 880243
0922 00:56:50.850219 (+     1us) raft_consensus.cc:1195] Filling consensus response to leader.
0922 00:56:50.850221 (+     2us) raft_consensus.cc:1169] Waiting on the replicates to finish logging
0922 00:56:52.313763 (+1463542us) raft_consensus.cc:1182] finished
0922 00:56:52.313764 (+     1us) raft_consensus.cc:1190] UpdateReplicas() finished
0922 00:56:52.313788 (+    24us) inbound_call.cc:114] Queueing success response

These traces can give an indication of which part of the request was slow. Please include them in bug reports related to RPC latency outliers.

Kernel Stack Watchdog Traces

Each Kudu server process has a background thread called the Stack Watchdog, which monitors the other threads in the server in case they have blocked for longer-than-expected periods of time. These traces can indicate operating system issues or bottlenecked storage.

When the watchdog thread identifies a case of thread blockage, it logs an entry in the WARNING log like the following:

W0921 23:51:54.306350 10912 kernel_stack_watchdog.cc:111] Thread 10937 stuck at /data/kudu/consensus/log.cc:505 for 537ms:
Kernel stack:
[<ffffffffa00b209d>] do_get_write_access+0x29d/0x520 [jbd2]
[<ffffffffa00b2471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
[<ffffffffa00fe6d8>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
[<ffffffffa00d9b23>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
[<ffffffffa00d9b9c>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
[<ffffffffa00d9e90>] ext4_dirty_inode+0x40/0x60 [ext4]
[<ffffffff811ac48b>] __mark_inode_dirty+0x3b/0x160
[<ffffffff8119c742>] file_update_time+0xf2/0x170
[<ffffffff8111c1e0>] __generic_file_aio_write+0x230/0x490
[<ffffffff8111c4c8>] generic_file_aio_write+0x88/0x100
[<ffffffffa00d3fb1>] ext4_file_write+0x61/0x1e0 [ext4]
[<ffffffff81180f5b>] do_sync_readv_writev+0xfb/0x140
[<ffffffff81181ee6>] do_readv_writev+0xd6/0x1f0
[<ffffffff81182046>] vfs_writev+0x46/0x60
[<ffffffff81182102>] sys_pwritev+0xa2/0xc0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

User stack:
    @       0x3a1ace10c4  (unknown)
    @          0x1262103  (unknown)
    @          0x12622d4  (unknown)
    @          0x12603df  (unknown)
    @           0x8e7bfb  (unknown)
    @           0x8f478b  (unknown)
    @           0x8f55db  (unknown)
    @          0x12a7b6f  (unknown)
    @       0x3a1b007851  (unknown)
    @       0x3a1ace894d  (unknown)
    @              (nil)  (unknown)

These traces can be useful for diagnosing root-cause latency issues when they are caused by systems below Kudu, such as disk controllers or file systems.

Memory Limits

Kudu has a hard and soft memory limit. The hard memory limit is the maximum amount a Kudu process is allowed to use, and is controlled by the --memory_limit_hard_bytes flag. The soft memory limit is a percentage of the hard memory limit, controlled by the flag memory_limit_soft_percentage and with a default value of 80%, that determines the amount of memory a process may use before it will start rejecting some write operations.

If the logs or RPC traces contain messages like

Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)

then Kudu is rejecting writes due to memory backpressure. This may result in write timeouts. There are several ways to relieve the memory pressure on Kudu:

If the host has more memory available for Kudu, increase --memory_limit_hard_bytes.
Increase the rate at which Kudu can flush writes from memory to disk by increasing the number of disks or increasing the number of maintenance manager threads --maintenance_manager_num_threads. Generally, the recommended ratio of maintenance manager threads to data directories is 1:3.
Reduce the volume of writes flowing to Kudu on the application side.

Heap Sampling

For advanced debugging of memory usage, administrators may enable heap sampling on Kudu daemons. This allows Kudu developers to associate memory usage with the specific lines of code and data structures responsible. When reporting a bug related to memory usage or an apparent memory leak, heap profiling can give quantitative data to pinpoint the issue.

Heap sampling is an advanced troubleshooting technique and may cause performance degradation or instability of the Kudu service. Currently it is not recommended to enable this in a production environment unless specifically requested by the Kudu development team.

To enable heap sampling on a Kudu daemon, pass the flag --heap-sample-every-n-bytes=524588. If heap sampling is enabled, the current sampled heap occupancy can be retrieved over HTTP by visiting http://tablet-server.example.com:8050/pprof/heap or http://master.example.com:8051/pprof/heap. The output is a machine-readable dump of the stack traces with their associated heap usage.

Rather than visiting the heap profile page directly in a web browser, it is typically more useful to use the pprof tool that is distributed as part of the gperftools open source project. For example, a developer with a local build tree can use the following command to collect the sampled heap usage and output an SVG diagram:

thirdparty/installed/uninstrumented/bin/pprof -svg  'http://localhost:8051/pprof/heap' > /tmp/heap.svg

The resulting SVG may be visualized in a web browser or sent to the Kudu community to help troubleshoot memory occupancy issues.

Heap samples contain only summary information about allocations and do not contain any data from the heap. It is safe to share heap samples in public without fear of exposing confidential or sensitive data.

Disk Issues

When Kudu starts, it checks each configured data directory, expecting either for all to be initialized or for all to be empty. If a server fails to start with a log message like

Check failed: _s.ok() Bad status: Already present: FS layout already exists; not overwriting existing layout: FSManager roots already exist: /data0/kudu/data

then this precondition has failed. This could be because Kudu was configured with non-empty data directories on first startup, or because a previously-running, healthy Kudu process was restarted and at least one data directory was deleted or is somehow corrupted, perhaps because of a disk error. If in the latter situation, consult the Changing Directory Configurations documentation.

Apache Kudu Troubleshooting

Startup Errors

Errors During Hole Punching Test

NTP Clock Synchronization

Installing NTP

Monitoring NTP Status

NTP Configuration Best Practices

Troubleshooting NTP Stability Problems

Reporting Kudu Crashes

Performance Troubleshooting