These instructions are relevant only when Kudu is installed using operating system packages (e.g. rpm or deb).
Kudu tablet servers and masters expose useful operational information on a built-in web interface.
Kudu master processes serve their web interface on port 8051. The interface exposes several pages with information about the cluster state:
A list of tablet servers, their host names, and the time of their last heartbeat.
A list of tables, including schema and tablet location information for each.
SQL code which you can paste into Impala Shell to add an existing table to Impala’s list of known data sources.
Each tablet server serves a web interface on port 8050. The interface exposes information about each tablet hosted on the server, its current state, and debugging information about maintenance background operations.
Both Kudu masters and tablet servers expose a common set of information via their web interfaces:
HTTP access to server logs.
an /rpcz endpoint which lists currently running RPCs via JSON.
pages giving an overview and detailed information on the memory usage of different components of the process.
information on the current set of configuration flags.
information on the currently running threads and their resource consumption.
a JSON endpoint exposing metrics about the server.
information on the deployed version number of the daemon.
These interfaces are linked from the landing page of each daemon’s web UI.
Kudu daemons expose a large number of metrics. Some metrics are associated with an entire server process, whereas others are associated with a particular tablet replica.
The full set of available metrics for a Kudu server can be dumped via a special command line flag:
$ kudu-tserver --dump_metrics_json
$ kudu-master --dump_metrics_json
This will output a large JSON document. Each metric indicates its name, label, description, units, and type. Because the output is JSON-formatted, this information can easily be parsed and fed into other tooling which collects metrics from Kudu servers.
Metrics can be collected from a server process via its HTTP interface by visiting /metrics. The output of this page is JSON for easy parsing by monitoring services. This endpoint accepts several GET parameters in its query string:
/metrics?metrics=<substring1>,<substring2>,… - limits the returned metrics to those which contain at least one of the provided substrings. The substrings also match entity names, so this may be used to collect metrics for a specific tablet.
/metrics?include_schema=1 - includes metrics schema information such as unit, description, and label in the JSON output. This information is typically elided to save space.
/metrics?compact=1 - eliminates unnecessary whitespace from the resulting JSON, which can decrease bandwidth when fetching this page from a remote host.
/metrics?include_raw_histograms=1 - includes the raw buckets and values for histogram metrics, enabling accurate aggregation of percentile metrics over time and across hosts.
For example:
$ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted'
[
{
"type": "server",
"id": "kudu.tabletserver",
"attributes": {},
"metrics": [
{
"name": "rpc_connections_accepted",
"label": "RPC Connections Accepted",
"type": "counter",
"unit": "connections",
"description": "Number of incoming TCP connections made to the RPC server",
"value": 92
}
]
}
]
$ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency'
[
{
"type": "tablet",
"id": "c0ebf9fef1b847e2a83c7bd35c2056b1",
"attributes": {
"table_name": "lineitem",
"partition": "hash buckets: (55), range: [(<start>), (<end>))",
"table_id": ""
},
"metrics": [
{
"name": "log_append_latency",
"total_count": 7498,
"min": 4,
"mean": 69.3649,
"percentile_75": 29,
"percentile_95": 38,
"percentile_99": 45,
"percentile_99_9": 95,
"percentile_99_99": 167,
"max": 367244,
"total_sum": 520098
}
]
}
]
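Because the output is a JSON array of entities, it can be post-processed with standard tools. As a minimal sketch, assuming jq is installed, the following lists the names of all server-level metrics:
$ curl -s 'http://example-ts:8050/metrics?compact=1' | jq -r '.[] | select(.type == "server") | .metrics[].name'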
All histograms and counters are measured since the server start time, and are not reset upon collection.
Kudu may be configured to dump various diagnostics information to a local log file. The diagnostics log will be written to the same directory as the other Kudu log files, with a similar naming format, substituting diagnostics instead of a log level like INFO.
After any diagnostics log file reaches 64MB uncompressed, the log will be rolled and
the previous file will be gzip-compressed.
Each line in the diagnostics log consists of the following components:
A human-readable timestamp formatted in the same fashion as the other Kudu log files.
The type of record. For example, a metrics record consists of the word metrics.
A machine-readable timestamp, in microseconds since the Unix epoch.
The record itself.
Currently, the only type of diagnostics record is a periodic dump of the server metrics. Each record is encoded in compact JSON format, and the server attempts to elide any metrics which have not changed since the previous record. In addition, counters which have never been incremented are elided. Otherwise, the format of the JSON record is identical to the format exposed by the HTTP endpoint above.
The frequency with which metrics are dumped to the diagnostics log is configured using the --metrics_log_interval_ms flag. By default, Kudu logs metrics every 60 seconds.
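For example, to dump metrics every 30 seconds instead, the flag could be set in each daemon’s gflagfile (the interval value here is illustrative):
--metrics_log_interval_ms=30000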
As of version 1.9, Kudu supports a rack awareness feature. Kudu’s ordinary re-replication methods ensure the availability of the cluster in the event of a single node failure. However, clusters can be vulnerable to correlated failures of multiple nodes. For example, all of the physical hosts on the same rack in a datacenter may become unavailable simultaneously if the top-of-rack switch fails. Kudu’s rack awareness feature provides protection from some kinds of correlated failures, like the failure of a single rack in a datacenter.
The first element of Kudu’s rack awareness feature is location assignment. When a tablet server or client registers with a master, the master assigns it a location. A location is a /-separated string that begins with a / and where each /-separated component consists of characters from the set [a-zA-Z0-9_-.]. For example, /dc-0/rack-09 is a valid location, while rack-04 and /rack=1 are not valid locations. Thus location strings resemble absolute UNIX file paths where characters in directory and file names are restricted to the set [a-zA-Z0-9_-.]. Presently, Kudu does not use the hierarchical structure of locations, but it may in the future. Location assignment is done by a user-provided command, whose path should be specified using the --location_mapping_cmd master flag. The command should take a single argument, the IP address or hostname of a tablet server or client, and return the location for the tablet server or client. Make sure that all Kudu masters are using the same location mapping command.
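As an illustration only, a location mapping command could be a small shell script that looks the argument up in a static mapping file; the script path, mapping file path, and file format here are assumptions, not part of Kudu:
#!/bin/bash
# Hypothetical location mapping script, e.g. saved as /usr/local/bin/kudu-location.sh
# and configured on every master with --location_mapping_cmd=/usr/local/bin/kudu-location.sh.
# Expects one argument: the IP address or hostname of a tablet server or client.
# Assumes /etc/kudu/location-mapping.txt contains lines of the form "<host> <location>".
set -e
HOST="$1"
LOCATION=$(awk -v host="$HOST" '$1 == host { print $2 }' /etc/kudu/location-mapping.txt)
# Fall back to a catch-all location for hosts missing from the mapping file.
echo "${LOCATION:-/default-rack}"
Every host must map to a valid location of the form described above, and all masters must use the same script.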
The second element of Kudu’s rack awareness feature is the placement policy, which is:
Do not place a majority of replicas of a tablet on tablet servers in the same location.
The leader master, when placing newly created replicas on tablet servers and when re-replicating existing tablets, will attempt to place the replicas in a way that complies with the placement policy. For example, in a cluster with five tablet servers A, B, C, D, and E, with respective locations /L0, /L0, /L1, /L1, /L2, to comply with the placement policy a new 3x replicated tablet could have its replicas placed on A, C, and E, but not on A, B, and C, because then the tablet would have 2/3 replicas in location /L0. As another example, if a tablet has replicas on tablet servers A, C, and E, and then C fails, the replacement replica must be placed on D in order to comply with the placement policy.
In the case where it is impossible to place replicas in a way that complies with the placement policy, Kudu will violate the policy and place a replica anyway. For example, using the setup described in the previous paragraph, if a tablet has replicas on tablet servers A, C, and E, and then E fails, Kudu will re-replicate the tablet onto one of B or D, violating the placement policy, rather than leaving the tablet under-replicated indefinitely. The kudu cluster rebalance tool can reestablish the placement policy if it is possible to do so. The kudu cluster rebalance tool can also be used to establish the placement policy on a cluster if the cluster has just been configured to use the rack awareness feature and existing replicas need to be moved to comply with the placement policy. See running the tablet rebalancing tool on a rack-aware cluster for more information.
The third and final element of Kudu’s rack awareness feature is the use of client locations to find "nearby" servers. As mentioned, the masters also assign a location to clients when they connect to the cluster. The client (whether Java, C++, or Python) uses its own location and the locations of tablet servers in the cluster to prefer "nearby" replicas when scanning in CLOSEST_REPLICA mode. Clients choose replicas to scan in the following order:
Scan a replica on a tablet server on the same host, if there is one.
Scan a replica on a tablet server in the same location, if there is one.
Scan some replica.
For example, using the cluster setup described above, if a client on the same host as tablet server A scans a tablet with replicas on tablet servers A, C, and E in CLOSEST_REPLICA mode, it will choose to scan from the replica on A, since the client and the replica on A are on the same host. If the client scans a tablet with replicas on tablet servers B, C, and E, it will choose to scan from the replica on B, since it is in the same location as the client, /L0. If there are multiple replicas meeting a criterion, one is chosen arbitrarily.
As of Kudu 1.10.0, Kudu supports both full and incremental table backups via a job implemented using Apache Spark. Additionally, it supports restoring tables from full and incremental backups via a restore job implemented using Apache Spark.
Because the Kudu backup and restore jobs use Apache Spark, ensure Apache Spark is installed in your environment by following the Spark documentation. Additionally, review the Apache Spark documentation for Submitting Applications.
To back up one or more Kudu tables, the KuduBackup Spark job can be used. The first time the job is run for a table, a full backup will be run. Additional runs will perform incremental backups which will only contain the rows that have changed since the initial full backup. A new set of full backups can be forced at any time by passing the --forceFull flag to the backup job.
The common flags that will be used when taking a backup are:
--rootPath: The root path to output backup data. Accepts any Spark-compatible path. See Backup Directory Structure for the directory structure used in the rootPath.
--kuduMasterAddresses: Comma-separated addresses of Kudu masters. Default: localhost
<table>…: A list of tables to be backed up.
Note: You can see the full list of job options at any time by passing the --help flag.
Below is a full example of a KuduBackup job execution which will back up the tables foo and bar to the HDFS directory kudu-backups:
spark-submit --class org.apache.kudu.backup.KuduBackup kudu-backup2_2.11-1.10.0.jar \
--kuduMasterAddresses master1-host,master-2-host,master-3-host \
--rootPath hdfs:///kudu-backups \
foo bar
To restore one or more Kudu tables, the KuduRestore Spark job can be used. For each backed-up table, the KuduRestore job will restore the full backup and each associated incremental backup until the full table state is restored. Restoring the full series of full and incremental backups is possible because the backups are linked via the from_ms and to_ms fields in the backup metadata. By default, the restore job will create tables with the same name as the table that was backed up. If you want to side-load the tables without affecting the existing tables, you can pass --tableSuffix to append a suffix to each restored table.
The common flags that will be used when restoring are:
--rootPath: The root path to the backup data. Accepts any Spark-compatible path. See Backup Directory Structure for the directory structure used in the rootPath.
--kuduMasterAddresses: Comma-separated addresses of Kudu masters. Default: localhost
--tableSuffix: If set, the suffix to add to the restored table names. Only used when createTables is true.
--timestampMs: A UNIX timestamp in milliseconds that defines the latest time to use when selecting restore candidates. Default: System.currentTimeMillis()
<table>…: A list of tables to be restored.
Note: You can see the full list of job options at any time by passing the --help flag.
Below is a full example of a KuduRestore job execution which will restore the tables foo and bar from the HDFS directory kudu-backups:
spark-submit --class org.apache.kudu.backup.KuduRestore kudu-backup2_2.11-1.10.0.jar \
--kuduMasterAddresses master1-host,master-2-host,master-3-host \
--rootPath hdfs:///kudu-backups \
foo bar
An additional backup-tools jar is available to provide some backup exploration and garbage collection capabilities. This jar does not use Spark directly, but instead only requires the Hadoop classpath to run.
Commands:
list: Lists the backups in the rootPath.
clean: Cleans up old backup data in the rootPath.
Note: You can see the full list of command options at any time by passing the --help flag.
Below is an example execution which will print the command options:
java -cp $(hadoop classpath):kudu-backup-tools-1.10.0.jar org.apache.kudu.backup.KuduBackupCLI --help
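For example, to list the backups under a root path (this sketch assumes the list command accepts the same --rootPath flag as the backup and restore jobs; check --help for the exact syntax):
$ java -cp $(hadoop classpath):kudu-backup-tools-1.10.0.jar org.apache.kudu.backup.KuduBackupCLI list --rootPath hdfs:///kudu-backups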
The backup directory structure in the rootPath is considered an internal detail and could change in future versions of Kudu. Additionally, the format and content of the data and metadata files is meant for the backup and restore process only and could change in future versions of Kudu. That said, understanding the structure of the backup rootPath and how it is used can be useful when working with Kudu backups.
The backup directory structure in the rootPath is as follows:
/<rootPath>/<tableId>-<tableName>/<backup-id>/
.kudu-metadata.json
part-*.<format>
rootPath: Can be used to distinguish separate backup groups, jobs, or concerns.
tableId: The unique internal ID of the table being backed up.
tableName: The name of the table being backed up. Note: Table names are URL encoded to prevent pathing issues.
backup-id: A way to uniquely identify/group the data for a single backup run.
.kudu-metadata.json: Contains all of the metadata to support recreating the table, linking backups by time, and handling data format changes. It is written last so that failed backups will not have a metadata file and will not be considered at restore time or backup linking time.
part-*.<format>: The data files containing the table’s data. Currently there is 1 part file per Kudu partition. Incremental backups contain an additional “RowAction” byte column at the end. Currently the only supported format/suffix is parquet.
To generate a list of tables to back up, using the kudu table list tool along with grep can be useful. Below is an example that will generate a list of all tables that start with my_db.:
kudu table list <master_addresses> | grep "^my_db\.*" | tr '\n' ' '
Note: This list could be saved as a part of your backup process to be used at restore time as well.
In general, the Spark jobs were designed to run with minimal tuning and configuration. You can adjust the number of executors and resources to increase parallelism and performance using Spark’s configuration options.
If your tables are very wide and your default memory allocation is fairly low, you may see jobs fail. To resolve this, increase the Spark executor memory. A conservative rule of thumb is 1 GiB per 50 columns.
If your Spark resources drastically outscale the Kudu cluster, you may want to limit the number of concurrent tasks allowed to run on restore.
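As a sketch of such tuning, a backup run might raise executor memory and cap the executor count via standard spark-submit options; the values below are illustrative, not recommendations:
$ spark-submit --class org.apache.kudu.backup.KuduBackup \
    --executor-memory 4G \
    --conf spark.dynamicAllocation.maxExecutors=8 \
    kudu-backup2_2.11-1.10.0.jar \
    --kuduMasterAddresses master1-host,master-2-host,master-3-host \
    --rootPath hdfs:///kudu-backups \
    foo bar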
Kudu does not yet provide built-in physical backup and restore functionality. However, it is possible to create a physical backup of a Kudu node (either tablet server or master) and restore it later.
The node to be backed up must be offline during the procedure, or else the backed up (or restored) data will be inconsistent.
Certain aspects of the Kudu node (such as its hostname) are embedded in the on-disk data. As such, it’s not yet possible to restore a physical backup of a node onto another machine.
Stop all Kudu processes in the cluster. This prevents the tablets on the backed-up node from being re-replicated elsewhere unnecessarily.
If creating a backup, make a copy of the WAL, metadata, and data directories on each node to be backed up. It is important that this copy preserve all file attributes as well as sparseness; see the example copy commands after these steps.
If restoring from a backup, delete the existing WAL, metadata, and data directories, then restore the backup via move or copy. As with creating a backup, it is important that the restore preserve all file attributes and sparseness.
Start all Kudu processes in the cluster.
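As a sketch, on Linux the backup copy above might use GNU cp with options that preserve attributes and sparseness; the directory paths below are illustrative:
$ sudo cp -a --sparse=always /data/kudu/tserver/wal /backup/kudu/tserver/wal
$ sudo cp -a --sparse=always /data/kudu/tserver/meta /backup/kudu/tserver/meta
$ sudo cp -a --sparse=always /data/kudu/tserver/data /backup/kudu/tserver/data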
For high availability and to avoid a single point of failure, Kudu clusters should be created with multiple masters. Many Kudu clusters were created with just a single master, either for simplicity or because Kudu multi-master support was still experimental at the time. This workflow demonstrates how to migrate to a multi-master configuration. It can also be used to migrate from two masters to three, with straightforward modifications. Note that the number of masters must be odd.
The workflow is unsafe for adding new masters to an existing configuration that already has three or more masters. Do not use it for that purpose.
An even number of masters doesn’t provide any benefit over having one fewer masters. This guide should always be used for migrating to three masters.
All of the command line steps below should be executed as the Kudu UNIX user. The example commands assume the Kudu UNIX user is kudu, which is typical.
The workflow presupposes at least basic familiarity with Kudu configuration management. If using vendor-specific tools, the workflow also presupposes familiarity with them, and the vendor’s instructions should be used instead as details may differ.
Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
Decide how many masters to use. The number of masters should be odd. Three- or five-node master configurations are recommended; they can tolerate one or two failures respectively.
Perform the following preparatory steps for the existing master:
Identify and record the directories where the master’s write-ahead log (WAL) and data live. If using Kudu system packages, their default locations are /var/lib/kudu/master, but they may be customized via the fs_wal_dir and fs_data_dirs configuration parameters. The commands below assume that fs_wal_dir is /data/kudu/master/wal and fs_data_dirs is /data/kudu/master/data. Your configuration may differ. For more information on configuring these directories, see the Kudu Configuration docs.
Identify and record the port the master is using for RPCs. The default port value is 7051, but it may have been customized using the rpc_bind_addresses configuration parameter.
Identify the master’s UUID. It can be fetched using the following command:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
where <master_wal_dir> and <master_data_dir> are the existing master’s previously recorded WAL and data directories. For example:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
4aab798a69e94fab8d77069edff28ce0
Optional: configure a DNS alias for the master. The alias could be a DNS cname (if the machine already has an A record in DNS), an A record (if the machine is only known by its IP address), or an alias in /etc/hosts. The alias should be an abstract representation of the master (e.g. master-1).
Without DNS aliases it is not possible to recover from permanent master failures without bringing the cluster down for maintenance; as such, configuring them is highly recommended.
If you have Kudu tables that are accessed from Impala, you must update the master addresses in the Apache Hive Metastore (HMS) database.
If you set up the DNS aliases, run the following statement in impala-shell, replacing master-1, master-2, and master-3 with your actual aliases:
ALTER TABLE table_name
SET TBLPROPERTIES
('kudu.master_addresses' = 'master-1,master-2,master-3');
If you do not have DNS aliases set up, see Step #11 in the Performing the migration section for updating HMS.
Perform the following preparatory steps for each new master:
Choose an unused machine in the cluster. The master generates very little load so it can be colocated with other data services or load-generating processes, though not with another Kudu master from the same configuration.
Ensure Kudu is installed on the machine, either via system packages (in which case the kudu and kudu-master packages should be installed), or via some other means.
Choose and record the directory where the master’s data will live.
Choose and record the port the master should use for RPCs.
Optional: configure a DNS alias for the master (e.g. master-2, master-3, etc.).
Stop all the Kudu processes in the entire cluster.
Format the data directory on each new master machine, and record the generated UUID. Use the following command sequence:
$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>]
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
where <master_wal_dir> and <master_data_dir> are the new master’s previously recorded WAL and data directories. For example:
$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
f5624e05f40649b79a757629a69d061e
If using CM, add the new Kudu master roles now, but do not start them.
If using DNS aliases, override the empty value of the Master Address parameter for each role (including the existing master role) with that master’s alias. Add the port number (separated by a colon) if using a non-default RPC port value.
Rewrite the master’s Raft configuration with the following command, executed on the existing master machine:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <all_masters>
where:
<master_wal_dir> and <master_data_dir> are the existing master’s previously recorded WAL and data directories
<tablet_id> must be the string 00000000000000000000000000000000
<all_masters> is a space-separated list of masters, both new and existing. Each entry in the list must be a string of the form <uuid>:<hostname>:<port>, using each master’s previously recorded UUID, hostname or alias, and RPC port number.
For example:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051
Modify the value of the master_addresses configuration parameter for both the existing master and the new masters. The new value must be a comma-separated list of all of the masters. Each entry is a string of the form <hostname>:<port>, using each master’s previously recorded hostname or alias and RPC port number.
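For example, each master’s gflagfile might contain the following line, using the aliases and default port from the earlier steps (illustrative):
--master_addresses=master-1:7051,master-2:7051,master-3:7051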
Start the existing master.
Copy the master data to each new master with the following command, executed on each new master machine.
If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must authenticate as the Kudu service user prior to running this command.
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <existing_master>
where:
<master_wal_dir> and <master_data_dir> are the new master’s previously recorded WAL and data directories
<tablet_id> must be the string 00000000000000000000000000000000
<existing_master> is the RPC address of the existing master and must be a string of the form <hostname>:<port>, using the existing master’s previously recorded hostname or alias and RPC port number
For example:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-1:7051
Start all of the new masters.
Skip the next step if using CM.
Modify the value of the tserver_master_addrs configuration parameter for each tablet server. The new value must be a comma-separated list of masters where each entry is a string of the form <hostname>:<port>, using each master’s previously recorded hostname or alias and RPC port number.
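For example, each tablet server’s gflagfile might contain (illustrative):
--tserver_master_addrs=master-1:7051,master-2:7051,master-3:7051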
Start all of the tablet servers.
If you have Kudu tables that are accessed from Impala and you didn’t set up DNS aliases, update the HMS database manually in the underlying database that provides the storage for HMS.
The following is an example SQL statement you should run in the HMS database:
UPDATE TABLE_PARAMS
SET PARAM_VALUE =
'master-1.example.com,master-2.example.com,master-3.example.com'
WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = 'old-master';
In impala-shell, run:
INVALIDATE METADATA;
To verify that all masters are working properly, perform the following sanity checks:
Using a browser, visit each master’s web UI. Look at the /masters page. All of the masters should be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
Run a Kudu system check (ksck) on the cluster using the kudu command line tool. See Checking Cluster Health with ksck for more details.
Kudu multi-master deployments function normally in the event of a master loss. However, it is important to replace the dead master; otherwise a second failure may lead to a loss of availability, depending on the number of available masters. This workflow describes how to replace the dead master.
Due to KUDU-1620, it is not possible to perform this workflow without also restarting the live masters. As such, the workflow requires a maintenance window, albeit a potentially brief one if the cluster was set up with DNS aliases.
Kudu does not yet support live Raft configuration changes for masters. As such, it is only possible to replace a master if the deployment was created with DNS aliases or if every node in the cluster is first shut down. See the multi-master migration workflow for more details on deploying with DNS aliases.
The workflow presupposes at least basic familiarity with Kudu configuration management. If using vendor-specific tools, the workflow also presupposes familiarity with them, and the vendor’s instructions should be used instead as details may differ.
All of the command line steps below should be executed as the Kudu UNIX user, typically kudu.
If the deployment was configured without DNS aliases, perform the following steps:
Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
Shut down all Kudu tablet server processes in the cluster.
Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it from accidentally restarting; this can be quite dangerous for the cluster post-recovery.
Choose one of the remaining live masters to serve as a basis for recovery. The rest of this workflow will refer to this master as the "reference" master.
Choose an unused machine in the cluster where the new master will live. The master generates very little load so it can be colocated with other data services or load-generating processes, though not with another Kudu master from the same configuration. The rest of this workflow will refer to this master as the "replacement" master.
Perform the following preparatory steps for the replacement master:
Ensure Kudu is installed on the machine, either via system packages (in which case the kudu and kudu-master packages should be installed), or via some other means.
Choose and record the directory where the master’s data will live.
Perform the following preparatory steps for each live master:
Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized via the fs_wal_dir and fs_data_dirs configuration parameters. Please note that if you’ve set fs_data_dirs to some directories other than the value of fs_wal_dir, it should be explicitly included in every command below where fs_wal_dir is also included. For more information on configuring these directories, see the Kudu Configuration docs.
Identify and record the master’s UUID. It can be fetched using the following command:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
where <master_wal_dir> and <master_data_dir> are the live master’s previously recorded WAL and data directories. For example:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c
Perform the following preparatory steps for the reference master:
Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized via the fs_wal_dir and fs_data_dirs configuration parameters. Please note that if you’ve set fs_data_dirs to some directories other than the value of fs_wal_dir, it should be explicitly included in every command below where fs_wal_dir is also included. For more information on configuring these directories, see the Kudu Configuration docs.
Identify and record the UUIDs of every master in the cluster, using the following command:
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> 2>/dev/null
where <master_wal_dir> and <master_data_dir> are the reference master’s previously recorded WAL and data directories, and <tablet_id> must be the string 00000000000000000000000000000000. For example:
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3
Using the two previously-recorded lists of UUIDs (one for all live masters and one for all masters), determine and record (by process of elimination) the UUID of the dead master.
Format the data directory on the replacement master machine using the previously recorded UUID of the dead master. Use the following command sequence:
$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid>
where <master_wal_dir> and <master_data_dir> are the replacement master’s previously recorded WAL and data directories, and <uuid> is the dead master’s previously recorded UUID. For example:
$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c
Copy the master data to the replacement master with the following command:
If your Kudu cluster is secure, in addition to running as the Kudu UNIX user, you must authenticate as the Kudu service user prior to running this command.
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master>
where:
<master_wal_dir> and <master_data_dir> are the replacement master’s previously recorded WAL and data directories
<tablet_id> must be the string 00000000000000000000000000000000
<reference_master> is the RPC address of the reference master and must be a string of the form <hostname>:<port>, using the reference master’s previously recorded hostname or alias and RPC port number
For example:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-2:7051
If using CM, add the replacement Kudu master role now, but do not start it.
Override the empty value of the Master Address parameter for the new role with the replacement master’s alias. Add the port number (separated by a colon) if using a non-default RPC port value.
If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point at the replacement master.
If the cluster was set up without DNS aliases, perform the following steps:
Stop the remaining live masters.
Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 of Perform the Migration for more details.
Start the replacement master.
Restart the remaining masters in the new multi-master deployment. While the masters are shut down, there will be an availability outage, but it should last only as long as it takes for the masters to come back up.
Congratulations, the dead master has been replaced! To verify that all masters are working properly, consider performing the following sanity checks:
Using a browser, visit each master’s web UI. Look at the /masters page. All of the masters should be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
Run a Kudu system check (ksck) on the cluster using the kudu command line tool. See Checking Cluster Health with ksck for more details.
In the event that a multi-master deployment has been overallocated nodes, the following steps should be taken to remove the unwanted masters.
In planning the new multi-master configuration, keep in mind that the number of masters should be odd and that three- or five-node master configurations are recommended.
Dropping the number of masters below the number of masters currently needed for a Raft majority can incur data loss. To mitigate this, ensure that the leader master is not removed during this process.
Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
Identify the UUID and RPC address of the current leader of the multi-master deployment by visiting the /masters page of any master’s web UI. This master must not be removed during this process; its removal may result in severe data loss.
Stop all the Kudu processes in the entire cluster.
If using CM, remove the unwanted Kudu master.
Rewrite the Raft configuration on the remaining masters to include only the remaining masters. See Step 4 of Perform the Migration for more details.
Remove the data directories and WAL directory on the unwanted masters. This is a precaution to ensure that they cannot start up again and interfere with the new multi-master deployment.
Modify the value of the master_addresses configuration parameter for the masters of the new multi-master deployment. If migrating to a single-master deployment, the master_addresses flag should be omitted entirely.
Start all of the masters that were not removed.
Modify the value of the tserver_master_addrs configuration parameter for the tablet servers to remove any unwanted masters.
Start all of the tablet servers.
To verify that all masters are working properly, perform the following sanity checks:
Using a browser, visit each master’s web UI. Look at the /masters page. All of the masters should be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
Run a Kudu system check (ksck) on the cluster using the kudu command line tool. See Checking Cluster Health with ksck for more details.
To prevent long maintenance windows when replacing dead masters, DNS aliases should be used. If the cluster was set up without aliases, changing the hostnames can be done by following the steps below.
Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
Note the UUID and RPC address of every master by visiting the /masters page of any master’s web UI.
Stop all the Kudu processes in the entire cluster.
Set up the new hostnames to point to the masters and verify all servers and clients properly resolve them.
Rewrite each master’s Raft configuration with the following command, executed on all master hosts:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 00000000000000000000000000000000 <all_masters>
For example:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:new-master-name-1:7051 f5624e05f40649b79a757629a69d061e:new-master-name-2:7051 988d8ac6530f426cbe180be5ba52033d:new-master-name-3:7051
Change the masters' gflagfile so the master_addresses parameter reflects the new hostnames.
Change the tserver_master_addrs parameter in the tablet servers' gflagfiles to the new hostnames.
Start up the masters.
To verify that all masters are working properly, perform the following sanity checks:
Using a browser, visit each master’s web UI. Look at the /masters page. All of the masters should be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.
Run the below command to verify all masters are up and listening. The UUIDs should be the same and belong to the same master as before the hostname change:
$ sudo -u kudu kudu master list new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051
Start all of the tablet servers.
Run a Kudu system check (ksck) on the cluster using the kudu command line tool. See Checking Cluster Health with ksck for more details. After startup, some tablets may be unavailable as it takes some time to initialize all of them.
If you have Kudu tables that are accessed from Impala, update the HMS database manually in the underlying database that provides the storage for HMS.
The following is an example SQL statement you should run in the HMS database:
UPDATE TABLE_PARAMS
SET PARAM_VALUE =
'new-master-name-1:7051,new-master-name-2:7051,new-master-name-3:7051'
WHERE PARAM_KEY = 'kudu.master_addresses'
AND PARAM_VALUE = 'master-1:7051,master-2:7051,master-3:7051';
In impala-shell, run:
INVALIDATE METADATA;
Verify that updating the metadata worked by running a simple SELECT query on a Kudu-backed Impala table.
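For example, a quick check could be run from the shell (my_kudu_table is a hypothetical table name):
$ impala-shell -q 'SELECT COUNT(*) FROM my_kudu_table;'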
A common workflow when administering a Kudu cluster is adding additional tablet server instances, in an effort to increase storage capacity, decrease load or utilization on individual hosts, increase compute power, etc.
By default, any newly added tablet servers will not be utilized immediately after their addition to the cluster. Instead, newly added tablet servers will only be utilized when new tablets are created or when existing tablets need to be replicated, which can lead to imbalanced nodes.
To add additional tablet servers to an existing cluster, the following steps can be taken to ensure tablets are uniformly distributed across the cluster:
Ensure that Kudu is installed on the new machines being added to the cluster, and that the new instances have been correctly configured to point to the pre-existing cluster. Then, start up the new tablet server instances.
Verify that the new instances check in with the Kudu Master(s) successfully. A quick method for verifying they’ve successfully checked in with the existing Master instances is to view the Kudu Master WebUI, specifically the /tablet-servers section, and validate that the newly added instances are registered and heartbeating.
Once the tablet server(s) are successfully online and healthy, follow the steps to run the rebalancing tool which will spread existing tablets to the newly added tablet server nodes.
After the balancer has completed, or even during its execution, you can check on the health of the cluster using the ksck command-line utility.
The kudu CLI includes a tool named ksck that can be used for gathering information about the state of a Kudu cluster, including checking its health. ksck will identify issues such as under-replicated tablets, unreachable tablet servers, or tablets without a leader.
ksck should be run from the command line as the Kudu admin user, and requires the full list of master addresses to be specified:
$ sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com
To see a full list of the options available with ksck, use the --help flag.
If the cluster is healthy, ksck will print information about the cluster, a success message, and return a zero (success) exit status.
Master Summary
               UUID               |        Address        | Status
----------------------------------+-----------------------+---------
 a811c07b99394df799e6650e7310f282 | master-01.example.com | HEALTHY
 b579355eeeea446e998606bcb7e87844 | master-02.example.com | HEALTHY
 cfdcc8592711485fad32ec4eea4fbfcd | master-03.example.com | HEALTHY

Tablet Server Summary
               UUID               |        Address         | Status
----------------------------------+------------------------+---------
 a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
 e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
 e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | HEALTHY

Version Summary
 Version |         Servers
---------+-------------------------
  1.7.1  | all 6 server(s) checked

Summary by table
   Name   | RF | Status  | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable
----------+----+---------+---------------+---------+------------+------------------+-------------
 my_table | 3  | HEALTHY | 8             | 8       | 0          | 0                | 0

                | Total Count
----------------+-------------
 Masters        | 3
 Tablet Servers | 3
 Tables         | 1
 Tablets        | 8
 Replicas       | 24
OK
If the cluster is unhealthy, for instance if a tablet server process has stopped, ksck will report the issue(s) and return a non-zero exit status, as shown in the abbreviated snippet of ksck output below:
Tablet Server Summary
               UUID               |        Address         |   Status
----------------------------------+------------------------+-------------
 a598f75345834133a39c6e51163245db | tserver-01.example.com | HEALTHY
 e05ca6b6573b4e1f9a518157c0c0c637 | tserver-02.example.com | HEALTHY
 e7e53a91fe704296b3a59ad304e7444a | tserver-03.example.com | UNAVAILABLE
Error from 127.0.0.1:7150: Network error: could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7150: connect: Connection refused (error 61) (UNAVAILABLE)

... (full output elided)

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
Corruption: table consistency check error: 1 out of 1 table(s) are not healthy

FAILED
Runtime error: ksck discovered errors
To verify data integrity, the optional --checksum_scan flag can be set, which will ensure the cluster has consistent data by scanning each tablet replica and comparing results. The --tables or --tablets flags can be used to limit the scope of the checksum scan to specific tables or tablets, respectively. For example, checking data integrity on the my_table table can be done with the following command:
$ sudo -u kudu kudu cluster ksck --checksum_scan --tables my_table master-01.example.com,master-02.example.com,master-03.example.com
By default, ksck will attempt to use a snapshot scan of the table, so the checksum scan can be done while writes continue.
Finally, ksck also supports output in JSON format using the --ksck_format flag. JSON output contains the same information as the plain text output, but in a format that can be used by other tools. See kudu cluster ksck --help for more information.
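For example, the following requests pretty-printed JSON output (the json_pretty value is an assumption; consult kudu cluster ksck --help for the values supported by your version):
$ sudo -u kudu kudu cluster ksck --ksck_format=json_pretty master-01.example.com,master-02.example.com,master-03.example.com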
For higher read parallelism and larger volumes of storage per server, users may want to configure servers to store data in multiple directories on different devices. Once a server is started, users must go through the following steps to change the directory configuration.
Users can add or remove data directories to an existing master or tablet server via the kudu fs update_dirs tool. Data is striped across data directories, and when a new data directory is added, new data will be striped across the union of the old and new directories.
Unless the --force flag is specified, Kudu will not allow for the removal of a directory across which tablets are configured to spread data. If --force is specified, all tablets configured to use that directory will fail upon starting up and be replicated elsewhere.
If the metadata directory overlaps with a data directory, as was the default prior to Kudu 1.7, or if a non-default metadata directory is configured, the --fs_metadata_dir configuration must be specified when running the kudu fs update_dirs tool.
Only new tablet replicas (i.e. brand new tablets' replicas and replicas that are copied to the server for high availability) will use the new directory. Existing tablet replicas on the server will not be rebalanced across the new directory.
All of the command line steps below should be executed as the Kudu UNIX user, typically kudu.
The tool can only run while the server is offline, so establish a maintenance window to update the server. The tool itself runs quickly, so this offline window should be brief, and as such, only the server to update needs to be offline. However, if the server is offline for too long (see the follower_unavailable_considered_failed_sec flag), the tablet replicas on it may be evicted from their Raft groups. To avoid this, it may be desirable to bring the entire cluster offline while performing the update.
Run the tool with the desired directory configuration flags. For example, if a cluster was set up with --fs_wal_dir=/wals, --fs_metadata_dir=/meta, and --fs_data_dirs=/data/1,/data/2,/data/3, and /data/3 is to be removed (e.g. due to a disk error), run the command:
$ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_metadata_dir=/meta --fs_data_dirs=/data/1,/data/2
Modify the values of the fs_data_dirs flags for the updated server. If using CM, make sure to only update the configurations of the updated server, rather than of the entire Kudu service.
Once complete, the server process can be started. When Kudu is installed using system packages, service is typically used:
$ sudo service kudu-tserver start
Kudu nodes can only survive failures of disks on which certain Kudu directories are mounted. For more information about the different Kudu directory types, see the section on Kudu Directory Configurations. Below describes this behavior across different Apache Kudu releases.
Node Type     | Kudu Directory Type                    | Kudu Releases that Crash on Disk Failure
--------------+----------------------------------------+------------------------------------------
Master        | All                                    | All
Tablet Server | Directory containing WALs              | All
Tablet Server | Directory containing tablet metadata   | All
Tablet Server | Directory containing data blocks only  | Pre-1.6.0
When a disk failure occurs that does not lead to a crash, Kudu will stop using the affected directory, shut down tablets with blocks on the affected directories, and automatically re-replicate the affected tablets to other tablet servers. The affected server will remain alive and print messages to the log indicating the disk failure, for example:
E1205 19:06:24.163748 27115 data_dirs.cc:1011] Directory /data/8/kudu/data marked as failed
E1205 19:06:30.324795 27064 log_block_manager.cc:1822] Not using report from /data/8/kudu/data: IO error: Could not open container 0a6283cab82d4e75848f49772d2638fe: /data/8/kudu/data/0a6283cab82d4e75848f49772d2638fe.metadata: Read-only file system (error 30)
E1205 19:06:33.564638 27220 ts_tablet_manager.cc:946] T 4957808439314e0d97795c1394348d80 P 70f7ee61ead54b1885d819f354eb3405: aborting tablet bootstrap: tablet has data in a failed directory
While in this state, the affected node will avoid using the failed disk, leading to lower storage volume and reduced read parallelism. The administrator should schedule a brief window to update the node’s directory configuration to exclude the failed disk.
When the disk is repaired, remounted, and ready to be reused by Kudu, take the following steps:
Make sure that the Kudu portion of the disk is completely empty.
Stop the tablet server.
Run the update_dirs tool. For example, to add /data/3, run the following:
$ sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=/wals --fs_data_dirs=/data/1,/data/2,/data/3
Start the tablet server.
Run ksck to verify cluster health. For example:
$ sudo -u kudu kudu cluster ksck master-01.example.com
Note that existing tablets will not stripe to the restored disk, but any new tablets will stripe to the restored disk.
By default, Kudu reserves a small amount of space (1% by capacity) in its directories; Kudu considers a disk full if there is less free space available than the reservation. Kudu nodes can only tolerate running out of space on disks on which certain Kudu directories are mounted. For more information about the different Kudu directory types, see Kudu Directory Configurations. The table below describes this behavior for each type of directory. The behavior is uniform across masters and tablet servers.
Kudu Directory Type                    | Crash on a Full Disk?
---------------------------------------+----------------------
Directory containing WALs              | Yes
Directory containing tablet metadata   | Yes
Directory containing data blocks only  | No (see below)
Prior to Kudu 1.7.0, Kudu stripes tablet data across all directories, and will avoid writing data to full directories. Kudu will crash if all data directories are full.
In 1.7.0 and later, new tablets are assigned a disk group consisting of --fs_target_data_dirs_per_tablet data dirs (default 3). If Kudu is not configured with enough data directories for a full disk group, all data directories are used. When a data directory is full, Kudu will stop writing new data to it and each tablet that uses that data directory will write new data to other data directories within its group. If all data directories for a tablet are full, Kudu will crash. Periodically, Kudu will check if full data directories are still full, and will resume writing to those data directories if space has become available.
If Kudu does crash because its data directories are full, freeing space on the full directories will allow the affected daemon to restart and resume writing. Note that it may be possible for Kudu to free some space by running
$ sudo -u kudu kudu fs check --repair
but this command may also fail if there is too little space left.
It’s also possible to allocate additional data directories to Kudu in order to increase the overall amount of storage available. See the documentation on updating a node’s directory configuration for more information. Note that existing tablets will not use new data directories, so adding a new data directory does not resolve issues with full disks.
If a tablet has permanently lost a majority of its replicas, it cannot recover automatically and operator intervention is required. If the tablet servers hosting a majority of the replicas are down (i.e. ones reported as "TS unavailable" by ksck), they should be recovered instead if possible.
The steps below may cause recent edits to the tablet to be lost, potentially resulting in permanent data loss. Only attempt the procedure below if it is impossible to bring a majority back online.
Suppose a tablet has lost a majority of its replicas. The first step in diagnosing and fixing the problem is to examine the tablet’s state using ksck:
$ sudo -u kudu kudu cluster ksck --tablets=e822cab6c0584bc0858219d1539a17e6 master-00,master-01,master-02
Connected to the Master
Fetched info from all 5 Tablet Servers
Tablet e822cab6c0584bc0858219d1539a17e6 of table 'my_table' is unavailable: 2 replica(s) not RUNNING
638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING
9a56fa85a38a4edc99c6229cba68aeaa (tserver-01:7150): bad state
State: FAILED
Data state: TABLET_DATA_READY
Last status: <failure message>
c311fef7708a4cf9bb11a3e4cbcaab8c (tserver-02:7150): bad state
State: FAILED
Data state: TABLET_DATA_READY
Last status: <failure message>
This output shows that, for tablet e822cab6c0584bc0858219d1539a17e6, the two tablet replicas on tserver-01 and tserver-02 failed. The remaining replica is not the leader, so the leader replica failed as well. This means the chance of data loss is higher since the remaining replica on tserver-00 may have been lagging. In general, to accept the potential data loss and restore the tablet from the remaining replicas, divide the tablet replicas into two groups:
Healthy replicas: Those in RUNNING state as reported by ksck
Unhealthy replicas
For example, in the above ksck output, the replica on tablet server tserver-00 is healthy, while the replicas on tserver-01 and tserver-02 are unhealthy. On each tablet server with a healthy replica, alter the consensus configuration to remove unhealthy replicas. In the typical case of 1 out of 3 surviving replicas, there will be only one healthy replica, so the consensus configuration will be rewritten to include only the healthy replica.
$ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid>
where <tablet-id> is e822cab6c0584bc0858219d1539a17e6 and <tserver-00-uuid> is the UUID of tserver-00, 638a20403e3e4ae3b55d4d07d920e6de.
Once the healthy replicas' consensus configurations have been forced to exclude the unhealthy replicas, the healthy replicas will be able to elect a leader. The tablet will become available for writes, though it will still be under-replicated. Shortly after the tablet becomes available, the leader master will notice that it is under-replicated, and will cause the tablet to re-replicate until the proper replication factor is restored. The unhealthy replicas will be tombstoned by the master, causing their remaining data to be deleted.
In the event that critical files are lost, i.e. WALs or tablet-specific metadata, all Kudu directories on the server must be deleted and rebuilt to ensure correctness. Doing so will destroy the copy of the data for each tablet replica hosted on the local server. Kudu will automatically re-replicate tablet replicas removed in this way, provided the replication factor is at least three and all other servers are online and healthy.
These steps use a tablet server as an example, but the steps are the same for Kudu master servers.
If multiple nodes need their FS layouts rebuilt, wait until all replicas previously hosted on each node have finished automatically re-replicating elsewhere before continuing. Failure to do so can result in permanent data loss.
Before proceeding, ensure the contents of the directories are backed up, either as a copy or in the form of other tablet replicas.
The first step to rebuilding a server with a new directory configuration is emptying all of the server’s existing directories. For example, if a tablet server is configured with --fs_wal_dir=/data/0/kudu-tserver-wal, --fs_metadata_dir=/data/0/kudu-tserver-meta, and --fs_data_dirs=/data/1/kudu-tserver,/data/2/kudu-tserver, the following commands will remove the WAL directory’s and data directories' contents:
# Note: this will delete all of the data from the local tablet server.
$ rm -rf /data/0/kudu-tserver-wal/* /data/0/kudu-tserver-meta/* /data/1/kudu-tserver/* /data/2/kudu-tserver/*
If using CM, update the configurations for the rebuilt server to include only the desired directories. Make sure to only update the configurations of servers to which changes were applied, rather than of the entire Kudu service.
After directories are deleted, the server process can be started with the new directory configuration. The appropriate sub-directories will be created by Kudu upon starting up.
If a single tablet server is brought down temporarily in a healthy cluster, all tablets will remain available and clients will function as normal, after potential short delays due to leader elections. However, if the downtime lasts for more than --follower_unavailable_considered_failed_sec (default 300) seconds, the tablet replicas on the down tablet server will be replaced by new replicas on available tablet servers. This will cause stress on the cluster as tablets re-replicate and, if the downtime lasts long enough, a significant reduction in the number of replicas on the down tablet server, which may require the rebalancer to fix.
To work around this, increase --follower_unavailable_considered_failed_sec on all tablet servers so the amount of time before re-replication will start is longer than the expected downtime of the tablet server, including the time it takes the tablet server to restart and bootstrap its tablet replicas. To do this, run the following command for each tablet server:
$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <num_seconds>
where <num_seconds> is the number of seconds that will encompass the downtime. Once the downtime is finished, reset the flag to its original value:
$ sudo -u kudu kudu tserver set_flag <tserver_address> follower_unavailable_considered_failed_sec <original_value>
Be sure to reset the value of --follower_unavailable_considered_failed_sec
to its original value.
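For instance, a planned maintenance window can be scripted across a small fleet; the hostnames, the default RPC port 7050, and the 7200-second value below are illustrative assumptions:
# Raise the re-replication timeout on each tablet server ahead of planned
# maintenance. Hostnames, port, and the 7200-second value are examples.
$ for ts in tserver-01.example.com:7050 tserver-02.example.com:7050 tserver-03.example.com:7050; do
    sudo -u kudu kudu tserver set_flag $ts follower_unavailable_considered_failed_sec 7200
  done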
On Kudu versions prior to 1.8, the --force flag must be provided in the above commands.
The kudu CLI contains a rebalancing tool that can be used to rebalance tablet replicas among tablet servers. For each table, the tool attempts to balance the number of replicas per tablet server. It will also, without unbalancing any table, attempt to even out the number of replicas per tablet server across the cluster as a whole. The rebalancing tool should be run as the Kudu admin user, specifying all master addresses:
$ sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
When run, the rebalancer will report on the initial tablet replica distribution in the cluster, log the replicas it moves, and print a final summary of the distribution when it terminates:
Per-server replica distribution summary:
       Statistic       |  Value
-----------------------+-----------
 Minimum Replica Count | 0
 Maximum Replica Count | 24
 Average Replica Count | 14.400000

Per-table replica distribution summary:
 Replica Skew |  Value
--------------+----------
 Minimum      | 8
 Maximum      | 8
 Average      | 8.000000

I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled
I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled
I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled
... (full output elided)
I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK

rebalancing is complete: cluster is balanced (moved 28 replicas)

Per-server replica distribution summary:
       Statistic       |  Value
-----------------------+-----------
 Minimum Replica Count | 14
 Maximum Replica Count | 15
 Average Replica Count | 14.400000

Per-table replica distribution summary:
 Replica Skew |  Value
--------------+----------
 Minimum      | 1
 Maximum      | 1
 Average      | 1.000000
If more details are needed in addition to the replica distribution summary, use the --output_replica_distribution_details flag, which makes the tool print per-table and per-tablet server replica distribution statistics as well.
Use the --report_only flag to get a report on table- and cluster-wide replica distribution statistics without starting any rebalancing activity.
The rebalancer can also be restricted to run on a subset of the tables by supplying the --tables flag. Note that, when running on a subset of tables, the tool will not attempt to balance the cluster as a whole.
The length of time the rebalancer runs for can be controlled with the --max_run_time_sec flag. By default, the rebalancer will run until the cluster is balanced. To control the amount of resources devoted to rebalancing, modify the --max_moves_per_server flag. See kudu cluster rebalance --help for more.
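For example, several of these flags can be combined in one invocation; the table names, time limit, and move limit below are illustrative assumptions:
# Rebalance only two tables, run for at most one hour, and cap concurrent
# moves per server. Table names and limit values are examples.
$ sudo -u kudu kudu cluster rebalance \
    --tables=metrics,logs \
    --max_run_time_sec=3600 \
    --max_moves_per_server=4 \
    master-01.example.com,master-02.example.com,master-03.example.com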
It’s safe to stop the rebalancer tool at any time. When restarted, the rebalancer will continue rebalancing the cluster.
The rebalancer requires all registered tablet servers to be up and running in order to proceed. This requirement avoids possible conflicts and races with automatic re-replication, and keeps replica placement optimal for the current configuration of the cluster. If a tablet server becomes unavailable during the rebalancing session, the rebalancer will exit. As noted above, it’s safe to restart the rebalancer after resolving the issue with the unavailable tablet servers.
The rebalancing tool can rebalance Kudu clusters running older versions as well, with some restrictions. Consult the following table for more information. In the table, "RF" stands for "replication factor".
Version Range      | Rebalances RF = 1 Tables? | Rebalances RF > 1 Tables?
-------------------+---------------------------+--------------------------
v < 1.4.0          | No                        | No
1.4.0 <= v < 1.7.1 | No                        | Yes
v >= 1.7.1         | Yes                       | Yes
If the rebalancer is running against a cluster where rebalancing replication factor one tables is not supported, it will rebalance all the other tables and the cluster as if those singly-replicated tables did not exist.
As detailed in the rack awareness section, it’s possible to use the kudu cluster rebalance tool to establish the placement policy on a cluster. This might be necessary when the rack awareness feature is first configured or when re-replication has violated the placement policy. The rebalancing tool breaks its work into three phases:
1. The rack-aware rebalancer tries to establish the placement policy. Use the --disable_policy_fixer flag to skip this phase.
2. The rebalancer tries to balance load by location, moving tablet replicas between locations in an attempt to spread tablet replicas among locations evenly. The load of a location is measured as the total number of replicas in the location divided by the number of tablet servers in the location. Use the --disable_cross_location_rebalancing flag to skip this phase.
3. The rebalancer tries to balance the tablet replica distribution within each location, as if the location were a cluster on its own. Use the --disable_intra_location_rebalancing flag to skip this phase.
By using the --report_only flag, it’s also possible to check if all tablets in the cluster conform to the placement policy without attempting any replica movement.
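As a sketch, reusing the example master addresses from above, such a conformance check might look like:
# Report on placement-policy conformance and replica distribution without
# moving any replicas.
$ sudo -u kudu kudu cluster rebalance --report_only \
    master-01.example.com,master-02.example.com,master-03.example.com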
Kudu does not currently have an automated way to remove a tablet server from a cluster permanently. Instead, use the following steps:
1. Ensure the cluster is in good health using ksck. See Checking Cluster Health with ksck.
2. If the tablet server contains any replicas of tables with replication factor 1, these replicas must be manually moved off the tablet server prior to shutting it down. The kudu tablet change_config move_replica tool can be used for this, as sketched after these steps.
3. Shut down the tablet server. After --follower_unavailable_considered_failed_sec, which defaults to 5 minutes, Kudu will begin to re-replicate the tablet server’s replicas to other servers. Wait until the process is finished. Progress can be monitored using ksck.
4. Once all the copies are complete, ksck will continue to report the tablet server as unavailable. The cluster will otherwise operate fine without the tablet server. To completely remove it from the cluster so ksck shows the cluster as completely healthy, restart the masters. In the case of a single master, this will cause cluster downtime. With multimaster, restart the masters in sequence to avoid cluster downtime.
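The following is a sketch of the replica move from step 2; the tablet ID and tablet server UUIDs are placeholders that can be obtained from ksck output or the web UI:
# Move one replica of a replication-factor-1 table off the tablet server
# being decommissioned. The IDs and UUIDs below are placeholders.
$ sudo -u kudu kudu tablet change_config move_replica \
    master-01.example.com,master-02.example.com,master-03.example.com \
    <tablet_id> <from_tserver_uuid> <to_tserver_uuid>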
Do not shut down multiple tablet servers at once. To remove multiple tablet servers from the cluster, follow the above instructions for each tablet server, ensuring that the previous tablet server is removed from the cluster and ksck reports the cluster healthy before shutting down the next; one way to watch for this is sketched below.
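For example, re-replication progress could be polled with ksck, assuming the example master addresses used earlier and that watch(1) is available:
# Re-run ksck every 60 seconds; stop once it reports the cluster healthy.
$ watch -n 60 'sudo -u kudu kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com'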
Using cluster names in the kudu command line tool
When using the kudu command line tool, it can be difficult to remember the precise list of Kudu master RPC addresses needed to communicate with a cluster, especially when managing multiple clusters. As an alternative, the command line tool can identify clusters by name. To use this functionality:
1. Create a new directory to store the Kudu configuration file.
2. Export the path to this directory in the KUDU_CONFIG environment variable.
3. Create a file called kudurc in the new directory.
4. Populate kudurc as follows, substituting your own cluster names and RPC addresses:
clusters_info:
  cluster_name1:
    master_addresses: ip1:port1,ip2:port2,ip3:port3
  cluster_name2:
    master_addresses: ip4:port4
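Put together, the steps above might look like the following shell session; the config directory path, cluster name, and master addresses are assumptions for illustration:
# Create a config directory, point KUDU_CONFIG at it, and write a kudurc.
# The ~/.kudu path, cluster name, and master addresses are examples.
$ mkdir -p ~/.kudu
$ export KUDU_CONFIG=~/.kudu
$ cat > ~/.kudu/kudurc <<'EOF'
clusters_info:
  cluster_name1:
    master_addresses: master-01.example.com:7051,master-02.example.com:7051,master-03.example.com:7051
EOF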
When using the kudu command line tool, replace the list of Kudu master RPC addresses with the cluster name, prepended with the character @:
$ sudo -u kudu kudu ksck @cluster_name1
Cluster names may be used as input in any invocation of the kudu command line tool that expects a list of Kudu master RPC addresses.