Apache Kudu Security

Kudu includes security features which allow Kudu clusters to be hardened against access from unauthorized users. This guide describes the security features provided by Kudu. Configuring a Secure Kudu Cluster lists essential configuration options when deploying a secure Kudu cluster. Known Limitations contains a list of known deficiencies in Kudu’s security capabilities.

Authentication

Kudu can be configured to enforce secure authentication among servers, and between clients and servers. Authentication prevents untrusted actors from gaining access to Kudu, and securely identifies the connecting user or services for authorization checks. Authentication in Kudu is designed to interoperate with other secure Hadoop components by utilizing Kerberos.

Authentication can be configured on Kudu servers using the --rpc_authentication flag, which can be set to required, optional, or disabled. By default, the flag is set to optional. When set to required, Kudu rejects connections from clients and servers that lack authentication credentials. When set to optional, Kudu attempts to use strong authentication. When the flag is set to disabled, or when it is set to optional and strong authentication fails, Kudu by default allows unauthenticated connections only from trusted subnets: the private networks (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16) and the local subnets of all local network interfaces. Unauthenticated connections from publicly routable IPs are rejected.

The trusted subnets can be configured using the --trusted_subnets flag, which takes a comma-separated list of IP blocks in CIDR notation. Setting it to '0.0.0.0/0' allows unauthenticated connections from all remote IP addresses. However, if network access is not otherwise restricted by a firewall, malicious users may then be able to gain unauthorized access. This risk can be mitigated by configuring authentication to be required.
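The connection rules described above can be sketched in a few lines of Python. This is only an illustration of the prose, not Kudu's implementation: the helper names (parse_trusted_subnets, is_trusted, accept_connection) are hypothetical, and the sketch leaves out the subnets of local network interfaces, which Kudu also treats as trusted.

```python
import ipaddress

# Default trusted subnets as listed in the text above. The local subnets of
# local network interfaces are NOT modelled in this sketch.
DEFAULT_TRUSTED_SUBNETS = (
    "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,169.254.0.0/16"
)


def parse_trusted_subnets(flag_value):
    """Parse a comma-separated list of CIDR blocks (the --trusted_subnets format)."""
    return [
        ipaddress.ip_network(block.strip())
        for block in flag_value.split(",")
        if block.strip()
    ]


def is_trusted(peer_ip, subnets):
    """Return True if peer_ip falls inside any of the given subnets."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in subnets)


def accept_connection(mode, authenticated, peer_ip, subnets):
    """Hypothetical decision helper that mirrors the prose, not Kudu's code."""
    if mode not in ("required", "optional", "disabled"):
        raise ValueError("unknown --rpc_authentication value: %r" % mode)
    if mode == "required":
        # Only peers with valid credentials are accepted.
        return authenticated
    if mode == "optional" and authenticated:
        return True
    # Disabled, or optional with failed authentication: trusted subnets only.
    return is_trusted(peer_ip, subnets)
```

For example, with the default list, accept_connection("optional", False, "192.168.1.5", subnets) is accepted while the same call with "8.8.8.8" is rejected. Passing "0.0.0.0/0" as the flag value makes every IPv4 peer trusted, which is why the text recommends pairing it with a firewall or with required authentication.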

When the --rpc_authentication flag is set to optional, the cluster does not prevent access from unauthenticated users. To secure a cluster, use --rpc_authentication=required.

Internal PKI

Kudu uses an internal PKI system to issue X.509 certificates to servers in the cluster. Connections between peers who have both obtained certificates will use TLS for authentication, which doesn’t require contacting the Kerberos KDC. These certificates are only used for internal communication among Kudu servers, and between Kudu clients and servers. The certificates are never presented in a public facing protocol.

By using internally-issued certificates, Kudu offers strong authentication which scales to huge clusters, and allows TLS encryption to be used without requiring you to manually deploy certificates on every node.

Authentication Tokens

After authenticating to a secure cluster, the Kudu client will automatically request an authentication token from the Kudu master. An authentication token encapsulates the identity of the authenticated user and carries the master’s RSA signature so that its authenticity can be verified.

This token will be used to authenticate subsequent connections. By default, authentication tokens are only valid for seven days, so that even if a token were compromised, it could not be used indefinitely. For the most part, authentication tokens should be completely transparent to users. By using authentication tokens, Kudu takes advantage of strong authentication without paying the scalability cost of communicating with a central authority for every connection.

When used with distributed compute frameworks such as Spark, authentication tokens can simplify configuration and improve security. For example, the Kudu Spark connector will automatically retrieve an authentication token during the planning stage, and distribute the token to tasks. This allows Spark to work against a secured Kudu cluster where only the planner node has Kerberos credentials.

Client Authentication to Secure Kudu Clusters

Users running client Kudu applications must first run the kinit command to obtain a Kerberos ticket-granting ticket. For example:

$ kinit admin@EXAMPLE-REALM.COM

Once authenticated, you use the same client code to read from and write to Kudu servers with and without Kerberos configuration.

Scalability

Kudu authentication is designed to scale to thousands of nodes, which requires avoiding unnecessary coordination with a central authentication authority (such as the Kerberos KDC). Instead, Kudu servers and clients will use Kerberos to establish initial trust with the Kudu master, and then use alternate credentials for subsequent connections. In particular, the master will issue internal X.509 certificates to servers, and temporary authentication tokens to clients.

Coarse-Grained Authorization

Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal (i.e. user or service). The two levels of access which can be configured are:

Superuser - principals authorized as a superuser are able to perform certain administrative functionality such as using the kudu command line tool to diagnose or repair cluster issues.
User - principals authorized as a user are able to access and modify all data in the Kudu cluster. This includes the ability to create, drop, and alter tables as well as read, insert, update, and delete data.

Internally, Kudu has a third access level for the daemons themselves. This ensures that users cannot connect to the cluster and pose as tablet servers.

Access levels are granted using whitelist-style Access Control Lists (ACLs), one for each of the two levels. Each access control list either specifies a comma-separated list of users, or may be set to * to indicate that all authenticated users are able to gain access at the specified level. See Configuring a Secure Kudu Cluster below for examples.

The default value for the User ACL is *, which allows all users access to the cluster. However, if authentication is enabled, this still restricts access to only those users who are able to successfully authenticate via Kerberos. Unauthenticated users on the same network as the Kudu servers will be unable to access the cluster.
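As a rough illustration of how such an ACL value is interpreted, the following Python sketch checks an already-authenticated principal against a comma-separated list or the * wildcard. The function name acl_allows is hypothetical and the code is not taken from Kudu; it only models the matching rule described above.

```python
def acl_allows(acl_flag, principal):
    """Return True if an authenticated principal is admitted by an ACL value.

    acl_flag is either "*" (all authenticated users) or a comma-separated
    list of user names, as accepted by --user_acl and --superuser_acl.
    """
    if acl_flag.strip() == "*":
        return True
    allowed = {name.strip() for name in acl_flag.split(",") if name.strip()}
    return principal in allowed
```

With the example values used later in this guide, acl_allows("impala,nightly_etl_service_account", "impala") admits the Impala service user, while "hadoopadmin" would be admitted only by the superuser ACL. The check applies after authentication, so * never admits an unauthenticated connection when authentication is enabled.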

Fine-Grained Authorization

As of Kudu 1.10.0, Kudu can be configured to enforce fine-grained authorization across servers. This ensures that users can see only the data they are explicitly authorized to see. Kudu currently supports this by leveraging policies defined in Apache Sentry 2.2 and later.

Fine-grained authorization policies are not enforced when accessing the web UI. User data may appear on various pages of the web UI (e.g. in logs, metrics, and scans). It is therefore recommended to either limit access to the web UI ports, or to redact or disable the web UI entirely. See the instructions for securing the web UI for more details.

Apache Sentry

Apache Sentry models tabular objects in the following hierarchy:

Server - indicated by the Kudu configuration flag --server_name. Everything stored in a Kudu cluster falls within the given "server".
Database - indicated as a prefix of table names with the format <database>.<table>.
Table - a single Kudu table.
Column - a column within a Kudu table.

Each level of this hierarchy defines a "scope" on which privileges can be granted. Privileges granted on a higher scope imply privileges on a lower scope. For example, if a user has SELECT privilege on a database, that user implicitly has SELECT privileges on every table belonging to that database.

Privileges are also associated with specific actions. Access to Kudu tables may rely on privileges on the following actions:

ALTER
CREATE
DELETE
DROP
INSERT
UPDATE
SELECT

Additionally, there are three special actions recognized by Kudu: ALL, OWNER, and METADATA. If a user has the ALL or OWNER privilege on a given table, that user has all of the above privileges on the table. METADATA is not an actual privilege; rather, it is a conceptual privilege that Kudu uses to model "any privilege". A privilege granted on any action on a table implies that the user has the METADATA privilege on that table.
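The scope and action rules above can be modelled in a short Python sketch. It is an illustration under simplifying assumptions, not Sentry's or Kudu's data model: the Grant class and helper functions are hypothetical, and column-level grants are left out to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """A privilege grant at server, database, or table scope (hypothetical model)."""
    action: str
    database: Optional[str] = None  # None means the grant is at server scope
    table: Optional[str] = None     # None means database (or server) scope


def scope_covers(grant, database, table):
    """A grant on a higher scope implies the same privilege on lower scopes."""
    if grant.database is None:
        return True  # server scope covers every database and table
    if grant.database != database:
        return False
    return grant.table is None or grant.table == table


def action_implies(granted, requested):
    """ALL and OWNER imply every action; any action implies METADATA."""
    if requested == "METADATA":
        return True
    if granted in ("ALL", "OWNER"):
        return True
    return granted == requested


def has_table_privilege(grants, requested, database, table):
    """Check whether any grant yields the requested action on database.table."""
    return any(
        scope_covers(g, database, table) and action_implies(g.action, requested)
        for g in grants
    )
```

For instance, a single Grant("SELECT", database="db") yields SELECT and METADATA on every table in db, but not INSERT, and nothing on tables in other databases.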

For more details about Sentry privileges, see the Apache Sentry documentation.

Depending on the value of the sentry.db.explicit.grants.permitted configuration in Sentry, certain privileges may not be grantable in Sentry. For example, in Sentry deployments that don’t support UPDATE privileges, to perform an operation that requires UPDATE privileges, a user must instead have ALL privileges.

When a Kudu master receives a request, it consults Sentry to determine what privileges a user has. If the user is not authorized to perform the requested action, the request is rejected. Kudu leverages the authenticated identity of a user to decide whether to perform or reject a request.

Authorization Tokens

Rather than having every tablet server communicate directly with Sentry, privileges are propagated and checked via authorization tokens. These tokens encapsulate what privileges a user has on a given table. Tokens are generated by the master and returned to Kudu clients upon opening a Kudu table. Kudu clients automatically attach authorization tokens when sending requests to tablet servers.

Authorization tokens are a means of limiting the number of nodes that access Sentry directly to retrieve privileges. Because the expected number of tablet servers in a cluster is much higher than the number of Kudu masters, the tokens are only used to authorize requests sent to tablet servers. Kudu masters fetch privileges directly from Sentry or from their privilege cache. See Caching for more details of Kudu’s privilege cache.

Similar to the validity interval for authentication tokens, to limit the window of potential unwanted access if a token becomes compromised, authorization tokens are valid for five minutes by default. The acquisition and renewal of a token is hidden from the user, as Kudu clients automatically retrieve new tokens when existing tokens expire.

When a tablet server that has been configured to enforce fine-grained access control receives a request, it checks the attached token and rejects the request if the token is invalid (e.g. expired) or if its privileges are not sufficient to perform the requested operation.
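The tablet server check can be pictured with the following Python sketch. It is a simplified model under stated assumptions: the AuthzToken class, issue_token, and authorize are hypothetical names, the token layout is invented for illustration, and verification of the master's signature is omitted entirely.

```python
import time
from dataclasses import dataclass

# Default validity stated in the text above: five minutes.
AUTHZ_TOKEN_VALIDITY_SECS = 5 * 60


@dataclass(frozen=True)
class AuthzToken:
    """Hypothetical stand-in for an authorization token (no signature modelled)."""
    user: str
    table: str
    privileges: frozenset
    expires_at: float


def issue_token(user, table, privileges, now=None):
    """What a master might do when a client opens a table (illustrative only)."""
    now = time.time() if now is None else now
    return AuthzToken(user, table, frozenset(privileges),
                      now + AUTHZ_TOKEN_VALIDITY_SECS)


def authorize(token, table, required, now=None):
    """Reject expired tokens, tokens for another table, or missing privileges."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False
    if token.table != table:
        return False
    return set(required) <= token.privileges
```

In this model, an Upsert on a table would be checked as authorize(token, "db.t", {"INSERT", "UPDATE"}), and the same call fails once five minutes have passed, at which point a real client would transparently fetch a new token from the master.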

Trusted Users

It may be desirable to allow certain users to view and modify any data stored in Kudu. Such users can be specified via the --trusted_user_acl master configuration. Trusted users can perform any operation that would otherwise require fine-grained privileges, without Kudu consulting Sentry.

Additionally, some services that interact with Kudu may authorize requests on behalf of their end users. For example, Apache Impala authorizes queries on behalf of its users, and sends requests to Kudu as the Impala service user, commonly "impala". Since Impala authorizes requests on its own, to avoid extraneous communication between Sentry and Kudu, the Impala service user should be listed as a trusted user.

When accessing Kudu through Impala, Impala enforces its own fine-grained authorization policy. This policy is similar to Kudu’s and can be found in Impala’s authorization documentation.

Configuring the Integration with Apache Sentry

Sentry is often configured with Kerberos authentication. See Configuring a Secure Kudu Cluster for how to configure Kudu to authenticate via Kerberos.

In order to enable integration with Sentry, a cluster must first be integrated with the Apache Hive Metastore. See the documentation for how to configure Kudu to synchronize its internal catalog with the Hive Metastore.

The following configurations must be set on the master:

--sentry_service_rpc_addresses=<Sentry RPC address>
--server_name=<value of HiveServer2's hive.sentry.server configuration>
--kudu_service_name=kudu
--sentry_service_kerberos_principal=sentry
--sentry_service_security_mode=kerberos

# This example ACL setup allows the 'impala' user to access all data stored in
# Kudu, assuming Impala will authorize requests on its own. The 'hadoopadmin'
# user is also granted access to all Kudu data, which may facilitate testing
# and debugging.
--trusted_user_acl=impala,hadoopadmin

The following configurations must be set on the tablet servers:

--tserver_enforce_access_control=true

Caching

To avoid overwhelming Sentry with requests to fetch user privileges, the Kudu master can be configured to cache user privileges. A by-product of this caching is that when privileges are changed in Sentry, they may not be reflected in Kudu for a configurable amount of time, defined by the following Kudu master configurations:

--sentry_privileges_cache_ttl_factor * --authz_token_validity_interval_secs

The default value is fifty minutes. If privilege updates need to be reflected in Kudu sooner than this, the Kudu CLI tool can be used to invalidate the cached privileges to force Kudu to fetch new ones from Sentry:

kudu master authz_cache reset <master-addresses>
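The fifty-minute figure follows from multiplying the two flags. The short Python calculation below works through it. The token validity of 300 seconds matches the five-minute default stated in this guide; the TTL factor of 10 is an assumption inferred from those two numbers (fifty minutes divided by five minutes), so check the flag reference for your Kudu version before relying on it.

```python
# Five-minute default for --authz_token_validity_interval_secs (from this guide).
AUTHZ_TOKEN_VALIDITY_INTERVAL_SECS = 300

# ASSUMPTION: inferred from "fifty minutes" / "five minutes"; not stated
# explicitly in this guide.
SENTRY_PRIVILEGES_CACHE_TTL_FACTOR = 10


def privileges_cache_ttl_secs(ttl_factor, token_validity_secs):
    """Upper bound on how long a cached privilege may lag behind Sentry."""
    return ttl_factor * token_validity_secs


ttl = privileges_cache_ttl_secs(SENTRY_PRIVILEGES_CACHE_TTL_FACTOR,
                                AUTHZ_TOKEN_VALIDITY_INTERVAL_SECS)
print("cache TTL: %d seconds (%d minutes)" % (ttl, ttl // 60))
```

Lowering either flag shortens the window in which a revoked privilege may still be honored, at the cost of more frequent requests to Sentry.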

Policy for Kudu Masters

The following authorization policy is enforced by Kudu masters.

Table 1. Authorization Policy for Masters
Operation	Required Privilege
`CreateTable`	`CREATE ON DATABASE`
`CreateTable` with a different owner specified than the requesting user	`ALL ON DATABASE` with the Sentry `GRANT OPTION` (see the Apache Sentry documentation)
`DeleteTable`	`DROP ON TABLE`
`AlterTable` (with no rename)	`ALTER ON TABLE`
`AlterTable` (with rename)	`ALL ON TABLE <old-table>` and `CREATE ON DATABASE <new-database>`
`IsCreateTableDone`	`METADATA ON TABLE`
`IsAlterTableDone`	`METADATA ON TABLE`
`ListTables`	`METADATA ON TABLE`
`GetTableLocations`	`METADATA ON TABLE`
`GetTableSchema`	`METADATA ON TABLE`
`GetTabletLocations`	`METADATA ON TABLE`

Policy for Kudu Tablet Servers

The following authorization policy is enforced by Kudu tablet servers.

Table 2. Authorization Policy for Tablet Servers
Operation	Required Privilege
`Scan`	`SELECT ON TABLE`, or `METADATA ON TABLE` and `SELECT ON COLUMN` for each projected column and each predicate column
`Scan` (no projected columns, equivalent to `COUNT(*)`)	`SELECT ON TABLE`, or `SELECT ON COLUMN` for each column in the table
`Scan` (with virtual columns)	`SELECT ON TABLE`, or `SELECT ON COLUMN` for each column in the table
`Scan` (in `ORDERED` mode)	`<privileges required for a Scan>` and `SELECT ON COLUMN` for each primary key column
`Insert`	`INSERT ON TABLE`
`Update`	`UPDATE ON TABLE`
`Upsert`	`INSERT ON TABLE` and `UPDATE ON TABLE`
`Delete`	`DELETE ON TABLE`
`SplitKeyRange`	`SELECT ON COLUMN` for each primary key column and `SELECT ON COLUMN` for each projected column
`Checksum`	User must be configured in `--superuser_acl`
`ListTablets`	User must be configured in `--superuser_acl`
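To make the first two Scan rows of Table 2 concrete, here is a hedged Python sketch of the decision. The function scan_authorized and its parameters are hypothetical, and the sketch does not cover scans with virtual columns or scans in ORDERED mode.

```python
def scan_authorized(table_privileges, selectable_columns, has_metadata,
                    projected, predicates, all_columns):
    """Model of the basic Scan rules in Table 2 (illustrative, not Kudu code).

    table_privileges:   actions the user holds at table scope, e.g. {"SELECT"}
    selectable_columns: columns on which the user holds SELECT ON COLUMN
    has_metadata:       whether the user holds METADATA ON TABLE
    projected:          columns returned by the scan
    predicates:         columns used in scan predicates
    all_columns:        every column in the table schema
    """
    if "SELECT" in table_privileges:
        return True
    selectable = set(selectable_columns)
    if not projected:
        # Equivalent to COUNT(*): SELECT is needed on every column in the table.
        return set(all_columns) <= selectable
    needed = set(projected) | set(predicates)
    return has_metadata and needed <= selectable
```

A user holding SELECT only on columns a and b can therefore scan a with a predicate on b, but cannot scan with a predicate on c, and cannot run an empty-projection scan on a table that also has a column c.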

Unlike Impala, Kudu only supports all-or-nothing access to a table’s schema, rather than showing only authorized columns.

Encryption

Kudu allows all communications among servers and between clients and servers to be encrypted with TLS.

Encryption can be configured on Kudu servers using the --rpc_encryption flag, which can be set to required, optional, or disabled. By default, the flag is set to optional. When set to required, Kudu rejects unencrypted connections. When set to optional, Kudu attempts to use encryption. As with authentication, when the flag is set to disabled, or when it is set to optional and encryption cannot be established, Kudu allows unencrypted connections only from trusted subnets and rejects unencrypted connections from publicly routable IPs. To secure a cluster, use --rpc_encryption=required.

Kudu will automatically turn off encryption on local loopback connections, since traffic from these connections is never exposed externally. This allows locality-aware compute frameworks like Spark and Impala to avoid encryption overhead, while still ensuring data confidentiality.

Web UI Encryption

The Kudu web UI can be configured to use secure HTTPS encryption by providing each server with TLS certificates. See Configuring a Secure Kudu Cluster for more information on web UI HTTPS configuration.

Web UI Redaction

To prevent sensitive data from being exposed in the web UI, all row data is redacted. Table metadata, such as table names, column names, and partitioning information, is not redacted. The web UI can be completely disabled by setting the --webserver_enabled=false flag on Kudu servers.

Disabling the web UI will also disable REST endpoints such as /metrics. Monitoring systems rely on these endpoints to gather metrics data.

Log Security

To prevent sensitive data from being included in Kudu server logs, all row data is redacted by default. Setting the --redact=log flag disables redaction in the web UI but retains it for server logs. Alternatively, --redact=none can be used to disable redaction completely.

Configuring a Secure Kudu Cluster

The following configuration parameters should be set on all servers (master and tablet server) in order to ensure that a Kudu cluster is secure:

# Connection Security
#--------------------
--rpc_authentication=required
--rpc_encryption=required
--keytab_file=<path-to-kerberos-keytab>

# Web UI Security
#--------------------
--webserver_certificate_file=<path-to-cert-pem>
--webserver_private_key_file=<path-to-key-pem>
# optional
--webserver_private_key_password_cmd=<password-cmd>

# If you prefer to disable the web UI entirely:
--webserver_enabled=false

# Coarse-grained authorization
#--------------------------------

# This example ACL setup allows the 'impala' user as well as the
# 'nightly_etl_service_account' principal access to all data in the
# Kudu cluster. The 'hadoopadmin' user is allowed to use administrative
# tooling. Note that, by granting access to 'impala', other users
# may access data in Kudu via the Impala service subject to its own
# authorization rules.
--user_acl=impala,nightly_etl_service_account
--superuser_acl=hadoopadmin

See Configuring the Integration with Apache Sentry for an example of how to enable fine-grained authorization via Apache Sentry.

Further information about these flags can be found in the configuration flag reference.

Known Limitations

Kudu has a few known security limitations:

Custom Kerberos Principal: Kudu does not support setting a custom service principal for Kudu processes. The principal must be 'kudu'.
External PKI: Kudu does not support externally-issued certificates for internal wire encryption (server to server and client to server).
On-disk Encryption: Kudu does not have built-in on-disk encryption. However, Kudu can be used with whole-disk encryption tools such as dm-crypt.