Note: This is a cross-post from the Cloudera Engineering Blog Fine-Grained Authorization with Apache Kudu and Impala
Apache Impala supports fine-grained authorization via Apache Sentry on all of the tables it manages including Apache Kudu tables. Given Impala is a very common way to access the data stored in Kudu, this capability allows users deploying Impala and Kudu to fully secure the Kudu data in multi-tenant clusters even though Kudu does not yet have native fine-grained authorization of its own. This solution works because Kudu natively supports coarse-grained (all or nothing) authorization which enables blocking all access to Kudu directly except for the impala user and an optional whitelist of other trusted users. This post will describe how to use Apache Impala’s fine-grained authorization support along with Apache Kudu’s coarse-grained authorization to achieve a secure multi-tenant deployment.
The examples in this post enable a workflow that uses Apache Spark to ingest data directly into
Kudu and Impala to run analytic queries on that data. The Spark job, run as the
is permitted to access the Kudu data via coarse-grained authorization. Even though this gives
access to all the data in Kudu, the
etl_service user is only used for scheduled jobs or by an
administrator. All queries on the data, from a wide array of users, will use Impala and leverage
Impala’s fine-grained authorization. Impala’s
allow you to flexibly control the privileges on the Kudu storage tables. Impala’s fine-grained
privileges along with support for
statements, allow you to finely control who can read and write data to your Kudu tables while
using Impala. Below is a diagram showing the workflow described:
Note: The examples below assume that Authorization has already been configured for Kudu, Impala, and Spark. For help configuring authorization see the Cloudera authorization documentation.
Configuring Kudu’s Coarse-Grained Authorization
Kudu supports coarse-grained authorization of client requests based on the authenticated client Kerberos principal. The two levels of access which can be configured are:
- Superuser – principals authorized as a superuser are able to perform certain administrative functionality such as using the kudu command line tool to diagnose or repair cluster issues.
- User – principals authorized as a user are able to access and modify all data in the Kudu cluster. This includes the ability to create, drop, and alter tables as well as read, insert, update, and delete data.
Access levels are granted using whitelist-style Access Control Lists (ACLs), one for each of the
two levels. Each access control list either specifies a comma-separated list of users, or may be
* to indicate that all authenticated users are able to gain access at the specified level.
Note: The default value for the User ACL is
*, which allows all users access to the cluster.
The first and most important step is to remove the default ACL of
* from Kudu’s
This will ensure only the users you list will have access to the Kudu cluster. Then, to allow the
Impala service to access all of the data in Kudu, the Impala service user, usually impala, should
be added to the Kudu
–user_acl configuration. Any user that is not using Impala will also need
to be added to this list. For example, an Apache Spark job might be used to load data directly
into Kudu. Generally, a single user is used to run scheduled jobs of applications that do not
support fine-grained authorization on their own. For this example, that user is
–user_acl configuration is:
For more details see the Kudu authorization documentation.
Using Impala’s Fine-Grained Authorization
to configure fine-grained authorization. Once configured, you can use Impala’s
to control the privileges of Kudu tables. These fine-grained privileges can be set at the database,
table and column level. Additionally you can individually control
Note: A user needs the
ALL privilege in order to run
statements against a Kudu table.
Below is a brief example with a couple tables stored in Kudu:
CREATE TABLE messages ( name STRING, time TIMESTAMP, message STRING, PRIMARY KEY(name, time) ) PARTITION BY HASH(name) PARTITIONS 4 STORED AS KUDU; GRANT ALL ON TABLE messages TO userA; CREATE TABLE metrics ( host STRING NOT NULL, metric STRING NOT NULL, time INT64 NOT NULL, value DOUBLE NOT NULL, PRIMARY KEY (host, metric, time) ) PARTITION BY HASH(name) PARTITIONS 4 STORED AS KUDU; GRANT ALL ON TABLE messages TO userB;
This brief example that combines Kudu’s coarse-grained authorization and Impala’s fine-grained authorization should enable you to meet the security needs of your data workflow today. The pattern described here can be applied to other services and workflows using Kudu as well. For greater authorization flexibility, you can look forward to the near future when Kudu supports native fine-grained authorization on its own. The Apache Kudu contributors understand the importance of native fine-grained authorization and they are working on integrations with Apache Sentry and Apache Ranger.