Lake Formation
Lake Formation (LF) is a service designed to facilitate data governance, data sharing, and the management of permissions for secure data lakes.
The Data Landing Zone (DLZ) offers the option to manage Lake Formation settings, tags, tag permissions, and data lake permissions. Using Lake Formation within your DLZ is optional, and managing it typically falls outside the scope of a landing zone’s responsibilities; it is usually handled by your Data Engineering or other workload teams. That said, having a centralized location for managing Lake Formation settings, tags, and permissions can be beneficial, particularly in multi-account environments.
The Lake Formation configuration lets you specify the `region` it applies to, the `admins` who are granted Lake Formation Admin permissions, the `tags` array to define tags and their corresponding permissions, and the `permissions` array to set data lake permissions.
We recommend using Tag-Based Access Control (TBAC) for managing permissions within Lake Formation. By default, the `hybridMode` property is set to `false`, which disables IAM permissions and relies solely on Lake Formation permissions.
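As a rough illustration of how these properties fit together, the sketch below models a configuration with locally defined types. The interfaces, the field names inside `tags` and `permissions`, the account ID, and the role names are all hypothetical stand-ins, not the DLZ's actual types; only `region`, `admins`, `hybridMode`, `tags`, and `permissions` are taken from the text above.

```typescript
// Hypothetical stand-in types for illustration; the real DLZ types may differ.
interface LfTagSketch {
  tagKey: string; // assumed field name
  tagValues: string[]; // assumed field name
}

interface LfPermissionSketch {
  principals: string[]; // e.g. IAM role ARNs
  tags: LfTagSketch[]; // LF-Tags the Data Catalog resources must carry
  actions: string[]; // e.g. ["SELECT"]
}

interface LakeFormationConfigSketch {
  region: string;
  admins: string[];
  hybridMode?: boolean; // documented default: false
  tags: LfTagSketch[];
  permissions: LfPermissionSketch[];
}

// Placeholder account ID and role names, not real resources.
const lakeFormationConfig: LakeFormationConfigSketch = {
  region: "eu-west-1",
  admins: ["arn:aws:iam::123456789012:role/DataLakeAdmin"],
  tags: [{ tagKey: "domain", tagValues: ["sales", "finance"] }],
  permissions: [
    {
      principals: ["arn:aws:iam::123456789012:role/Analyst"],
      tags: [{ tagKey: "domain", tagValues: ["sales"] }],
      actions: ["SELECT"],
    },
  ],
};

// Resolve the effective mode: an omitted hybridMode falls back to false.
function isHybridMode(config: LakeFormationConfigSketch): boolean {
  return config.hybridMode ?? false;
}
```

Because `hybridMode` is omitted in this example, `isHybridMode(lakeFormationConfig)` resolves to `false`, matching the default described above.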
LF-Tags and Permissions
The `tags` array defines the tags created in Lake Formation. Each tag consists of a key and a list of possible values.
To manage permissions on these tags, you use the `share` property, which allows you to share tags either within the account or with external accounts. If you want to grant permissions on only a subset of tag values, use the `specificValues` property. If this property is not provided, permissions will be granted to all values of the tag by default.
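The fallback from `specificValues` to all tag values can be sketched as follows. The `TagShareSketch` type and the `resolveGrantedValues` helper are hypothetical names introduced for illustration, and rejecting values that are not defined on the tag is an assumption, not documented DLZ behavior.

```typescript
// Hypothetical shape of one share entry; not the DLZ's actual type.
interface TagShareSketch {
  principals: string[];
  specificValues?: string[];
}

// Return the tag values a share applies to: the requested subset if given,
// otherwise every value defined on the tag.
function resolveGrantedValues(tagValues: string[], share: TagShareSketch): string[] {
  if (share.specificValues === undefined) {
    return tagValues.slice();
  }
  // Assumption: a subset value that the tag does not define is an error.
  const unknown = share.specificValues.filter((v) => tagValues.indexOf(v) === -1);
  if (unknown.length > 0) {
    throw new Error(`specificValues not defined on tag: ${unknown.join(", ")}`);
  }
  return share.specificValues.slice();
}
```

For a tag with values `dev` and `prod`, a share without `specificValues` resolves to both values, while `specificValues: ["prod"]` narrows the share to `prod` only.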
Sharing Tags Within an Account
The `share.withinAccount` property allows you to share tags within the same account. You list the principals that are granted permissions on the tag in the `principals` property. These principals can be IAM roles, IAM users, SAML users, or SAML groups.
By default, the CDK role is assigned as a Lake Formation admin. Normally, Lake Formation admins cannot manage tags created by other admins. To address this, we explicitly grant all admins access to the tags created by the CDK DLZ construct. This is equivalent to adding every admin under the `share.withinAccount.principals` property with full access.
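Conceptually, the implicit admin grant described above behaves like merging the admin list into the explicitly listed principals. The helper below is a hypothetical illustration of that equivalence, not the construct's actual implementation.

```typescript
// Merge admins into the explicitly listed principals, dropping duplicates
// while preserving first-seen order. Hypothetical helper for illustration.
function effectiveTagPrincipals(admins: string[], explicitPrincipals: string[]): string[] {
  return admins
    .concat(explicitPrincipals)
    .filter((principal, index, all) => all.indexOf(principal) === index);
}
```

An admin that is also listed explicitly appears only once in the result, so listing it under `principals` is harmless but redundant in this model.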
Sharing Tags with External Accounts
When sharing tags with external accounts, the `share.withExternalAccount` property is used. The primary difference between sharing tags internally (`withinAccount`) and externally (`withExternalAccount`) is that we restrict certain actions externally. Specifically, the `TagAction.ALTER` and `TagAction.DROP` actions are not allowed for external sharing, which ensures that external sharing remains read-only. This restriction prevents CloudFormation errors later in the process.
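A check along these lines could catch the restricted actions before deployment. This is a minimal sketch: `TagActionSketch` is a locally defined stand-in for the `TagAction` values mentioned above, and `findDisallowedExternalActions` is a hypothetical helper, not part of the DLZ API.

```typescript
// Local stand-in for the LF-Tag actions; the real TagAction enum may differ.
type TagActionSketch = "DESCRIBE" | "ASSOCIATE" | "ALTER" | "DROP";

// Actions that are not allowed when sharing tags with external accounts.
const EXTERNALLY_DISALLOWED: TagActionSketch[] = ["ALTER", "DROP"];

// Return the requested actions that may not be shared externally.
function findDisallowedExternalActions(actions: TagActionSketch[]): TagActionSketch[] {
  return actions.filter((action) => EXTERNALLY_DISALLOWED.indexOf(action) !== -1);
}
```

An empty result means the requested actions are acceptable for `withExternalAccount`; a non-empty result lists the actions that would need to be removed.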
When sharing tags externally, the `principals` property can include AWS Account IDs, Organizational Unit (OU) IDs,
Organization IDs, or specific IAM roles. If you provide Account, OU, or Organization IDs, all Lake Formation admins
in the external account will receive the specified permissions. These admins can then grant permissions to other roles,
if applicable.
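The distinction between the principal kinds can be sketched with a small classifier. The patterns below are simplified assumptions about the identifier formats (12-digit account IDs, `ou-` and `o-` prefixes, IAM role ARNs), and the function names are hypothetical; the DLZ may validate principals differently.

```typescript
type ExternalPrincipalKind = "account" | "ou" | "organization" | "iamRole" | "unknown";

// Classify an external principal using simplified, assumed patterns.
function classifyExternalPrincipal(principal: string): ExternalPrincipalKind {
  if (/^\d{12}$/.test(principal)) return "account";
  if (/^ou-[a-z0-9]+-[a-z0-9]+$/.test(principal)) return "ou";
  if (/^o-[a-z0-9]+$/.test(principal)) return "organization";
  if (/^arn:aws:iam::\d{12}:role\/.+$/.test(principal)) return "iamRole";
  return "unknown";
}

// Account, OU, and Organization IDs grant to the Lake Formation admins of the
// external account(s); a specific IAM role receives the permissions directly.
function grantsToExternalAdmins(principal: string): boolean {
  const kind = classifyExternalPrincipal(principal);
  return kind === "account" || kind === "ou" || kind === "organization";
}
```

In this model, a principal such as `123456789012` resolves to the admins of that account, whereas a role ARN targets only that role.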
Data Lake Permissions
Once you have defined the LF-Tags and their corresponding permissions, the next step is to associate these LF-Tags with Data Catalog resources such as Databases and Tables. The DLZ does not handle the task of adding LF-Tags to Data Catalog resources; this responsibility lies with your Data Engineering or other workload teams. The DLZ does, however, provide the option to manage “Data permissions” by granting specific permissions (such as `SELECT`) to principals (such as IAM roles) on Data Catalog resources that are tagged with certain LF-Tags. We refer to these permissions as `Grants`.
The following code snippet demonstrates how to specify permissions for principals on Data Catalog resources that match specific LF-Tags.
The code snippet above generates the following Grants, which are expressed as SQL statements for conceptual purposes. These statements are pseudocode and are used to illustrate the permissions granted. In practice, these will be translated into CloudFormation/CDK constructs.
The first permission is expressed as two Grants:
The second permission is expressed as four Grants:
Defining permissions requires a thorough understanding of how Lake Formation’s Tag-Based Access Control (TBAC) functions.
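To make the fan-out from one permission to several Grants more concrete, the sketch below renders pseudo-SQL Grant statements. It assumes each principal receives one Grant per resource type (databases and tables) that has actions; the types, field names, and statement wording are hypothetical and serve only to illustrate the idea.

```typescript
// Hypothetical shapes for illustration; not the DLZ's actual types.
interface TagExpressionSketch {
  tagKey: string;
  tagValues: string[];
}

interface GrantPermissionSketch {
  principals: string[];
  tags: TagExpressionSketch[];
  databaseActions: string[];
  tableActions: string[];
}

// Render the LF-Tag expression, e.g. "domain IN ('sales')".
function renderTagExpression(tags: TagExpressionSketch[]): string {
  return tags
    .map((t) => `${t.tagKey} IN (${t.tagValues.map((v) => `'${v}'`).join(", ")})`)
    .join(" AND ");
}

// Expand one permission into pseudo-SQL Grants: one per principal per
// resource type that has at least one action.
function toPseudoGrants(permission: GrantPermissionSketch): string[] {
  const where = renderTagExpression(permission.tags);
  const grants: string[] = [];
  permission.principals.forEach((principal) => {
    if (permission.databaseActions.length > 0) {
      grants.push(
        `GRANT ${permission.databaseActions.join(", ")} ON DATABASES TO '${principal}' WHERE TAGS ${where}`
      );
    }
    if (permission.tableActions.length > 0) {
      grants.push(
        `GRANT ${permission.tableActions.join(", ")} ON TABLES TO '${principal}' WHERE TAGS ${where}`
      );
    }
  });
  return grants;
}
```

Under this assumption, a permission with one principal and both database and table actions yields two Grants, and the same permission with two principals yields four.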
For more information on Tag-Based Access Control (TBAC) permissions and our recommended strategy, please refer to the Lake Formation TBAC strategy.