Wednesday, 25 June 2025

AWS Lake Formation Complete Guide

 



1. Introduction to AWS Lake Formation

  • Purpose: Securely build, manage, and govern data lakes on AWS.

  • Key features:

    • Centralized security and access control

    • Fine-grained permissions (database, table, column, row)

    • Data catalog integration with AWS Glue

    • Support for cross-account data sharing

    • Tag-based access control using LF-Tags


2. Core Concepts

Term                 Description
Data lake location   S3 bucket or prefix registered with Lake Formation
Data catalog         Metadata about databases and tables, stored in the AWS Glue Data Catalog
Permissions          Access rights granted to principals (users, roles, accounts)
LF-Tags              Key-value labels assigned to resources and used in access policies
Data filters         Column-level, row-level, or cell-level restrictions on a table
Resource links       Local catalog entries that point to shared databases or tables, including those from other accounts


3. Setup and Initial Steps

Step 1: Register S3 Data Lake Location

  • Go to Lake Formation → Data lake locations → Register location

  • Specify your S3 bucket/folder path and an IAM role with access

Step 2: Create Glue Database & Tables

  • Use a Glue crawler, or create the Glue database and tables manually

  • The data files should reside in the registered S3 location
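The two setup steps can also be scripted. Below is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured. The function names and the bucket, prefix, role, and database values passed to them are illustrative placeholders, not part of this guide. The builder functions only assemble request parameters, so they can be inspected before anything is sent to AWS.

```python
def register_location_request(bucket, prefix, role_arn):
    """Build keyword arguments for lakeformation.register_resource."""
    prefix = prefix.strip("/")
    # S3 ARNs for a prefix take the form arn:aws:s3:::bucket/prefix
    path = f"{bucket}/{prefix}" if prefix else bucket
    return {"ResourceArn": f"arn:aws:s3:::{path}", "RoleArn": role_arn}


def create_database_request(name, location_uri=None):
    """Build keyword arguments for glue.create_database."""
    database_input = {"Name": name}
    if location_uri:
        database_input["LocationUri"] = location_uri
    return {"DatabaseInput": database_input}


def run_setup(bucket, prefix, role_arn, database):
    """Send both requests. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3

    boto3.client("lakeformation").register_resource(
        **register_location_request(bucket, prefix, role_arn)
    )
    location = f"s3://{bucket}/{prefix.strip('/')}"
    boto3.client("glue").create_database(
        **create_database_request(database, location)
    )
```

Calling run_setup("my-data-lake", "raw", "<role ARN>", "company_db") would register the location and then create the database. Tables can still come from a crawler afterwards.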


4. Managing Permissions

Lake Formation controls access at multiple levels:

Level      Permissions
Database   CREATE_TABLE, DROP, DESCRIBE
Table      SELECT, ALTER, DROP, DESCRIBE
Column     Controlled by data filters or LF-Tags
Row        Controlled by row filters

5. Data Filters (Column and Row Level Security)

  • Data filters allow you to restrict access to specific columns or rows.

  • Create filters under Lake Formation → Data filters / Row filters.

  • Assign filters via Data permissions → Grant when assigning SELECT.

Example:

  • finance_columns_only: allows access to specific columns

  • region_us_only: allows access only to rows where region = 'US'
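As a rough illustration, the two example filters can be expressed as the TableData structure that the boto3 call create_data_cells_filter expects. This is a sketch under assumptions: the helper name data_cells_filter and the account ID 111122223333 are placeholders, and the company_db.employees table and its column list are borrowed from the example in section 10.

```python
def data_cells_filter(catalog_id, database, table, name,
                      row_expression=None, columns=None):
    """Build the TableData argument for create_data_cells_filter."""
    table_data = {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
    }
    # A row filter is either a predicate or an explicit "all rows" wildcard.
    if row_expression:
        table_data["RowFilter"] = {"FilterExpression": row_expression}
    else:
        table_data["RowFilter"] = {"AllRowsWildcard": {}}
    # Columns are either listed by name or left open with a wildcard.
    if columns:
        table_data["ColumnNames"] = list(columns)
    else:
        table_data["ColumnWildcard"] = {}
    return table_data


# Column filter: all rows, four named columns.
finance_columns_only = data_cells_filter(
    "111122223333", "company_db", "employees", "finance_columns_only",
    columns=["id", "name", "department", "region"],
)

# Row filter: all columns, only rows where region = 'US'.
region_us_only = data_cells_filter(
    "111122223333", "company_db", "employees", "region_us_only",
    row_expression="region = 'US'",
)
```

Either dictionary would then be passed as boto3.client("lakeformation").create_data_cells_filter(TableData=...).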


6. LF-Tags and Tag-Based Access Control

What are LF-Tags?

  • Key-value labels attached to databases, tables, or columns.

  • Example tags: Department=Finance, Environment=Prod

How to use:

  1. Create LF-Tags in Lake Formation → LF-Tags.

  2. Assign LF-Tags to tables or columns.

  3. Grant permissions to principals based on LF-Tags instead of individual resources.

Benefits:

  • Easier management at scale.

  • Dynamically control access by adding/removing LF-Tags.
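The three steps above map onto three boto3 Lake Formation calls: create_lf_tag, add_lf_tags_to_resource, and grant_permissions with an LFTagPolicy resource. The sketch below only builds the request parameters. The helper names are made up for illustration, and the principal ARN and table names are placeholders.

```python
def lf_tag_pair(key, values):
    """One LF-Tag key with its allowed values, as the API expects it."""
    return {"TagKey": key, "TagValues": list(values)}


def create_lf_tag_request(key, values):
    """Step 1: keyword arguments for create_lf_tag."""
    return lf_tag_pair(key, values)


def tag_table_request(database, table, tags):
    """Step 2: keyword arguments for add_lf_tags_to_resource.

    `tags` maps each tag key to the single value assigned to the table.
    """
    return {
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "LFTags": [lf_tag_pair(key, [value]) for key, value in tags.items()],
    }


def grant_by_tag_request(principal_arn, expression, permissions,
                         resource_type="TABLE"):
    """Step 3: keyword arguments for grant_permissions on an LF-Tag policy.

    `expression` maps each tag key to the list of values that should match.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,
                "Expression": [lf_tag_pair(k, v) for k, v in expression.items()],
            }
        },
        "Permissions": list(permissions),
    }


create_req = create_lf_tag_request("Department", ["Finance", "Engineering"])
tag_req = tag_table_request("company_db", "employees", {"Department": "Finance"})
grant_req = grant_by_tag_request(
    "arn:aws:iam::111122223333:role/finance_analyst",  # placeholder principal
    {"Department": ["Finance"]},
    ["SELECT", "DESCRIBE"],
)
```

With this grant in place, any table later tagged Department=Finance becomes readable by the role without a new grant.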


7. Granting Permissions

Types of Principals:

  • IAM users or roles

  • AWS Accounts (for cross-account sharing)

How to grant:

  • Go to Lake Formation → Data permissions → Grant

  • Select Principal

  • Choose resources: Catalog resources or LF-Tags

  • Assign permissions: SELECT, DESCRIBE, etc.

  • Assign data filters if applicable

  • Grant
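The same console flow can be sketched as a request for the boto3 grant_permissions call. The helper below is hypothetical: it picks a Table or TableWithColumns resource depending on whether a column list is given. The SELECT-only check on column grants reflects our understanding of the service's behavior and should be treated as an assumption.

```python
def grant_request(principal_arn, database, table, permissions,
                  columns=None, grantable=()):
    """Build keyword arguments for lakeformation.grant_permissions."""
    permissions = list(permissions)
    if columns:
        # Assumption: column-level grants are limited to SELECT.
        if set(permissions) != {"SELECT"}:
            raise ValueError("column-level grants support SELECT only")
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": permissions,
        "PermissionsWithGrantOption": list(grantable),
    }


# Placeholder principal; table and columns follow the example in section 10.
analyst = "arn:aws:iam::111122223333:role/analyst"
table_grant = grant_request(analyst, "company_db", "employees",
                            ["SELECT", "DESCRIBE"])
column_grant = grant_request(analyst, "company_db", "employees",
                             ["SELECT"], columns=["id", "name"])
```

Each result would be passed as boto3.client("lakeformation").grant_permissions(**request).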


8. Cross-Account Sharing

  • Use Lake Formation's resource sharing to share databases or tables with other AWS accounts.

  • Steps in Producer Account:

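    1. Grant permissions to the consumer account.

    2. Share tables via the “Share” button; Lake Formation uses AWS RAM to create the resource share.

    3. Update the S3 bucket policy if consumers also need direct S3 access; queries that go through Lake Formation use its vended credentials.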

  • Steps in Consumer Account:

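    1. Accept the AWS RAM resource share invitation (not needed when both accounts are in the same AWS Organization with sharing enabled).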

    2. Create resource links.

    3. Grant permissions on resource links.

    4. Query shared tables.
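A sketch of the two API requests at the center of this flow: the producer-side grant to the consumer account (boto3 lakeformation grant_permissions) and the consumer-side resource link (boto3 glue create_table with a TargetTable). The helper names, account IDs, and database names are placeholders chosen for illustration.

```python
def _check_account(account_id):
    """AWS account IDs are 12-digit strings."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"not a 12-digit account ID: {account_id!r}")


def share_table_request(producer_account, consumer_account, database, table):
    """Producer side: keyword arguments for lakeformation.grant_permissions."""
    _check_account(producer_account)
    _check_account(consumer_account)
    perms = ["SELECT", "DESCRIBE"]
    return {
        # For an account principal the identifier is the account ID itself.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account},
        "Resource": {
            "Table": {
                "CatalogId": producer_account,
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": perms,
        # Grant option lets the consumer's admin re-grant to local principals.
        "PermissionsWithGrantOption": perms,
    }


def resource_link_request(local_database, link_name,
                          producer_account, shared_database, shared_table):
    """Consumer side: keyword arguments for glue.create_table (resource link)."""
    _check_account(producer_account)
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": producer_account,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }


share_req = share_table_request("111122223333", "444455556666",
                                "company_db", "employees")
link_req = resource_link_request("shared_db", "employees_link",
                                 "111122223333", "company_db", "employees")
```

The consumer then grants its own users permissions on the resource link and on the underlying shared table before querying it.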


9. Best Practices

  • Use LF-Tags for scalable permission management.

  • Always register S3 locations in Lake Formation.

  • Use IAM roles with least privilege.

  • Regularly audit Lake Formation permissions.

  • Combine row and column filters for fine-grained control.

  • Use AWS RAM for cross-account sharing.

  • Monitor access logs via CloudTrail.
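For the auditing point, one possible approach is to page through the boto3 list_permissions call and flag grants that deserve a second look. The rules in risky_grants below are illustrative choices of ours, not an official checklist, and both function names are hypothetical.

```python
def risky_grants(entries):
    """Flag list_permissions entries that may be broader than intended."""
    findings = []
    for entry in entries:
        principal = entry["Principal"]["DataLakePrincipalIdentifier"]
        permissions = set(entry.get("Permissions", []))
        grantable = entry.get("PermissionsWithGrantOption", [])
        if principal == "IAM_ALLOWED_PRINCIPALS":
            # This group means plain IAM policies still govern the resource.
            findings.append((principal, "IAM-only access control in effect"))
        elif "ALL" in permissions:
            findings.append((principal, "holds ALL permissions"))
        elif grantable:
            findings.append(
                (principal, "can re-grant: " + ", ".join(sorted(grantable)))
            )
    return findings


def fetch_all_permissions():
    """Collect every entry from list_permissions. Needs boto3 and credentials."""
    import boto3  # imported lazily so risky_grants works without boto3

    client = boto3.client("lakeformation")
    entries, token = [], None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = client.list_permissions(**kwargs)
        entries.extend(page.get("PrincipalResourcePermissions", []))
        token = page.get("NextToken")
        if not token:
            return entries
```

Running risky_grants(fetch_all_permissions()) on a schedule gives a short review list for each audit.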


10. Example: Create Column Filter & Grant Permissions

Create column filter:


aws lakeformation create-data-cells-filter \
  --table-data '{
    "TableCatalogId": "<account>",
    "DatabaseName": "company_db",
    "TableName": "employees",
    "Name": "finance_columns_only",
    "RowFilter": {"AllRowsWildcard": {}},
    "ColumnNames": ["id", "name", "department", "region"]
  }'

Grant filter and table access:


aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::<account>:user/user_finance \
  --permissions SELECT \
  --resource '{
    "DataCellsFilter": {
      "TableCatalogId": "<account>",
      "DatabaseName": "company_db",
      "TableName": "employees",
      "Name": "finance_columns_only"
    }
  }'

11. Automating with Terraform (Simplified Snippet)


# Define the LF-Tag key and its allowed values.
resource "aws_lakeformation_lf_tag" "department" {
  key    = "Department"
  values = ["Finance", "Engineering"]
}

# Attach Department=Finance to the employees table.
resource "aws_lakeformation_resource_lf_tags" "finance_employees" {
  table {
    database_name = "company_db"
    name          = "employees"
  }

  lf_tag {
    key   = aws_lakeformation_lf_tag.department.key
    value = "Finance"
  }
}

# Grant SELECT on the table to the finance user.
resource "aws_lakeformation_permissions" "finance_access" {
  principal   = "arn:aws:iam::123456789012:user/finance_user"
  permissions = ["SELECT"]

  table {
    database_name = "company_db"
    name          = "employees"
  }
}

12. Troubleshooting Tips

Issue                         Solution
“Share” button missing        Ensure you are a data lake administrator and the S3 path is registered
Data filter not visible       Grant DESCRIBE on the data filter to the user (ASSOCIATE applies to LF-Tags, not filters)
User not showing in dropdown  Grant DESCRIBE on the catalog, or paste the full ARN
No data access                Verify that the S3 bucket policy allows access for Lake Formation and the users

