AWS Lake Formation Complete Guide
1. Introduction to AWS Lake Formation
- Purpose: Securely build, manage, and govern data lakes on AWS.
- Key features:
  - Centralized security and access control
  - Fine-grained permissions (database, table, column, row)
  - Data catalog integration with AWS Glue
  - Support for cross-account data sharing
  - Tag-based access control using LF-Tags
2. Core Concepts
Term | Description |
---|---|
Data lake location | S3 bucket/folder registered in Lake Formation |
Data catalog | Metadata about databases and tables, stored in the AWS Glue Data Catalog |
Permissions | Access rights for principals (users/roles/accounts) |
LF-Tags | Metadata labels you assign to resources for policy control |
Data filters | Column-level or row-level restrictions |
Resource links | Data Catalog objects that point to shared databases or tables, typically from other accounts |
3. Setup and Initial Steps
Step 1: Register S3 Data Lake Location
- Go to Lake Formation → Data lake locations → Register location
- Specify your S3 bucket/folder path and an IAM role with access
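The same registration can be sketched programmatically. The snippet below only builds the keyword arguments for the Lake Formation `RegisterResource` API (exposed in boto3 as `register_resource`); the bucket name, prefix, and role ARN are hypothetical placeholders, and nothing is sent to AWS.

```python
def build_register_location_request(bucket: str, prefix: str, role_arn: str) -> dict:
    """Build keyword arguments for Lake Formation's RegisterResource call."""
    # S3 locations are identified by ARN, e.g. arn:aws:s3:::bucket/prefix
    cleaned = prefix.strip("/")
    path = f"{bucket}/{cleaned}" if cleaned else bucket
    return {
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Role that Lake Formation assumes to access the location (assumed to exist).
        "RoleArn": role_arn,
    }


# Hypothetical values for illustration only.
request = build_register_location_request(
    bucket="example-data-lake",
    prefix="finance/",
    role_arn="arn:aws:iam::111122223333:role/ExampleLakeFormationRole",
)

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("lakeformation").register_resource(**request)
```

Keeping the request as a plain dictionary makes it easy to review or log before any call is made.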
Step 2: Create Glue Database & Tables
- Use a Glue crawler, or manually create the Glue database and tables.
- Data files should be in the registered S3 location.
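For the manual route, a minimal sketch of a Glue `TableInput` is shown below, together with a small check that the table's data path sits under the registered location. It assumes Parquet data; the database, table, column names, and S3 paths are hypothetical.

```python
def is_under_registered_location(s3_path: str, registered: str) -> bool:
    """Return True if s3_path equals or is nested under the registered location."""
    base = registered.rstrip("/")
    return s3_path == base or s3_path.startswith(base + "/")


def build_table_input(name: str, s3_location: str, columns: dict) -> dict:
    """Build a Glue TableInput for an external Parquet table (Parquet is an assumption)."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns.items()],
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    }


# Hypothetical names and paths for illustration only.
REGISTERED = "s3://example-data-lake/finance/"
table_input = build_table_input(
    "transactions",
    "s3://example-data-lake/finance/transactions/",
    {"id": "bigint", "amount": "double", "region": "string"},
)
location_ok = is_under_registered_location(
    table_input["StorageDescriptor"]["Location"], REGISTERED
)

# With boto3, the database and table could then be created as:
#   glue = boto3.client("glue")
#   glue.create_database(DatabaseInput={"Name": "finance_db"})
#   glue.create_table(DatabaseName="finance_db", TableInput=table_input)
```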
4. Managing Permissions
Lake Formation controls access at multiple levels:
Level | Permissions |
---|---|
Database | CREATE_TABLE, DROP, DESCRIBE |
Table | SELECT, ALTER, DROP, DESCRIBE |
Column | Controlled by column lists on SELECT grants, data filters, or LF-Tags |
Row | Controlled by row filter expressions in data filters |
5. Data Filters (Column and Row Level Security)
- Data filters allow you to restrict access to specific columns or rows.
- Create filters under Lake Formation → Data filters / Row filters.
- Assign filters via Data permissions → Grant when assigning SELECT.
Example:
- finance_columns_only: allows access to specific columns
- region_us_only: allows access only to rows where region = 'US'
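As a rough illustration of the `region_us_only` example, the sketch below builds the request body for the `CreateDataCellsFilter` API (boto3: `create_data_cells_filter`). The account ID, database, and table names are hypothetical; only the filter name and the `region = 'US'` expression come from the example above.

```python
def build_row_filter(catalog_id: str, database: str, table: str,
                     name: str, expression: str) -> dict:
    """Build a CreateDataCellsFilter request that restricts rows but keeps all columns."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            # Row-level restriction, written as a SQL-like predicate.
            "RowFilter": {"FilterExpression": expression},
            # Empty wildcard: no columns are excluded.
            "ColumnWildcard": {},
        }
    }


# Hypothetical account, database, and table for illustration only.
row_filter_request = build_row_filter(
    catalog_id="111122223333",
    database="finance_db",
    table="transactions",
    name="region_us_only",
    expression="region = 'US'",
)

# With boto3:
#   boto3.client("lakeformation").create_data_cells_filter(**row_filter_request)
```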
6. LF-Tags and Tag-Based Access Control
What are LF-Tags?
- Key-value labels attached to databases, tables, or columns.
- Example tags: Department=Finance, Environment=Prod
How to use:
- Create LF-Tags in Lake Formation → LF-Tags.
- Assign LF-Tags to tables or columns.
- Grant permissions to principals based on LF-Tags instead of individual resources.
Benefits:
- Easier management at scale.
- Dynamically control access by adding/removing LF-Tags.
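The three LF-Tag steps (create, assign, grant) can be sketched as request payloads for the `CreateLFTag`, `AddLFTagsToResource`, and `GrantPermissions` APIs. The tag `Department=Finance` is taken from the example above; the principal ARN, database, and table names are hypothetical.

```python
def build_lf_tag_requests(tag_key: str, tag_values: list, principal_arn: str,
                          database: str, table: str) -> dict:
    """Return request payloads for creating, assigning, and granting on an LF-Tag."""
    tag_pair = {"TagKey": tag_key, "TagValues": list(tag_values)}
    return {
        # Step 1: define the tag and its allowed values.
        "create_lf_tag": dict(tag_pair),
        # Step 2: attach the tag to a table.
        "add_lf_tags_to_resource": {
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "LFTags": [dict(tag_pair)],
        },
        # Step 3: grant on every table matching the tag expression,
        # instead of naming individual tables.
        "grant_permissions": {
            "Principal": {"DataLakePrincipalIdentifier": principal_arn},
            "Resource": {
                "LFTagPolicy": {"ResourceType": "TABLE", "Expression": [dict(tag_pair)]}
            },
            "Permissions": ["SELECT", "DESCRIBE"],
        },
    }


# Hypothetical principal and resource names for illustration only.
lf_tag_requests = build_lf_tag_requests(
    "Department", ["Finance"],
    "arn:aws:iam::111122223333:role/ExampleAnalystRole",
    "finance_db", "transactions",
)

# With boto3, each payload maps to the client method of the same name:
#   lf = boto3.client("lakeformation")
#   lf.create_lf_tag(**lf_tag_requests["create_lf_tag"])
#   lf.add_lf_tags_to_resource(**lf_tag_requests["add_lf_tags_to_resource"])
#   lf.grant_permissions(**lf_tag_requests["grant_permissions"])
```

Because the grant targets a tag expression, tagging a new table with `Department=Finance` would extend access to it without a new grant.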
7. Granting Permissions
Types of Principals:
- IAM users or roles
- AWS Accounts (for cross-account sharing)
How to grant:
- Go to Lake Formation → Data permissions → Grant
- Select Principal
- Choose resources: Catalog resources or LF-Tags
- Assign permissions: SELECT, DESCRIBE, etc.
- Assign data filters if applicable
- Grant
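The console flow above corresponds roughly to one `GrantPermissions` call. The sketch below builds such a request for a named table and rejects permission names that do not apply to tables; the principal ARN and resource names are hypothetical, and the allowed set reflects the table-level permissions as I understand them.

```python
# Permission names accepted for table resources in this sketch.
TABLE_PERMISSIONS = {"SELECT", "INSERT", "DELETE", "DESCRIBE", "ALTER", "DROP", "ALL"}


def build_table_grant(principal_arn: str, database: str, table: str,
                      permissions: list) -> dict:
    """Build a GrantPermissions request for a single named table."""
    invalid = sorted(set(permissions) - TABLE_PERMISSIONS)
    if invalid:
        raise ValueError(f"Not valid table permissions: {invalid}")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Hypothetical principal and resource names for illustration only.
grant_request = build_table_grant(
    "arn:aws:iam::111122223333:role/ExampleAnalystRole",
    "finance_db",
    "transactions",
    ["SELECT", "DESCRIBE"],
)

# With boto3:
#   boto3.client("lakeformation").grant_permissions(**grant_request)
```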
8. Cross-Account Sharing
- Use Lake Formation's resource sharing to share databases or tables with other AWS accounts.
- Steps in Producer Account:
  - Grant permissions to the consumer account.
  - Share tables via “Share” button or AWS RAM.
  - Update S3 bucket policy for consumer access.
- Steps in Consumer Account:
  - Accept the shared resource.
  - Create resource links.
  - Grant permissions on resource links.
  - Query shared tables.
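On the consumer side, a resource link is a Glue table whose `TargetTable` points at the shared table in the producer's catalog. A minimal sketch of that `TableInput` follows; the account ID, database, and table names are hypothetical.

```python
def build_resource_link(link_name: str, producer_account_id: str,
                        shared_database: str, shared_table: str) -> dict:
    """Build a Glue TableInput describing a resource link to a shared table."""
    return {
        "Name": link_name,
        # TargetTable identifies the shared table in the producer's Data Catalog.
        "TargetTable": {
            "CatalogId": producer_account_id,
            "DatabaseName": shared_database,
            "Name": shared_table,
        },
    }


# Hypothetical identifiers for illustration only.
link_input = build_resource_link(
    link_name="transactions_link",
    producer_account_id="111122223333",
    shared_database="finance_db",
    shared_table="transactions",
)

# With boto3, in the consumer account (local database assumed to exist):
#   boto3.client("glue").create_table(
#       DatabaseName="consumer_db", TableInput=link_input
#   )
```

Queries in the consumer account would then reference `transactions_link` in the local database rather than the producer's table name.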
9. Best Practices
- Use LF-Tags for scalable permission management.
- Always register S3 locations in Lake Formation.
- Use IAM roles with least privilege.
- Regularly audit Lake Formation permissions.
- Combine row and column filters for fine-grained control.
- Use AWS RAM for cross-account sharing.
- Monitor access logs via CloudTrail.
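A periodic permissions audit could be sketched as a pure function over entries shaped like the `PrincipalResourcePermissions` items returned by the `ListPermissions` API. The three rules below are illustrative assumptions about what counts as "risky", not an official checklist, and the sample entries are made up.

```python
def find_risky_grants(entries: list) -> list:
    """Return (principal, reason) pairs for grants that deserve a closer look."""
    findings = []
    for entry in entries:
        principal = entry.get("Principal", {}).get("DataLakePrincipalIdentifier", "")
        permissions = set(entry.get("Permissions", []))
        if principal == "IAM_ALLOWED_PRINCIPALS":
            # This group means access is governed by IAM policies alone.
            findings.append((principal, "IAM-only access control"))
        elif "ALL" in permissions:
            findings.append((principal, "ALL permission granted"))
        elif entry.get("PermissionsWithGrantOption"):
            findings.append((principal, "can re-grant permissions"))
    return findings


# Made-up sample entries for illustration only.
sample = [
    {"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
     "Permissions": ["ALL"]},
    {"Principal": {"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/Analyst"},
     "Permissions": ["SELECT"], "PermissionsWithGrantOption": []},
    {"Principal": {"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/Admin"},
     "Permissions": ["SELECT"], "PermissionsWithGrantOption": ["SELECT"]},
]
findings = find_risky_grants(sample)

# With boto3, real entries could come from:
#   boto3.client("lakeformation").list_permissions()["PrincipalResourcePermissions"]
```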
10. Example: Create Column Filter & Grant Permissions
Create column filter:
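A minimal sketch of the column filter, assuming the `finance_columns_only` name from section 5: it builds a `CreateDataCellsFilter` request that exposes only the listed columns and leaves all rows visible. The account ID, database, table, and column names are hypothetical.

```python
def build_column_filter(catalog_id: str, database: str, table: str,
                        name: str, columns: list) -> dict:
    """Build a CreateDataCellsFilter request that exposes only the given columns."""
    if not columns:
        raise ValueError("At least one column is required")
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": name,
            # Only these columns are visible through the filter.
            "ColumnNames": list(columns),
            # No row restriction: every row is included.
            "RowFilter": {"AllRowsWildcard": {}},
        }
    }


# Hypothetical identifiers and columns for illustration only.
column_filter_request = build_column_filter(
    "111122223333", "finance_db", "transactions",
    "finance_columns_only", ["id", "amount"],
)

# With boto3:
#   boto3.client("lakeformation").create_data_cells_filter(**column_filter_request)
```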
Grant filter and table access:
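One way to sketch the grant is as two `GrantPermissions` requests: SELECT on the data filter, plus DESCRIBE on the underlying table so the principal can see its metadata. Whether the second grant is needed in a given setup is an assumption here; all ARNs and names are hypothetical.

```python
def build_filter_grants(principal_arn: str, catalog_id: str, database: str,
                        table: str, filter_name: str) -> list:
    """Return GrantPermissions requests for a data filter and its table."""
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    return [
        {
            # SELECT through the filter only, not on the full table.
            "Principal": principal,
            "Resource": {
                "DataCellsFilter": {
                    "TableCatalogId": catalog_id,
                    "DatabaseName": database,
                    "TableName": table,
                    "Name": filter_name,
                }
            },
            "Permissions": ["SELECT"],
        },
        {
            # Metadata visibility on the table itself (assumed to be desired).
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "Permissions": ["DESCRIBE"],
        },
    ]


# Hypothetical identifiers for illustration only.
filter_grants = build_filter_grants(
    "arn:aws:iam::111122223333:role/ExampleAnalystRole",
    "111122223333", "finance_db", "transactions", "finance_columns_only",
)

# With boto3:
#   lf = boto3.client("lakeformation")
#   for request in filter_grants:
#       lf.grant_permissions(**request)
```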
11. Automating with Terraform (Simplified Snippet)
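As a rough sketch in Python, the configuration below is emitted in Terraform's JSON syntax (a `.tf.json` file) rather than HCL. It uses the AWS provider's `aws_lakeformation_resource` and `aws_lakeformation_permissions` resource types; the resource labels, ARNs, and database/table names are hypothetical.

```python
import json


def build_terraform_config(bucket_arn: str, role_arn: str, principal_arn: str,
                           database: str, table: str) -> dict:
    """Build a Terraform JSON configuration for a registered location and a table grant."""
    return {
        "resource": {
            # Registers the S3 location with Lake Formation.
            "aws_lakeformation_resource": {
                "data_lake": {"arn": bucket_arn, "role_arn": role_arn}
            },
            # Grants table-level permissions to one principal.
            "aws_lakeformation_permissions": {
                "analyst_select": {
                    "principal": principal_arn,
                    "permissions": ["SELECT", "DESCRIBE"],
                    "table": {"database_name": database, "name": table},
                }
            },
        }
    }


# Hypothetical identifiers for illustration only.
config = build_terraform_config(
    "arn:aws:s3:::example-data-lake",
    "arn:aws:iam::111122223333:role/ExampleLakeFormationRole",
    "arn:aws:iam::111122223333:role/ExampleAnalystRole",
    "finance_db",
    "transactions",
)
terraform_json = json.dumps(config, indent=2)
# Saving terraform_json as main.tf.json would let `terraform plan` read it.
```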
12. Troubleshooting Tips
Issue | Solution |
---|---|
“Share” button missing | Ensure you have data lake administrator permissions and that the S3 path is registered |
Data filter not visible | Grant DESCRIBE on the data filter to the user (ASSOCIATE applies to LF-Tags, not data filters) |
User not showing in dropdown | Grant DESCRIBE on catalog or paste full ARN |
No data access | Verify that the S3 bucket policy allows access for Lake Formation (the registered role) and, where applicable, the users |