Databricks on AWS – Authentication, Authorization and Data Governance Architecture
This document describes the security architecture for Databricks deployed on AWS, covering identity management, authentication, authorization, and data governance using Unity Catalog.
1. Identity and Authentication Architecture
Authentication defines how users and services securely access the Databricks platform. The platform integrates with enterprise identity providers to enable centralized authentication and identity lifecycle management.
Authentication Components
- Enterprise Single Sign-On (SSO)
- Identity federation
- Multi-Factor Authentication (MFA)
- User and group synchronization
Technologies
- AWS Identity and Access Management (IAM)
- Microsoft Entra ID (formerly Azure Active Directory) / Okta (enterprise identity provider)
- SCIM provisioning
Implementation
- Users authenticate through enterprise SSO.
- The identity provider enforces MFA and password policies.
- User and group identities are synchronized into Databricks using SCIM.
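As a rough illustration of the synchronization step, the sketch below builds the kind of SCIM 2.0 resources an identity provider sends when it provisions a user or group. The schema URNs come from the SCIM standard (RFC 7643). The helper names, the email address and the group name are hypothetical, and no request is sent to any endpoint.

```python
import json

# SCIM 2.0 core schema URNs (RFC 7643).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def build_scim_user(email: str, display_name: str, active: bool = True) -> dict:
    """Build a SCIM User resource (hypothetical helper, payload only)."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "active": active,
    }


def build_scim_group(name: str, member_ids: list[str]) -> dict:
    """Build a SCIM Group resource whose members are referenced by user id."""
    return {
        "schemas": [SCIM_GROUP_SCHEMA],
        "displayName": name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


# Example values are made up for illustration.
user = build_scim_user("analyst@example.com", "Example Analyst")
group = build_scim_group("finance-analysts", ["1001", "1002"])
print(json.dumps(user, indent=2))
print(json.dumps(group, indent=2))
```

In practice the identity provider's SCIM connector generates and sends these payloads itself; building them by hand is mainly useful for testing or for scripted provisioning.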
2. Authorization Model
Authorization determines what authenticated users are allowed to access within the Databricks environment.
Access Control Principles
- Role-based access control
- Least privilege access
- Separation of duties
Authorization Layers
| Layer | Description |
|---|---|
| Workspace Access | Controls access to notebooks, clusters, jobs and workspace resources |
| Cluster Permissions | Defines who can create, attach to or manage compute clusters |
| Job Permissions | Controls who can view, run and manage jobs and scheduled pipelines |
| Data Access | Managed through Unity Catalog |
Example Implementation
- Data Engineers have cluster creation privileges.
- Data Analysts have read-only access to curated datasets.
- Administrators manage workspace configuration.
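The role assignments above can be sketched as a small deny-by-default lookup. This is only a conceptual model of role-based access control and least privilege, not how Databricks stores permissions. The role keys and the permission strings such as `cluster:create` are invented for the example.

```python
# Hypothetical role-to-permission map mirroring the three example roles.
ROLE_PERMISSIONS = {
    "data_engineer": frozenset({"cluster:create"}),
    "data_analyst": frozenset({"data:read_curated"}),
    "administrator": frozenset({"workspace:configure"}),
}


def is_allowed(role: str, action: str) -> bool:
    """Least privilege: an action is denied unless the role explicitly holds it."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


def shared_permissions(role_a: str, role_b: str) -> frozenset:
    """Separation of duties check: permissions held by both roles."""
    empty = frozenset()
    return ROLE_PERMISSIONS.get(role_a, empty) & ROLE_PERMISSIONS.get(role_b, empty)


print(is_allowed("data_engineer", "cluster:create"))   # True
print(is_allowed("data_analyst", "cluster:create"))    # False
print(shared_permissions("data_engineer", "administrator"))  # frozenset()
```

Unknown roles and unlisted actions both resolve to a denial, which is the behavior least privilege asks for.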
3. Unity Catalog Governance
Unity Catalog provides centralized data governance across Databricks workspaces, enabling consistent access control and auditing for data assets.
Unity Catalog Components
| Component | Description |
|---|---|
| Metastore | Central metadata store for all tables and data assets |
| Catalog | Top-level container for organizing data domains |
| Schema | Logical grouping of tables and views |
| Tables | Structured datasets stored in the data lake |
Data Hierarchy
| Level | Example |
|---|---|
| Catalog | Finance |
| Schema | Transactions |
| Table | Customer_Payments |
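The hierarchy above corresponds to a three-level name of the form catalog.schema.table. A minimal sketch of composing that name, assuming backtick-quoted identifiers as used in Databricks SQL (the helper names are hypothetical):

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def full_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into one fully qualified name."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


print(full_table_name("Finance", "Transactions", "Customer_Payments"))
# prints `Finance`.`Transactions`.`Customer_Payments`
```

Quoting each level separately keeps names that contain spaces or hyphens from being split across levels.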
Benefits
- Centralized governance across multiple workspaces
- Fine-grained data access control
- Improved data lineage tracking
- Audit logging for compliance
4. Data Access Control with Unity Catalog
Unity Catalog enforces access control policies on tables, views, and other data assets using SQL-based permissions.
Access Levels
| Permission | Description |
|---|---|
| SELECT | Read access to tables |
| MODIFY | Insert, update or delete records |
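| CREATE TABLE / CREATE SCHEMA | Create new tables in a schema or new schemas in a catalog |
| USE CATALOG / USE SCHEMA | Required to access objects inside a catalog or schema (replaces the legacy USAGE privilege) |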
Example Policy
- Finance analysts receive SELECT permission on curated financial datasets.
- Data engineers receive MODIFY permissions on ingestion schemas.
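The example policy can be sketched as generated `GRANT` statements. In Unity Catalog a principal generally also needs `USE CATALOG` and `USE SCHEMA` on the parent containers before a table-level privilege takes effect, so the sketch emits those as well. The group names `finance-analysts` and `data-engineers`, the `ingestion` schema and the helper functions are assumptions for illustration. The code only builds strings and does not connect to a workspace.

```python
def quote(name: str) -> str:
    """Backtick-quote an identifier or principal name."""
    return "`" + name.replace("`", "``") + "`"


def read_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Statements giving a principal read access to a single table."""
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


def schema_modify_grants(catalog: str, schema: str, principal: str) -> list[str]:
    """Statements giving a principal MODIFY on every table in a schema."""
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA, MODIFY ON SCHEMA {sch} TO {who}",
    ]


# Hypothetical object and group names.
statements = (
    read_grants("finance", "transactions", "customer_payments", "finance-analysts")
    + schema_modify_grants("finance", "ingestion", "data-engineers")
)
for statement in statements:
    print(statement)
```

In a notebook each statement could be passed to `spark.sql(...)`. Depending on the operations data engineers run, additional privileges such as `SELECT` may also be required.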
5. Data Security
Encryption
- Encryption at rest using AWS KMS
- Encryption in transit using TLS
Storage
- Data stored in Amazon S3
- Delta Lake tables managed through Unity Catalog
Secrets Management
- Credentials stored in AWS Secrets Manager
- Access managed through Databricks secret scopes
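A minimal sketch of reading a credential from AWS Secrets Manager without hard-coding it. It assumes the secret value is stored as a JSON string, which is a common convention but not something this document specifies. The function accepts any client object exposing `get_secret_value`, such as one created with `boto3.client("secretsmanager")`. The stub client, the secret name and the credential values are invented so the example runs offline.

```python
import json


def load_credentials(client, secret_id: str) -> dict:
    """Fetch a secret and parse it, assuming the value is a JSON string."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


def redact(credentials: dict) -> dict:
    """Return a copy that is safe to log: every value is masked."""
    return {key: "***" for key in credentials}


class StubSecretsClient:
    """Offline stand-in for a Secrets Manager client (illustration only)."""

    def __init__(self, secrets: dict):
        self._secrets = secrets

    def get_secret_value(self, SecretId: str) -> dict:
        return {"Name": SecretId, "SecretString": self._secrets[SecretId]}


# Hypothetical secret name and contents.
stub = StubSecretsClient(
    {"example/warehouse": json.dumps({"username": "svc_user", "password": "not-a-real-password"})}
)
credentials = load_credentials(stub, "example/warehouse")
print(redact(credentials))  # log only the masked form
```

Passing the client in as a parameter keeps the function testable and leaves IAM role configuration outside the code.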
6. Auditing and Monitoring
Auditing ensures visibility into user activities and data access events.
Monitoring Components
| Component | Purpose |
|---|---|
| Databricks Audit Logs | Tracks workspace and data activity |
| AWS CloudTrail | Tracks AWS API calls |
| Amazon CloudWatch | Infrastructure monitoring and alerts |
Audit Events
- User login activity
- Cluster creation and job execution
- Table access events
- Permission changes
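As a sketch of how the event types above might be monitored, the snippet below counts audit records by service and action. It assumes each record is a JSON object with `serviceName` and `actionName` fields, as in Databricks audit log delivery. The specific service and action pairs and the sample records are illustrative and should be checked against the audit log reference before use.

```python
from collections import Counter

# Illustrative (serviceName, actionName) pairs for the event types listed above.
WATCHED_EVENTS = {
    ("accounts", "login"),
    ("clusters", "create"),
    ("jobs", "runNow"),
    ("unityCatalog", "getTable"),
    ("unityCatalog", "updatePermissions"),
}


def count_watched_events(records: list[dict]) -> Counter:
    """Count records whose (serviceName, actionName) pair is being watched."""
    counts = Counter()
    for record in records:
        key = (record.get("serviceName"), record.get("actionName"))
        if key in WATCHED_EVENTS:
            counts[key] += 1
    return counts


# Made-up sample records, not real log output.
SAMPLE_RECORDS = [
    {"serviceName": "accounts", "actionName": "login"},
    {"serviceName": "clusters", "actionName": "create"},
    {"serviceName": "unityCatalog", "actionName": "getTable"},
    {"serviceName": "notebook", "actionName": "runCommand"},
    {"serviceName": "accounts", "actionName": "login"},
]

for (service, action), total in sorted(count_watched_events(SAMPLE_RECORDS).items()):
    print(f"{service}.{action}: {total}")
```

Records outside the watched set are ignored rather than raising an error, so new event types in the logs do not break the summary.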
7. Compliance and Governance
The platform supports enterprise governance requirements by providing strong data access controls, monitoring, and audit capabilities.
Compliance Controls
- Centralized identity management
- Fine-grained data access control
- Full audit logging
- Data classification and protection policies
Architecture Summary
| Area | Implementation |
|---|---|
| Authentication | Enterprise SSO with identity provider |
| Authorization | Role-based access control |
| Data Governance | Unity Catalog |
| Storage | Amazon S3 data lake |
| Security | KMS encryption and Secrets Manager |
| Monitoring | Audit logs and CloudWatch |