Tuesday, 6 January 2026

Databricks on AWS - Security and Governance Architecture

Databricks on AWS – Authentication, Authorization and Data Governance Architecture

This document describes the security architecture for Databricks deployed on AWS, covering identity management, authentication, authorization, and data governance using Unity Catalog.

1. Identity and Authentication Architecture

Authentication defines how users and services securely access the Databricks platform. The platform integrates with enterprise identity providers to enable centralized authentication and identity lifecycle management.

Authentication Components

  • Enterprise Single Sign-On (SSO)
  • Identity federation
  • Multi-Factor Authentication (MFA)
  • User and group synchronization

Technologies

  • AWS Identity and Access Management (IAM)
  • Microsoft Entra ID (formerly Azure Active Directory) or Okta as the enterprise identity provider
  • SCIM provisioning

Implementation

  • Users authenticate through enterprise SSO.
  • The identity provider enforces MFA and password policies.
  • User and group identities are synchronized into Databricks using SCIM.
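
The identity provider normally performs SCIM synchronization automatically, so the following is only a minimal sketch of what a provisioned user record looks like. It builds a SCIM 2.0 User resource as a plain dictionary; the helper name and the example user are hypothetical, and group membership is omitted.

```python
import json

# Core User schema URN defined by the SCIM 2.0 standard (RFC 7643).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def build_scim_user(user_name, display_name, active=True):
    """Build a SCIM 2.0 User resource as a plain dict (hypothetical helper)."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "displayName": display_name,
        "active": active,
    }


# Hypothetical user, for illustration only.
payload = build_scim_user("analyst@example.com", "Example Analyst")
print(json.dumps(payload, indent=2))
```

Setting `active` to `False` is how SCIM represents a deactivated account, which is what makes centralized offboarding possible.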

2. Authorization Model

Authorization determines what authenticated users are allowed to access within the Databricks environment.

Access Control Principles

  • Role-based access control
  • Least privilege access
  • Separation of duties

Authorization Layers

Layer                Description
Workspace Access     Controls access to notebooks, clusters, jobs and workspace resources
Cluster Permissions  Defines who can create, attach or manage compute clusters
Job Permissions      Controls execution of scheduled pipelines
Data Access          Managed through Unity Catalog

Example Implementation

  • Data Engineers have cluster creation privileges.
  • Data Analysts have read-only access to curated datasets.
  • Administrators manage workspace configuration.
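
The three example roles can be modeled as a deny-by-default lookup. This is an illustrative sketch, not how Databricks stores permissions: the role keys and permission strings are hypothetical names chosen to mirror the bullets above.

```python
# Hypothetical role-to-permission mapping mirroring the three example roles.
ROLE_PERMISSIONS = {
    "data_engineer": {"cluster:create"},
    "data_analyst": {"data:read_curated"},
    "administrator": {"workspace:configure"},
}


def is_allowed(role, permission):
    """Least privilege: deny unless the role explicitly holds the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("data_engineer", "cluster:create"))  # True
print(is_allowed("data_analyst", "cluster:create"))   # False
```

Keeping each role's permission set disjoint is also what gives separation of duties: no single role above can both configure the workspace and create compute.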

3. Unity Catalog Governance

Unity Catalog provides centralized data governance across Databricks workspaces, enabling consistent access control and auditing for data assets.

Unity Catalog Components

Component  Description
Metastore  Central metadata store for all tables and data assets
Catalog    Top-level container for organizing data domains
Schema     Logical grouping of tables and views
Tables     Structured datasets stored in the data lake

Data Hierarchy

Level    Example
Catalog  Finance
Schema   Transactions
Table    Customer_Payments
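
In queries, this hierarchy appears as a three-part name of the form catalog.schema.table. The sketch below assembles that name from the example values; the helper functions are hypothetical, and the backtick quoting follows the Spark SQL convention of doubling any embedded backtick.

```python
def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def full_table_name(catalog, schema, table):
    """Join the three hierarchy levels into one fully qualified name."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


name = full_table_name("Finance", "Transactions", "Customer_Payments")
print(name)  # `Finance`.`Transactions`.`Customer_Payments`
```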

Benefits

  • Centralized governance across multiple workspaces
  • Fine-grained data access control
  • Improved data lineage tracking
  • Audit logging for compliance

4. Data Access Control with Unity Catalog

Unity Catalog enforces access control policies on tables, views, and other data assets using SQL-based permissions.

Access Levels

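Permission                    Description
SELECT                        Read access to tables and views
MODIFY                        Insert, update or delete records
CREATE TABLE / CREATE SCHEMA  Create new tables in a schema or new schemas in a catalog
USE CATALOG / USE SCHEMA      Access to the parent catalog or schema (named USAGE in legacy Hive metastore table access control)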

Example Policy

  • Finance analysts receive SELECT permission on curated financial datasets.
  • Data engineers receive MODIFY permissions on ingestion schemas.
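
The example policy can be rendered as SQL GRANT statements. This is a sketch under assumptions: the catalog `finance`, the schemas `curated` and `ingestion`, and the group names are hypothetical, and the grants are made at schema level so they apply to every table in the schema. In Unity Catalog a principal also needs USE CATALOG and USE SCHEMA on the parent containers before a SELECT or MODIFY grant is usable, so those are included.

```python
def grant(privilege, securable_type, securable_name, principal):
    """Render one GRANT statement; the principal is backtick-quoted."""
    return f"GRANT {privilege} ON {securable_type} {securable_name} TO `{principal}`;"


def schema_grants(catalog, schema, group, data_privilege):
    """Grants needed for a group to use one privilege on one schema."""
    fq_schema = f"{catalog}.{schema}"
    return [
        grant("USE CATALOG", "CATALOG", catalog, group),
        grant("USE SCHEMA", "SCHEMA", fq_schema, group),
        grant(data_privilege, "SCHEMA", fq_schema, group),
    ]


# Hypothetical catalog, schema, and group names.
statements = (
    schema_grants("finance", "curated", "finance-analysts", "SELECT")
    + schema_grants("finance", "ingestion", "data-engineers", "MODIFY")
)
for statement in statements:
    print(statement)
```

Generating the statements from one function keeps the policy reviewable as data and makes it harder to forget the container-level grants.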

5. Data Security

Encryption

  • Encryption at rest using AWS KMS
  • Encryption in transit using TLS

Storage

  • Data stored in Amazon S3
  • Delta Lake tables managed through Unity Catalog
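
For the S3 bucket that holds the data, both encryption requirements can be expressed as configuration documents. The sketch below only builds the documents and does not call AWS: a bucket policy that denies any request not sent over TLS, and a default-encryption configuration that uses a KMS key. The bucket name and key ARN are placeholders, and the function names are hypothetical.

```python
import json


def tls_only_bucket_policy(bucket_name):
    """Bucket policy that denies every request not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def kms_default_encryption(kms_key_arn):
    """Default-encryption configuration: encrypt new objects with a KMS key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


# Placeholder bucket name and key ARN, for illustration only.
policy = tls_only_bucket_policy("example-data-lake-bucket")
encryption = kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
)
print(json.dumps(policy, indent=2))
print(json.dumps(encryption, indent=2))
```

The policy covers both the bucket ARN and the object ARNs because bucket-level and object-level S3 actions are evaluated against different resources.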

Secrets Management

  • Credentials stored in AWS Secrets Manager
  • Access managed through Databricks secret scopes

6. Auditing and Monitoring

Auditing ensures visibility into user activities and data access events.

Monitoring Components

Component              Purpose
Databricks Audit Logs  Tracks workspace and data activity
AWS CloudTrail         Tracks AWS API calls
Amazon CloudWatch      Infrastructure monitoring and alerts

Audit Events

  • User login activity
  • Cluster creation and job execution
  • Table access events
  • Permission changes
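
As an illustration of how these event types might be picked out of delivered audit logs, the sketch below groups records by the four categories above. The `serviceName` and `actionName` pairs in the mapping are assumptions and should be checked against the Databricks audit log reference; the sample records are invented for the example.

```python
from collections import Counter

# Assumed (serviceName, actionName) pairs for each audit category.
CATEGORY_BY_ACTION = {
    ("accounts", "login"): "User login activity",
    ("clusters", "create"): "Cluster creation and job execution",
    ("jobs", "runNow"): "Cluster creation and job execution",
    ("unityCatalog", "getTable"): "Table access events",
    ("unityCatalog", "updatePermissions"): "Permission changes",
}


def categorize(record):
    """Map one audit record to a category, or "Other" if unrecognized."""
    key = (record.get("serviceName"), record.get("actionName"))
    return CATEGORY_BY_ACTION.get(key, "Other")


def summarize(records):
    """Count audit records per category."""
    return Counter(categorize(record) for record in records)


# Invented sample records, for illustration only.
sample_records = [
    {"serviceName": "accounts", "actionName": "login"},
    {"serviceName": "clusters", "actionName": "create"},
    {"serviceName": "jobs", "actionName": "runNow"},
    {"serviceName": "unityCatalog", "actionName": "getTable"},
    {"serviceName": "unityCatalog", "actionName": "updatePermissions"},
    {"serviceName": "notebook", "actionName": "openNotebook"},
]

summary = summarize(sample_records)
for category, count in sorted(summary.items()):
    print(f"{category}: {count}")
```

Routing unrecognized actions to an explicit "Other" bucket, rather than dropping them, keeps gaps in the mapping visible during review.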

7. Compliance and Governance

The platform supports enterprise governance requirements by providing strong data access controls, monitoring, and audit capabilities.

Compliance Controls

  • Centralized identity management
  • Fine-grained data access control
  • Full audit logging
  • Data classification and protection policies

Architecture Summary

Area             Implementation
Authentication   Enterprise SSO with identity provider
Authorization    Role-based access control
Data Governance  Unity Catalog
Storage          Amazon S3 data lake
Security         KMS encryption and Secrets Manager
Monitoring       Audit logs and CloudWatch
