Tuesday, 6 January 2026

Medallion Architecture in Databricks (Bronze, Silver, Gold) – Complete Example

Complete Medallion Architecture Guide with Databricks

Complete Medallion Architecture Guide (Bronze, Silver, Gold)

This guide explains the full Medallion Architecture pattern used in modern data platforms such as Databricks Lakehouse.


1. Bronze Layer (Raw Data)

Bronze stores exact raw data from source systems without transformation.

Typical Sources:

  • API
  • Kafka / Streaming systems
  • CDC databases
  • JSON files
  • Application logs

Example Bronze Table

bronze.orders_raw

Example Bronze Records

order_id payload ingestion_time
1001 {"customer":"C101","amount":250,"status":"NEW"} 2026-01-01
1002 {"customer":"C102","amount":300,"status":"NEW"} 2026-01-01
1003 {"customer":"C101","amount":150,"status":"CANCELLED"} 2026-01-02

Create Bronze Table

CREATE TABLE bronze.orders_raw
(
  order_id STRING,
  payload STRING,
  ingestion_time TIMESTAMP
)
USING DELTA;

2. Silver Layer (Cleaned Data)

Silver layer contains structured and validated data. JSON is parsed and converted into columns.

Example Silver Table

silver.orders_clean

Silver Data Example

order_id customer_id amount status order_date
1001 C101 250 NEW 2026-01-01
1002 C102 300 NEW 2026-01-01
1003 C101 150 CANCELLED 2026-01-02

Bronze → Silver Transformation

CREATE OR REPLACE TABLE silver.orders_clean AS

SELECT
  order_id,
  get_json_object(payload,'$.customer') AS customer_id,
  CAST(get_json_object(payload,'$.amount') AS DOUBLE) AS amount,
  get_json_object(payload,'$.status') AS status,
  DATE(ingestion_time) AS order_date

FROM bronze.orders_raw;

3. Gold Layer (Business Aggregates)

Gold layer contains analytics-ready data for reporting and dashboards.

Example Gold Table

gold.daily_sales

Gold Data Example

order_date customer_id total_orders total_sales
2026-01-01 C101 1 250
2026-01-01 C102 1 300
2026-01-02 C101 1 150

Silver → Gold Aggregation

CREATE OR REPLACE TABLE gold.daily_sales AS

SELECT
  order_date,
  customer_id,
  COUNT(order_id) AS total_orders,
  SUM(amount) AS total_sales

FROM silver.orders_clean
GROUP BY order_date, customer_id;

4. Final Data Flow


Raw Source Data
      │
      ▼

Bronze Table
orders_raw
(JSON payload)

      │
      ▼

Silver Table
orders_clean
(structured columns)

      │
      ▼

Gold Table
daily_sales
(aggregated analytics)


5. How This Appears in Unity Catalog


Catalog: sales_data

Schemas
│
├── bronze
│     └ orders_raw
│
├── silver
│     └ orders_clean
│
└── gold
      └ daily_sales

Full table names:

sales_data.bronze.orders_raw
sales_data.silver.orders_clean
sales_data.gold.daily_sales

6. How Unity Catalog Controls Access

Unity Catalog provides centralized Role-Based Access Control (RBAC). Permissions can be granted at:

  • Catalog level
  • Schema level
  • Table level
  • Column level

This enables fine-grained governance across Bronze, Silver, and Gold layers.


Example Enterprise Role Strategy

Role Bronze Access Silver Access Gold Access
Data Engineer Full Access Full Access Full Access
Data Analyst No Access No Access Read Only
Data Scientist Read Access Read Access Read Access
BI Team No Access No Access Read Access

7. Example Unity Catalog Grants

Data Engineer Access

GRANT ALL PRIVILEGES
ON CATALOG sales_data
TO `data_engineers`;

Data Analyst Access (Gold Only)

GRANT USAGE ON CATALOG sales_data TO `data_analysts`;

GRANT USAGE ON SCHEMA sales_data.gold TO `data_analysts`;

GRANT SELECT ON ALL TABLES IN SCHEMA sales_data.gold
TO `data_analysts`;
This ensures analysts can only access Gold layer tables.

6. Real Enterprise Pattern

Large organizations structure data by domain.


catalog: ecommerce

schemas
│
├ bronze
│   ├ orders_raw
│   ├ payments_raw
│   └ customers_raw
│
├ silver
│   ├ orders_clean
│   ├ payments_clean
│   └ customers_clean
│
└ gold
    ├ daily_sales
    ├ customer_lifetime_value
    └ product_revenue


7. Storage Layout in AWS

When running on Amazon Web Services, tables are stored in Amazon S3 using Delta Lake format.


s3://company-datalake/

bronze/
   orders_raw/

silver/
   orders_clean/

gold/
   daily_sales/

Each table uses Delta Lake which provides:

  • ACID transactions
  • Time travel
  • Schema enforcement
  • High performance analytics
  • Efficient storage optimization

8. How Big Companies Use This

Companies use Medallion architecture for:

  • Fraud detection
  • Trading analytics
  • Customer 360
  • AI feature stores

Typical pipeline size:

  • 50TB+ data daily
  • 1000+ tables
  • 100+ pipelines

Conclusion

Medallion Architecture provides a scalable, governed, and high-performance data design pattern. It is widely used in modern lakehouse platforms such as Databricks.

No comments:

Post a Comment