Databricks REST API – Complete Enterprise Automation Guide (Python + AWS)
This guide covers the most commonly used Databricks REST API endpoints,
with Python examples for enterprise automation on AWS.
0️⃣ Authentication & Base Configuration
Account-Level APIs
Base URL: https://accounts.cloud.databricks.com
Auth: Account-level OAuth access token (personal access tokens are workspace-scoped)
Workspace-Level APIs
Base URL: https://dbc-xxxx.cloud.databricks.com
Auth: Workspace PAT
import requests

ACCOUNT_ID = "xxxx"
ACCOUNT_HOST = "https://accounts.cloud.databricks.com"
WORKSPACE_HOST = "https://dbc-xxxx.cloud.databricks.com"

# ACCOUNT_TOKEN and WORKSPACE_TOKEN are placeholders; substitute real tokens
# and keep them out of source control.
ACCOUNT_HEADERS = {
    "Authorization": "Bearer ACCOUNT_TOKEN",
    "Content-Type": "application/json",
}
WORKSPACE_HEADERS = {
    "Authorization": "Bearer WORKSPACE_TOKEN",
    "Content-Type": "application/json",
}
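Hardcoded tokens are fine for a demo but risky in automation. A minimal sketch of building the same headers from environment variables follows; the helper names and the variable name `DATABRICKS_WORKSPACE_TOKEN` are assumptions made for illustration, not names Databricks requires.

```python
import os


def build_headers(token: str) -> dict:
    """Return the bearer-token headers used by the requests in this guide."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def headers_from_env(var_name: str) -> dict:
    """Build headers from an environment variable (the name is the caller's choice)."""
    token = os.environ.get(var_name, "")
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return build_headers(token)
```

For example, `WORKSPACE_HEADERS = headers_from_env("DATABRICKS_WORKSPACE_TOKEN")` would replace the literal dictionary while leaving every later call unchanged.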
1️⃣ Identity & SCIM APIs
| Endpoint | Purpose |
| --- | --- |
| POST /scim/v2/Users | Create user |
| GET /scim/v2/Users | List users |
| POST /scim/v2/Groups | Create group |
| PATCH /scim/v2/Groups/{id} | Add/remove members |
Create User
url = f"{ACCOUNT_HOST}/api/2.0/accounts/{ACCOUNT_ID}/scim/v2/Users"
payload = {
    "userName": "alice@company.com",
    "displayName": "Alice",
    "active": True,  # Python boolean; JSON's lowercase true is a NameError here
}
requests.post(url, headers=ACCOUNT_HEADERS, json=payload).raise_for_status()
Create Group
url = f"{ACCOUNT_HOST}/api/2.0/accounts/{ACCOUNT_ID}/scim/v2/Groups"
payload = {"displayName": "data-engineers"}
group = requests.post(url, headers=ACCOUNT_HEADERS, json=payload).json()
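The table lists `PATCH /scim/v2/Groups/{id}` for membership changes but the guide shows no request body. The sketch below builds SCIM PatchOp payloads in the shape defined by the SCIM 2.0 protocol; the function names are hypothetical, and the exact operations Databricks accepts should be checked against its SCIM reference.

```python
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def add_members_payload(member_ids):
    """Build a PatchOp body that adds users (by SCIM id) to a group."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [
            {
                "op": "add",
                "value": {"members": [{"value": m} for m in member_ids]},
            }
        ],
    }


def remove_member_payload(member_id):
    """Build a PatchOp body that removes one member from a group."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [
            {"op": "remove", "path": f'members[value eq "{member_id}"]'}
        ],
    }
```

Either payload would be sent with `requests.patch(f"{url}/{group['id']}", headers=ACCOUNT_HEADERS, json=...)`, where `url` is the Groups URL used above and the member ids are the `id` values returned when users are created or listed.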
2️⃣ Workspace (Account-Level) APIs
| Endpoint | Description |
| --- | --- |
| POST /workspaces | Create workspace |
| GET /workspaces | List workspaces |
| PUT /workspaces/{workspace_id}/permissionassignments/principals/{principal_id} | Assign a user, group, or service principal to a workspace |
Create Workspace
url = f"{ACCOUNT_HOST}/api/2.0/accounts/{ACCOUNT_ID}/workspaces"
payload = {
    "workspace_name": "prod",
    "aws_region": "us-east-1",
    "credentials_id": "cred-123",
    "storage_configuration_id": "storage-123",
    "network_id": "network-123",
}
workspace = requests.post(url, headers=ACCOUNT_HEADERS, json=payload).json()
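Workspace creation is asynchronous, so automation usually has to wait before calling workspace-level APIs. Below is a minimal polling sketch that takes a status-fetching callable, which keeps it independent of any HTTP library. The function name and the state names `RUNNING` and `FAILED` are assumptions to verify against the Account API reference.

```python
import time


def wait_for_status(fetch_status, target="RUNNING", failed=("FAILED",),
                    interval_s=30.0, timeout_s=1800.0, sleep=time.sleep):
    """Call fetch_status() until it returns `target`, a failed state, or time runs out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"provisioning ended in state {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

In practice `fetch_status` could be a small function that issues a GET on `f"{url}/{workspace['workspace_id']}"` with `ACCOUNT_HEADERS` and returns the `workspace_status` field of the response, assuming that is the field your API version reports.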
3️⃣ Cluster APIs
| Endpoint | Description |
| --- | --- |
| POST /clusters/create | Create cluster |
| GET /clusters/list | List clusters |
| POST /clusters/start | Start a terminated cluster |
| POST /clusters/delete | Terminate cluster (the configuration is retained) |
Create Cluster
url = f"{WORKSPACE_HOST}/api/2.0/clusters/create"
payload = {
    "cluster_name": "engineering",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "m5.xlarge",
    "num_workers": 2,
}
cluster = requests.post(url, headers=WORKSPACE_HEADERS, json=payload).json()
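A fixed-size cluster with no idle timeout can run up cost unnoticed. As a hedged variation on the payload above, this sketch swaps `num_workers` for the `autoscale` block and adds `autotermination_minutes`; the helper name and the default of 30 minutes are illustrative choices, not recommendations from the original guide.

```python
def cluster_payload(name, spark_version, node_type_id,
                    min_workers, max_workers, autotermination_minutes=30):
    """Build a clusters/create body that autoscales and terminates when idle."""
    if min_workers < 0 or min_workers > max_workers:
        raise ValueError("require 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # autoscale replaces the fixed num_workers field
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
    }
```

The result is passed as `json=` to the same `clusters/create` call, for example `cluster_payload("engineering", "13.3.x-scala2.12", "m5.xlarge", 1, 4)`.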
Set Cluster Permissions
url = f"{WORKSPACE_HOST}/api/2.0/permissions/clusters/{cluster['cluster_id']}"
payload = {
    "access_control_list": [
        {
            "group_name": "data-engineers",
            "permission_level": "CAN_ATTACH_TO",
        }
    ]
}
# PATCH adds to the existing ACL; raise on HTTP errors instead of ignoring them.
requests.patch(url, headers=WORKSPACE_HEADERS, json=payload).raise_for_status()
4️⃣ Jobs API
| Endpoint | Purpose |
| --- | --- |
| POST /jobs/create | Create job |
| POST /jobs/run-now | Run job |
| GET /jobs/list | List jobs |
Create Job
url = f"{WORKSPACE_HOST}/api/2.0/jobs/create"
payload = {
    "name": "etl-job",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "m5.large",
        "num_workers": 2,
    },
    "notebook_task": {
        "notebook_path": "/Shared/etl",
    },
}
job = requests.post(url, headers=WORKSPACE_HEADERS, json=payload).json()
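Creating a job does not run it; `POST /jobs/run-now` from the table does. A small sketch of the request body follows. The helper name and the `run_date` parameter in the usage note are hypothetical, and notebook parameters are converted to strings because the API treats them as a string-to-string map.

```python
def run_now_payload(job_id, notebook_params=None):
    """Build a jobs/run-now body, optionally passing notebook parameters."""
    payload = {"job_id": job_id}
    if notebook_params:
        # Notebook parameters are sent as strings on both keys and values.
        payload["notebook_params"] = {
            str(k): str(v) for k, v in notebook_params.items()
        }
    return payload
```

A run could then be triggered with `requests.post(f"{WORKSPACE_HOST}/api/2.0/jobs/run-now", headers=WORKSPACE_HEADERS, json=run_now_payload(job["job_id"], {"run_date": "2024-01-01"}))`; the response is expected to carry a `run_id` for later status checks.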
5️⃣ SQL & Warehouses API
| Endpoint | Description |
| --- | --- |
| POST /sql/warehouses | Create SQL warehouse |
| POST /sql/statements | Execute SQL |
Execute SQL
url = f"{WORKSPACE_HOST}/api/2.0/sql/statements"
payload = {
    "statement": "SELECT current_user(), current_date()",
    "warehouse_id": "wh-123",
}
# Keep the response: it carries the statement status and, once finished, the rows.
result = requests.post(url, headers=WORKSPACE_HEADERS, json=payload).json()
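The JSON body returned by the call above nests column names and row values in separate places. The sketch below turns it into a list of dictionaries, assuming the inline JSON result layout (`status.state`, `manifest.schema.columns`, `result.data_array`); the function name is hypothetical, and a statement still in a pending or running state would need to be polled before parsing.

```python
def rows_from_statement(response):
    """Convert a finished statement response into a list of {column: value} dicts."""
    state = response.get("status", {}).get("state")
    if state != "SUCCEEDED":
        raise RuntimeError(f"statement did not succeed: state={state}")
    columns = [col["name"] for col in response["manifest"]["schema"]["columns"]]
    data = response.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in data]
```

Values arrive as strings in this layout, so numeric or date columns may need explicit conversion afterwards.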
6️⃣ DBFS & Workspace APIs
| Endpoint | Description |
| --- | --- |
| POST /dbfs/put | Upload file |
| GET /workspace/list | List notebooks |
| POST /workspace/import | Import notebook |
Upload File to DBFS
url = f"{WORKSPACE_HOST}/api/2.0/dbfs/put"
payload = {
    "path": "/tmp/data.txt",
    "contents": "SGVsbG8=",
}
requests.post(url, headers=WORKSPACE_HEADERS, json=payload)
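The `contents` field above is the base64 encoding of the file bytes (`SGVsbG8=` decodes to `Hello`). A short sketch of producing it from raw bytes follows; the helper name is hypothetical, and the 1 MB guard reflects my understanding of the inline upload limit, so treat it as an assumption.

```python
import base64

MAX_INLINE_BYTES = 1024 * 1024  # assumed limit for inline "contents" uploads


def dbfs_put_payload(path, data: bytes, overwrite=False):
    """Build a dbfs/put body with base64-encoded file contents."""
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError("file too large for an inline dbfs/put upload")
    return {
        "path": path,
        "contents": base64.b64encode(data).decode("ascii"),
        "overwrite": overwrite,
    }
```

Calling `dbfs_put_payload("/tmp/data.txt", b"Hello")` reproduces the payload shown above, with `overwrite` set explicitly.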
7️⃣ Unity Catalog APIs (Most Used)
| Endpoint | Description |
| --- | --- |
| POST /unity-catalog/catalogs | Create catalog |
| POST /unity-catalog/schemas | Create schema |
| POST /unity-catalog/tables | Create table |
| PATCH /unity-catalog/permissions/{securable_type}/{full_name} | Grant or revoke access |
Create Catalog
url = f"{WORKSPACE_HOST}/api/2.1/unity-catalog/catalogs"
payload = {"name": "finance"}
requests.post(url, headers=WORKSPACE_HEADERS, json=payload)
Grant Table Access
url = f"{WORKSPACE_HOST}/api/2.1/unity-catalog/permissions/table/finance.payments.txns"
payload = {
    "changes": [
        {
            "principal": "data-scientists",
            "add": ["SELECT"],
        }
    ]
}
requests.patch(url, headers=WORKSPACE_HEADERS, json=payload)
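The same `changes` list can carry revocations as well as grants, and can cover several principals in one call. This is a minimal sketch with hypothetical helper names; the `remove` key mirrors the `add` key shown above, and the `contractors` group in the usage note is made up for illustration.

```python
def permission_change(principal, add=(), remove=()):
    """Build one entry of the "changes" list for a Unity Catalog permissions PATCH."""
    change = {"principal": principal}
    if add:
        change["add"] = list(add)
    if remove:
        change["remove"] = list(remove)
    if len(change) == 1:
        raise ValueError("specify at least one privilege to add or remove")
    return change


def permissions_payload(*changes):
    """Wrap one or more change entries in the request body."""
    return {"changes": list(changes)}
```

For instance, `permissions_payload(permission_change("data-scientists", add=["SELECT"]), permission_change("contractors", remove=["SELECT"]))` grants and revokes in a single request.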
8️⃣ Tokens, Secrets, Repos
| Endpoint | Use |
| --- | --- |
| POST /token/create | Create PAT |
| POST /secrets/scopes/create | Create secret scope |
| POST /repos | Create repo |
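These three endpoints have no examples in this section, so here is a hedged sketch of their request bodies. The helper names are hypothetical and only the fields I am confident about are included; check the API reference for optional fields before relying on them.

```python
def token_create_payload(comment, lifetime_seconds):
    """Body for POST /api/2.0/token/create: a PAT with a comment and an expiry."""
    if lifetime_seconds <= 0:
        raise ValueError("lifetime_seconds must be positive")
    return {"comment": comment, "lifetime_seconds": lifetime_seconds}


def secret_scope_payload(scope):
    """Body for POST /api/2.0/secrets/scopes/create."""
    return {"scope": scope}


def repo_create_payload(url, provider, path):
    """Body for POST /api/2.0/repos: clone a Git repository into the workspace."""
    return {"url": url, "provider": provider, "path": path}
```

Each body is sent with `requests.post(f"{WORKSPACE_HOST}/api/2.0/...", headers=WORKSPACE_HEADERS, json=...)` as in the earlier sections. Setting a short `lifetime_seconds` keeps automation tokens from living indefinitely.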
9️⃣ Enterprise Best Practices
- Terraform for bootstrap & security
- Python APIs for day-2 operations
- Unity Catalog for ALL data access
- No direct IAM-based data access; grant access through Unity Catalog instead
This API-first approach is used by regulated banks, fintech, and large enterprises.
Next Topics You Can Publish
- Databricks CI/CD pipelines
- API error handling & retries
- Zero-trust data architecture
- Cross-account Unity Catalog sharing