Databricks APIs – Architecture, Types, and Python Examples
Databricks provides a comprehensive set of REST APIs to automate platform setup, workspace administration, data governance, compute management, and analytics workflows. These APIs are commonly used for infrastructure automation, CI/CD pipelines, and application onboarding.
Common Python Setup
import requests

DATABRICKS_HOST = "https://<databricks-instance>"  # your workspace URL
TOKEN = "<DATABRICKS_TOKEN>"                       # personal access token

HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json"
}
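For scripts and CI/CD pipelines, reading the host and token from environment variables avoids hard-coding secrets. A minimal sketch (the variable names and helper functions are illustrative, not part of any Databricks SDK):

```python
import os

# Fall back to placeholders so the script still imports without configuration.
DATABRICKS_HOST = os.environ.get("DATABRICKS_HOST", "https://<databricks-instance>")
TOKEN = os.environ.get("DATABRICKS_TOKEN", "<DATABRICKS_TOKEN>")

def api_url(host, path):
    """Join the host and an API path without doubling or dropping slashes."""
    return host.rstrip("/") + "/" + path.lstrip("/")

def auth_headers(token):
    """Standard bearer-token headers used by every example below."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```
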
1. Account API
Purpose: Manage Databricks accounts and workspaces.
Documentation: Databricks Account API
Create a Workspace
Note: the Account API is served from the account console host (for example, https://accounts.cloud.databricks.com on AWS), not from a workspace URL.
ACCOUNT_HOST = "https://accounts.cloud.databricks.com"
url = f"{ACCOUNT_HOST}/api/2.0/accounts/<ACCOUNT_ID>/workspaces"
payload = {
    "workspace_name": "dev-workspace",
    "aws_region": "us-east-1",
    "credentials_id": "cred-id",               # from the Credentials API
    "storage_configuration_id": "storage-id",  # from the Storage Configurations API
    "network_id": "network-id"                 # optional: customer-managed VPC
}
response = requests.post(url, headers=HEADERS, json=payload)
print(response.json())
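Workspace creation is asynchronous: the response returns immediately while provisioning continues in the background. A hedged polling sketch (the function names, IDs, and timeouts here are illustrative, not a Databricks SDK):

```python
import time

def is_provisioned(workspace):
    # A new workspace moves from PROVISIONING to RUNNING; FAILED is terminal.
    return workspace.get("workspace_status") == "RUNNING"

def wait_for_workspace(account_host, headers, account_id, workspace_id,
                       timeout_s=1200, interval_s=30):
    """Poll the Account API until the workspace is RUNNING (sketch)."""
    import requests
    url = f"{account_host}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        ws = requests.get(url, headers=headers).json()
        if is_provisioned(ws):
            return ws
        if ws.get("workspace_status") == "FAILED":
            raise RuntimeError(ws.get("workspace_status_message", "provisioning failed"))
        time.sleep(interval_s)
    raise TimeoutError("workspace did not reach RUNNING in time")
```
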
2. SCIM API
Purpose: Manage users, groups, and service principals.
Documentation: Databricks SCIM API
Create a Service Principal
url = f"{DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals"
payload = {
    "displayName": "my-app-sp"
}
response = requests.post(url, headers=HEADERS, json=payload)
print(response.json())
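SCIM endpoints also support lookups with standard SCIM filter syntax, which is useful for finding an existing service principal by name before creating a duplicate. A small sketch (the helper name is illustrative; the filter syntax is standard SCIM):

```python
from urllib.parse import urlencode

def scim_filter_url(host, display_name):
    """Build a GET URL filtering service principals by displayName.

    SCIM filter values must be double-quoted; urlencode handles escaping.
    """
    query = urlencode({"filter": f'displayName eq "{display_name}"'})
    return f"{host}/api/2.0/preview/scim/v2/ServicePrincipals?{query}"

# Example:
#   requests.get(scim_filter_url(DATABRICKS_HOST, "my-app-sp"), headers=HEADERS)
```
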
3. Unity Catalog API
Purpose: Centralized data governance for catalogs, schemas, and tables.
Documentation: Unity Catalog API
Create a Catalog
url = f"{DATABRICKS_HOST}/api/2.1/unity-catalog/catalogs"
payload = {
    "name": "sales_catalog",
    "comment": "Catalog for sales domain"
}
response = requests.post(url, headers=HEADERS, json=payload)
print(response.json())
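A catalog usually needs at least one schema before tables can be created. A sketch of the request body for POST /api/2.1/unity-catalog/schemas (the schema name and comment below are illustrative):

```python
def schema_payload(catalog, schema, comment=""):
    """Request body for POST /api/2.1/unity-catalog/schemas."""
    body = {"name": schema, "catalog_name": catalog}
    if comment:
        body["comment"] = comment
    return body

# Example: a schema under the catalog created above.
payload = schema_payload("sales_catalog", "transactions", "Raw sales transactions")
# requests.post(f"{DATABRICKS_HOST}/api/2.1/unity-catalog/schemas",
#               headers=HEADERS, json=payload)
```
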
Grant Catalog Permission
url = f"{DATABRICKS_HOST}/api/2.1/unity-catalog/permissions/catalog/sales_catalog"
payload = {
    "changes": [
        {
            "principal": "data_analysts",
            "add": ["USE_CATALOG"]
        }
    ]
}
response = requests.patch(url, headers=HEADERS, json=payload)
print(response.json())
4. Workspace API
Purpose: Manage workspace-level resources such as notebooks, directories, and clusters (cluster operations live under the Clusters API endpoints, as shown below).
Documentation: Workspace API
Create a Cluster
url = f"{DATABRICKS_HOST}/api/2.0/clusters/create"
payload = {
    "cluster_name": "demo-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",  # Azure node type; use a cloud-appropriate type on AWS/GCP
    "num_workers": 1,
    "autotermination_minutes": 30
}
response = requests.post(url, headers=HEADERS, json=payload)
print(response.json())
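Cluster creation also returns before the cluster is usable. A hedged polling sketch against GET /api/2.0/clusters/get (function names and timeouts are illustrative):

```python
def cluster_is_ready(state):
    # RUNNING means the cluster can accept work.
    return state == "RUNNING"

def wait_for_cluster(host, headers, cluster_id, timeout_s=900, interval_s=15):
    """Poll /api/2.0/clusters/get until the cluster is RUNNING (sketch)."""
    import requests, time
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        info = requests.get(f"{host}/api/2.0/clusters/get",
                            headers=headers,
                            params={"cluster_id": cluster_id}).json()
        state = info.get("state")
        if cluster_is_ready(state):
            return info
        if state in ("TERMINATED", "ERROR"):
            raise RuntimeError(f"cluster ended in state {state}")
        time.sleep(interval_s)
    raise TimeoutError("cluster did not reach RUNNING in time")
```
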
5. Jobs API
Purpose: Orchestrate batch and streaming workloads.
Documentation: Jobs API
Create a Job
url = f"{DATABRICKS_HOST}/api/2.1/jobs/create"
payload = {
    "name": "sample-job",
    "tasks": [
        {
            "task_key": "run_notebook",
            "notebook_task": {
                "notebook_path": "/Shared/sample_notebook"
            },
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 1
            }
        }
    ]
}
response = requests.post(url, headers=HEADERS, json=payload)
print(response.json())
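Once a job exists, POST /api/2.1/jobs/run-now triggers a run; notebook parameters can be passed per run. A sketch of the request body (the job ID and parameter names below are illustrative):

```python
def run_now_payload(job_id, notebook_params=None):
    """Request body for POST /api/2.1/jobs/run-now."""
    body = {"job_id": job_id}
    if notebook_params:
        # Passed to the notebook task as widget values.
        body["notebook_params"] = notebook_params
    return body

# Example: trigger the job created above with a date parameter.
payload = run_now_payload(123, {"run_date": "2024-01-01"})
# requests.post(f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
#               headers=HEADERS, json=payload)
```
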
6. Repos API
Purpose: Integrate Git repositories with the workspace — clone repos, check out branches, and pull updates for CI/CD.
Documentation: Repos API
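A sketch of creating a repo via POST /api/2.0/repos (the Git URL and workspace path below are hypothetical placeholders):

```python
def repo_payload(git_url, provider, path):
    """Request body for POST /api/2.0/repos.

    provider identifies the Git host (e.g. "gitHub"); path is where the
    repo is cloned in the workspace, typically /Repos/<user>/<name>.
    """
    return {"url": git_url, "provider": provider, "path": path}

# Example (hypothetical repo and path):
payload = repo_payload("https://github.com/org/analytics.git",
                       "gitHub", "/Repos/ci-bot/analytics")
# requests.post(f"{DATABRICKS_HOST}/api/2.0/repos", headers=HEADERS, json=payload)
```
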