Run Google WordCount Template using projects.locations.jobs.create
Normally, Google-provided Dataflow templates (e.g. gs://dataflow-templates/latest/Word_Count)
are executed with projects.locations.templates.launch. If you must use the low-level
projects.locations.jobs.create API instead, you can embed template metadata in the job's
environment.sdkPipelineOptions so that Dataflow reads and runs the staged template.
When to use this
- You have automation that must create a Job object via the Dataflow Jobs API.
- You understand that jobs.create is low-level and usually expects a compiled pipeline (steps).
- You cannot use templates.launch for policy/compatibility reasons but still want to run a template.
Important: This approach is unconventional. The recommended, supported way to run
GCP-provided templates is templates.launch. Use jobs.create only when you need the
low-level route.
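For comparison, here is a minimal sketch of the recommended templates.launch route. The helper names (build_launch_body, launch_word_count) and the example paths are illustrative assumptions, not part of the original script. The parameter names inputFile and output are the ones the Word_Count template documents, and the call relies on Application Default Credentials being available.

```python
def build_launch_body(job_name, input_file, output_prefix, temp_location):
    """Build the request body for projects.locations.templates.launch.

    Hypothetical helper: the argument names are illustrative.
    """
    return {
        "jobName": job_name,
        # Parameters expected by the Word_Count template.
        "parameters": {"inputFile": input_file, "output": output_prefix},
        # Runtime environment; tempLocation must point at an existing bucket.
        "environment": {"tempLocation": temp_location},
    }


def launch_word_count(project_id, region, body,
                      template_path="gs://dataflow-templates/latest/Word_Count"):
    """Launch the template through the Dataflow REST API (hypothetical helper)."""
    # Imported lazily so build_launch_body works without the client library.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")  # uses Application Default Credentials
    request = dataflow.projects().locations().templates().launch(
        projectId=project_id,
        location=region,
        gcsPath=template_path,
        body=body,
    )
    return request.execute()
```

A call would look like launch_word_count("your-project-id", "us-east4", build_launch_body("wordcount-1234", "gs://your-bucket/input/test.txt", "gs://your-bucket/output/my_output", "gs://your-bucket/temp")). Keeping the body builder separate from the network call makes the request easy to inspect before anything is submitted.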
Working Python example
Save this as a Python script and run it where your application credentials or service account are available.
It uses googleapiclient.discovery to call the Dataflow REST API.
from googleapiclient.discovery import build
from google.auth import default
import random
import json
# ------------------ CONFIG ------------------
PROJECT_ID = "your-project-id"
REGION = "us-east4" # choose your region
BUCKET = "your-bucket" # bucket for temp & I/O (must exist)
SERVICE_ACCOUNT = "your-sa@your-project.iam.gserviceaccount.com"
# The Google-provided WordCount template path
DATAFLOW_TEMPLATE_GCS = "gs://dataflow-templates/latest/Word_Count"
# Input/output example files in GCS
INPUT_FILE = f"gs://{BUCKET}/input/test.txt"
OUTPUT_FILE = f"gs://{BUCKET}/output/my_output.txt"
# --------------------------------------------
# Authenticate (Application Default Credentials or other)
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
dataflow = build("dataflow", "v1b3", credentials=credentials)
random_str = str(random.randint(1000, 9999))
job_name_
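As a rough sketch of the general shape a jobs.create request can take (this is not the continuation of the script above), the helpers below assemble a Job body and submit it. The function names are hypothetical. The keys placed inside sdkPipelineOptions are deliberately left to the caller, because that field is a free-form struct and the exact template metadata to embed is not shown here.

```python
def build_job_body(job_name, temp_prefix, service_account, pipeline_options):
    """Assemble a Job resource for projects.locations.jobs.create.

    Hypothetical helper. pipeline_options is passed through unchanged;
    which keys Dataflow needs there is an assumption left to the caller.
    """
    return {
        "name": job_name,
        "type": "JOB_TYPE_BATCH",
        "environment": {
            "tempStoragePrefix": temp_prefix,
            "serviceAccountEmail": service_account,
            # Free-form struct: caller-supplied template metadata goes here.
            "sdkPipelineOptions": pipeline_options,
        },
    }


def create_job(dataflow, project_id, region, job_body):
    """Submit the job with an already-built discovery client (hypothetical helper)."""
    request = dataflow.projects().locations().jobs().create(
        projectId=project_id,
        location=region,
        body=job_body,
    )
    return request.execute()
```

With the dataflow client built earlier, create_job(dataflow, PROJECT_ID, REGION, job_body) would send the request; printing job_body first is a cheap way to check what is about to be submitted.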