Friday, 8 August 2025

create new job help

GCP Dataflow — Run WordCount using projects.locations.jobs.create

Run Google WordCount Template using projects.locations.jobs.create

Normally, Google-provided Dataflow templates (e.g. gs://dataflow-templates/latest/Word_Count) are launched with projects.locations.templates.launch. If you must use the low-level projects.locations.jobs.create API instead, you can embed the template metadata in the job's environment.sdkPipelineOptions so that Dataflow reads and runs the staged template.

When to use this

  • You have automation that must create a Job object via the Dataflow Jobs API.
  • You understand that jobs.create is low-level and usually expects a compiled pipeline (steps).
  • You cannot use templates.launch for policy/compatibility reasons but still want to run a template.

Important: this approach is unconventional. templates.launch is the recommended, supported way to run Google-provided templates. Use jobs.create only when you need the low-level route.
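
For comparison, here is a minimal sketch of the recommended templates.launch route, so the difference from jobs.create is concrete. The helper names build_launch_request and launch_wordcount, the "wordcount-" job-name prefix, and the gs://<bucket>/temp location are assumptions made for illustration. inputFile and output are the parameters the Word_Count template is expected to take; check them against the template's metadata before relying on them.

```python
import random


def build_launch_request(project_id, region, template_gcs_path, bucket, job_name=None):
    """Build keyword arguments for projects.locations.templates.launch.

    Pure function: it only assembles a dict and makes no network call.
    """
    if job_name is None:
        # Hypothetical naming scheme: fixed prefix plus a random 4-digit suffix.
        job_name = f"wordcount-{random.randint(1000, 9999)}"
    return {
        "projectId": project_id,
        "location": region,
        "gcsPath": template_gcs_path,
        "body": {
            "jobName": job_name,
            # Template parameters (assumed names for Word_Count).
            "parameters": {
                "inputFile": f"gs://{bucket}/input/test.txt",
                "output": f"gs://{bucket}/output/my_output",
            },
            # Assumed temp folder; the bucket must already exist.
            "environment": {"tempLocation": f"gs://{bucket}/temp"},
        },
    }


def launch_wordcount(request_kwargs):
    """Send the request. Needs google-api-python-client and valid credentials."""
    # Imported lazily so the request builder above works without these packages.
    from google.auth import default
    from googleapiclient.discovery import build

    credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
    dataflow = build("dataflow", "v1b3", credentials=credentials)
    return (
        dataflow.projects().locations().templates()
        .launch(**request_kwargs)
        .execute()
    )
```

Calling launch_wordcount(build_launch_request("your-project-id", "us-east4", "gs://dataflow-templates/latest/Word_Count", "your-bucket")) should submit the job. Keeping the request builder separate from the network call makes the payload easy to inspect or log before anything is sent.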

Working Python example

Save this as a Python script and run it in an environment where Application Default Credentials or a service account are available. It uses googleapiclient.discovery to call the Dataflow REST API.

from googleapiclient.discovery import build
from google.auth import default
import random
import json

# ------------------ CONFIG ------------------
PROJECT_ID = "your-project-id"
REGION = "us-east4"                       # choose your region
BUCKET = "your-bucket"                    # bucket for temp & I/O (must exist)
SERVICE_ACCOUNT = "your-sa@your-project.iam.gserviceaccount.com"

# The Google-provided WordCount template path
DATAFLOW_TEMPLATE_GCS = "gs://dataflow-templates/latest/Word_Count"

# Input/output example files in GCS
INPUT_FILE = f"gs://{BUCKET}/input/test.txt"
OUTPUT_FILE = f"gs://{BUCKET}/output/my_output.txt"

# --------------------------------------------
# Authenticate (Application Default Credentials or other)
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
dataflow = build("dataflow", "v1b3", credentials=credentials)

random_str = str(random.randint(1000, 9999))
job_name_
