How to create Dataflow Flex templates and trigger them from Cloud Functions (Part 2)

Rajathithan Rajasekar
5 min read · Aug 13, 2023

In my last Medium post, I discussed how to create a custom classic Dataflow template to load XML data from GCS to BigQuery. This post is a continuation of that one, so I recommend going through the post below first for more details.

In this post, we are going to see how to create a Dataflow Flex Template, which lets us parameterize the inputs and outputs of our Dataflow job. We will take the same scenario as earlier (loading XML data into BigQuery via Dataflow).

The sample files required for creating a Flex Template are given in the link below.

When creating the pipeline for a Dataflow Flex Template, we need to pass the beam_args into the pipeline options. (While testing this, I initially failed to set the beam_args properly, which resulted in two Dataflow jobs being created from a single Flex Template launch. I was able to resolve it after constructing the pipeline options with the beam_args.)

pipeline-run-flex.py
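A minimal sketch of the pattern described above, assuming the standard Flex Template layout of `argparse.parse_known_args` feeding `PipelineOptions`. The parameter names, file paths, and transforms here are illustrative assumptions, not the exact code from the repository:

```python
import argparse


def parse_pipeline_args(argv=None):
    """Split our template parameters from the Beam runner arguments.

    Everything argparse does not recognize (runner, project, region,
    temp_location, ...) comes back as beam_args and must be handed to
    PipelineOptions -- skipping that step is what produced the duplicate
    Dataflow jobs mentioned above.
    """
    parser = argparse.ArgumentParser()
    # Illustrative parameter names; the real template exposes its own.
    parser.add_argument("--input", required=True,
                        help="GCS path of the XML file")
    parser.add_argument("--output_table", required=True,
                        help="BigQuery table in dataset.table form")
    known_args, beam_args = parser.parse_known_args(argv)
    return known_args, beam_args


def run(argv=None):
    # Imported lazily so the argument splitting stays testable without Beam.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, beam_args = parse_pipeline_args(argv)
    # The crucial line: build the options FROM beam_args so the launcher's
    # runner arguments reach the pipeline and only one job is created.
    options = PipelineOptions(beam_args, save_main_session=True)
    with beam.Pipeline(options=options) as p:
        (p
         | "ReadXML" >> beam.io.ReadFromText(known_args.input)
         # ... parse the XML and shape rows here, as in the classic-template post ...
         | "WriteToBQ" >> beam.io.WriteToBigQuery(known_args.output_table))


if __name__ == "__main__":
    run()
```

Because only the unrecognized arguments are forwarded, the template can accept both its own parameters and any runner options the Flex Template launcher injects.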

The next step is to create the Dockerfile.

I chose the python39 template launcher as the base image and copied the required files into the /template directory of the container image. All required Python dependencies are installed inside this container.

Dockerfile

FROM gcr.io/dataflow-templates-base/python39-template-launcher-base:latest

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="/template/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/template/pipeline-run-flex.py"

COPY …
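Once the Dockerfile is ready, the image can be built and registered as a Flex Template with gcloud. This is a sketch of the usual sequence; the project, bucket, region, and parameter names are placeholders, not values from this post:

```shell
# Build and push the container image (names are illustrative placeholders).
gcloud builds submit --tag "gcr.io/MY_PROJECT/xml-to-bq-flex:latest" .

# Package it as a Flex Template spec file in GCS.
gcloud dataflow flex-template build "gs://MY_BUCKET/templates/xml-to-bq.json" \
    --image "gcr.io/MY_PROJECT/xml-to-bq-flex:latest" \
    --sdk-language "PYTHON" \
    --metadata-file "metadata.json"

# Launch a job from the template, passing the parameterized inputs/outputs.
gcloud dataflow flex-template run "xml-to-bq-job" \
    --template-file-gcs-location "gs://MY_BUCKET/templates/xml-to-bq.json" \
    --region "us-central1" \
    --parameters input="gs://MY_BUCKET/data/books.xml",output_table="mydataset.books"
```

The metadata.json file describes the template's parameters so that they are validated at launch time.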
