How to create Dataflow Flex templates and trigger them from Cloud Functions (Part 2)
In my last Medium post, I discussed how to create a custom classic Dataflow template to load XML data from GCS into BigQuery. This post is a continuation of that one, so I recommend reading it first for more details:
How to load nested XML data from GCS into BigQuery via Dataflow
In this post, we are going to see how to create a Dataflow Flex template, which lets us parameterize the inputs and outputs of our Dataflow job. We will use the same scenario as before: loading XML data into BigQuery via Dataflow.
The sample files required for creating a Flex template are available at the link below:
GitHub - rajathithan/gcstobq: Dataflow Job to uploading xml data from GCS to BQ
When creating the pipeline for a Dataflow Flex template, we need to pass the unparsed beam_args into the pipeline options. (While testing this, I initially failed to set beam_args properly, which resulted in two Dataflow jobs being created from a single Flex template launch. The problem went away once I constructed the pipeline options from beam_args.)
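The pattern can be sketched as follows. This is a minimal illustration, not the code from the repository: the `--input`/`--output` flag names and the `split_args` helper are assumptions made for the example. The essential point is that `argparse.parse_known_args` splits your own flags from everything else, and that remainder (runner, project, region, and so on) is what must be handed to `PipelineOptions` so that only one job is launched.

```python
import argparse

# Sketch of a flex-template entry point. The --input/--output interface
# is illustrative; your pipeline's flags may differ.
def split_args(argv=None):
    """Separate pipeline-specific flags from the args meant for Beam."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="GCS path of the XML file to load")
    parser.add_argument("--output", help="BigQuery table spec, e.g. dataset.table")
    # Everything argparse does not recognise (--runner, --project, --region, ...)
    # comes back as beam_args and must be passed to PipelineOptions.
    known_args, beam_args = parser.parse_known_args(argv)
    return known_args, beam_args

# In the real template, the pipeline would then be built from beam_args:
#
#   import apache_beam as beam
#   from apache_beam.options.pipeline_options import PipelineOptions
#
#   def run(argv=None):
#       known_args, beam_args = split_args(argv)
#       options = PipelineOptions(beam_args, save_main_session=True)
#       with beam.Pipeline(options=options) as p:
#           ...  # read XML from known_args.input, write to known_args.output
```

Building `PipelineOptions` from the leftover `beam_args` (rather than the full argv, or an empty list) is what keeps the launcher from spinning up a second job.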
The next step is to create the Dockerfile.
I chose the Python 3.9 template launcher image as the base and copied the required files into the template directory of the container image. All required Python dependencies are packaged inside this container.
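A Dockerfile along these lines would do the job. This is a hedged sketch: the file names `main.py` and `requirements.txt` are assumptions (match them to the files in the repository), while the base image and the `FLEX_TEMPLATE_PYTHON_*` environment variables are the ones Dataflow's Flex template launcher expects.

```dockerfile
# Python 3.9 flex-template launcher base image
FROM gcr.io/dataflow-templates-base/python39-template-launcher-base

ARG WORKDIR=/template
WORKDIR ${WORKDIR}

# Copy the pipeline code and its dependency list into the image.
# File names are illustrative; use the ones from your repo.
COPY requirements.txt .
COPY main.py .

# Tell the launcher where to find the entry point and the requirements.
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"

# Pre-install dependencies so workers don't install them at launch time.
RUN pip install --no-cache-dir -r requirements.txt
```

Baking the dependencies into the image keeps job startup fast, since the launcher does not have to resolve and install packages at run time.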