How to Connect to Private Cloud SQL from Dataproc

Rajathithan Rajasekar
Oct 24, 2023

If you need to connect to a private Cloud SQL instance from Dataproc for your Hadoop / Spark batch jobs, then this post is for you.

When we provision a Dataproc cluster, there is an option to specify initialization actions.

Initialization actions are executed on each node in series during cluster creation. They are also executed on each node that is added when we scale up the cluster.
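To make this concrete, here is a minimal sketch of what an initialization action can look like. It is only an illustration, not the script used in this post. It assumes the standard Dataproc image, where the helper `/usr/share/google/get_metadata_value` can read the node's `dataproc-role` metadata attribute. The function names `get_role` and `node_setup` and the `DATAPROC_ROLE` override are hypothetical, added so the sketch can be tried outside a cluster.

```shell
#!/bin/sh
# Minimal sketch of a Dataproc initialization action (illustrative only).
set -eu

# Look up this node's role. On Dataproc images the helper below reads instance
# metadata. DATAPROC_ROLE is a hypothetical override for trying this off-cluster.
get_role() {
  helper=/usr/share/google/get_metadata_value
  if [ -n "${DATAPROC_ROLE:-}" ]; then
    echo "${DATAPROC_ROLE}"
  elif [ -x "${helper}" ]; then
    "${helper}" attributes/dataproc-role
  else
    echo "unknown"
  fi
}

# Choose the setup path for this node. The echo lines stand in for real work,
# such as installing and starting a proxy on the master node.
node_setup() {
  case "$(get_role)" in
    Master) echo "master-setup" ;;
    Worker) echo "worker-setup" ;;
    *)      echo "skip" ;;
  esac
}

node_setup
```

Because the same script runs on every node, branching on the role is a common way to keep master-only setup away from the workers.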

To do this, follow the steps below:

  • Create a custom service account
  • Add the roles/cloudsql.admin IAM role to the custom service account
  • Copy the script to your project’s GCS bucket
  • Include the location of the script in the initialization actions.
  • Add to cluster VM access scopes
  • Additional scopes below can be added as required.
  • Provision the Dataproc cluster with the custom service account.
gcloud dataproc clusters create clustername \
--image-version XXXXXXX \
--bucket XXXXXXX \
--region XXXXX \
--no-address \
--zone XXXXX \
--master-machine-type XXXXX \
--master-boot-disk-size 500 \
--master-boot-disk-type pd-standard \
--num-masters 1 \
--num-workers 3 \
--worker-machine-type XXXX \
--worker-boot-disk-size 500 \
--worker-boot-disk-type pd-standard \
--shielded-integrity-monitoring \
--shielded-secure-boot \
--shielded-vtpm \
--initialization-actions gs://xxxx/ \
--optional-components XXXXX \
--scopes '',XXXXXX,XXXXXX \…
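The first three steps in the list above can be sketched with the gcloud and gsutil CLIs as follows. This is a rough sketch under stated assumptions: the project ID, service account name, bucket, and script file name are hypothetical placeholders, and the `run` wrapper is a helper added here so that the commands are only printed by default.

```shell
#!/bin/sh
# Sketch of the preparatory steps. All names below are hypothetical placeholders.
set -eu

PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-dataproc-cloudsql}"
BUCKET="${BUCKET:-my-bucket}"
SCRIPT="${SCRIPT:-init-action.sh}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print each command by default. Set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Create the custom service account.
run gcloud iam service-accounts create "${SA_NAME}" --project "${PROJECT_ID}"

# 2. Grant it the roles/cloudsql.admin role on the project.
run gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:${SA_EMAIL}" \
  --role "roles/cloudsql.admin"

# 3. Copy the initialization script to the project's GCS bucket.
run gsutil cp "${SCRIPT}" "gs://${BUCKET}/${SCRIPT}"
```

Running it as-is only prints the three commands, which makes it easy to review them before setting `DRY_RUN=0`. The resulting service account email and the `gs://` location of the script are the values the cluster creation command would then refer to.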



