How to transfer a batch of data between GCS buckets using Cloud Functions
In this post, we will see how to transfer data from one Cloud Storage bucket to another using Eventarc and Cloud Functions.
Let’s consider a scenario where data is uploaded to a source bucket at different times of the day, and all of it needs to be staged together in another GCS bucket for further processing down the data pipeline. Accumulating the data into a single batch makes the downstream batch operations more efficient and cheaper.
Enable the required APIs whenever you are prompted to in each section.
Create your source bucket in the region of your choice, keep all the default options, and disable public access.
The files uploaded to the source bucket should follow a consistent naming convention that sorts lexicographically, e.g. 03-01-2023-File-001.txt, 03-01-2023-File-002.txt, and so on.
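To illustrate why a fixed naming convention matters, here is a minimal sketch that groups object names by their shared prefix. It assumes the names use plain hyphens and look exactly like the examples above (a 10-character date, the word "File", a three-digit sequence number, and a .txt extension). The helper names batch_prefix and group_by_prefix are hypothetical and only serve the illustration.

```python
import re

# Assumed pattern, based only on the example names in this post:
# <2 digits>-<2 digits>-<4 digits>-File-<3 digits>.txt
NAME_PATTERN = re.compile(r"^(\d{2}-\d{2}-\d{4})-File-(\d{3})\.txt$")


def batch_prefix(object_name):
    """Return the date prefix of a conforming name, or None if it does not match."""
    match = NAME_PATTERN.match(object_name)
    return match.group(1) if match else None


def group_by_prefix(object_names):
    """Group conforming names by date prefix; names are sorted within each group."""
    groups = {}
    for name in sorted(object_names):
        prefix = batch_prefix(name)
        if prefix is not None:  # silently skip names that break the convention
            groups.setdefault(prefix, []).append(name)
    return groups


if __name__ == "__main__":
    uploads = [
        "03-01-2023-File-002.txt",
        "03-01-2023-File-001.txt",
        "04-01-2023-File-001.txt",
    ]
    for prefix, names in group_by_prefix(uploads).items():
        print(prefix, names)
```

Because the sequence number is zero-padded, sorting the names as plain strings keeps the files of one batch in order, and the shared prefix can later be used to select a whole batch at once.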
Create an Eventarc trigger for the storage bucket with the event type “google.cloud.storage.object.v1.finalized”, and choose Cloud Functions as the destination platform so that the Cloud Function is created along with this event trigger.
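To make the trigger more concrete, here is a minimal sketch of how a Python handler might read such a finalized event. It assumes the event arrives as an object whose data attribute is a dict with "bucket" and "name" keys, which is how the Python Functions Framework presents Cloud Storage events. The function name describe_finalized_event and the sample bucket name are made up for illustration.

```python
from types import SimpleNamespace


def describe_finalized_event(cloud_event):
    """Return (bucket, object name) from a storage 'finalized' event."""
    data = cloud_event.data  # payload dict describing the uploaded object
    return data["bucket"], data["name"]


if __name__ == "__main__":
    # Stand-in event for local experimentation only. A deployed function
    # would receive a real CloudEvent, with the handler registered through
    # the @functions_framework.cloud_event decorator.
    sample = SimpleNamespace(
        data={"bucket": "my-source-bucket", "name": "03-01-2023-File-001.txt"}
    )
    print(describe_finalized_event(sample))
```

The event fires once per uploaded object, so the handler sees one file at a time and has to work out for itself which batch that file belongs to.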
Create the Gen 2 Cloud Function with authenticated invocation, keep the default settings, and click “Next” to go to the code section.
If you are using a trial account, keep the maximum number of instances at 4 so that you don't exceed the quota assigned to your region.
Note: Gen 2 is recommended here because the execution time of the Cloud Function can vary with the load size and the amount of data transferred.
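The run time varies because a single invocation may have to move many objects. As a rough illustration, and not necessarily the exact code used in this post, the sketch below copies every object that shares a prefix from one bucket to another. It assumes the client object behaves like google.cloud.storage.Client (from the google-cloud-storage package); the function name copy_batch and the bucket names are hypothetical.

```python
def copy_batch(client, source_bucket_name, dest_bucket_name, prefix):
    """Copy every object under `prefix` from the source bucket to the destination.

    `client` is expected to behave like google.cloud.storage.Client.
    Returns the names of the copied objects.
    """
    source_bucket = client.bucket(source_bucket_name)
    dest_bucket = client.bucket(dest_bucket_name)
    copied = []
    # Cloud Storage lists objects in lexicographic order of their names.
    for blob in client.list_blobs(source_bucket_name, prefix=prefix):
        # Keep the same object name in the destination bucket.
        source_bucket.copy_blob(blob, dest_bucket, new_name=blob.name)
        copied.append(blob.name)
    return copied


# Possible use inside a deployed function (bucket names are placeholders):
#   from google.cloud import storage
#   copy_batch(storage.Client(), "my-source-bucket", "my-staging-bucket", "03-01-2023-")
```

The work grows linearly with the number of objects under the prefix, so the time a single invocation takes depends directly on how large the batch is.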