Saturday, October 05, 2019

Enterprise Integration Recipe: File Transfer Using Google Cloud Storage


This is one of a series of recipes on how to use products offered by Google Cloud Platform to implement enterprise integration solutions.

GCP products used in this article:
  • Google Cloud Storage (GCS)
  • Cloud Pub/Sub

Implementation


[Figure: System Diagram]

[Figure: Flow Chart for Subscriber]


Recipe:
  • gcs.tf: Terraform code that provisions and configures all the GCP resources needed; a minimal sketch of what it might contain follows this list
  • README.md: up-to-date instructions on how to use the recipe
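
To make the recipe concrete, here is a minimal sketch of what gcs.tf might contain for the inbox: a bucket, a Pub/Sub topic, and a notification wired to the "OBJECT_FINALIZE" event. The project, bucket, and topic names are illustrative assumptions, not the recipe's actual values.

  # Minimal sketch of gcs.tf; all names are hypothetical.
  resource "google_storage_bucket" "inbox" {
    name     = "my-project-file-transfer-inbox"
    location = "US"
  }

  resource "google_pubsub_topic" "inbox_events" {
    name = "file-transfer-inbox-events"
  }

  # GCS publishes notifications through its service agent, which needs
  # publish rights on the topic.
  data "google_storage_project_service_account" "gcs" {}

  resource "google_pubsub_topic_iam_member" "gcs_publisher" {
    topic  = google_pubsub_topic.inbox_events.id
    role   = "roles/pubsub.publisher"
    member = "serviceAccount:${data.google_storage_project_service_account.gcs.email_address}"
  }

  # Publish a message only when an upload completes, so the subscriber
  # never sees a partially uploaded file.
  resource "google_storage_notification" "inbox" {
    bucket         = google_storage_bucket.inbox.name
    topic          = google_pubsub_topic.inbox_events.id
    payload_format = "JSON_API_V1"
    event_types    = ["OBJECT_FINALIZE"]
    depends_on     = [google_pubsub_topic_iam_member.gcs_publisher]
  }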
Noteworthy Tips:
  • GCS Object Lifecycle: makes it easy to configure the archive and error boxes. Old files should move to Nearline or Coldline storage and eventually be purged, with no more custom cron jobs for this chore. A hypothetical lifecycle configuration is sketched after this list;
  • Pub/Sub Notifications for GCS: we configure a notification for the inbox's "OBJECT_FINALIZE" event (see the gcs.tf sketch above). Again, no cron jobs are needed to scan folders for new files. Another advantage is that the subscriber does not need to worry about partially uploaded files, since the event fires only after an upload completes;
  • Stackdriver Monitoring for GCS: not configured in the recipe, but definitely worth exploring. For example, the "object_count" metric could be used to monitor the error box and send an alert when the count increases too fast (a sketch of such an alert policy follows this list);
  • Retry: this recipe implements retry by simply not ACKing Cloud Pub/Sub messages. Pub/Sub redelivers a message after its ACK deadline expires, for up to the subscription's 7-day message retention period, so retry comes for free from the subscription's at-least-once delivery guarantee. If you want a shorter total retry period, you can add a line of code that checks a message's age and marks it non-retriable once the age exceeds a threshold (see the subscriber sketch below). This approach saves the trouble of implementing a reliable retry mechanism; the downside is losing control over retry intervals (for example, there is no exponential backoff).
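
For the lifecycle tip above, a hypothetical archive-box configuration could look like the following. The 30-day and 365-day thresholds are illustrative assumptions; tune them to your retention policy.

  # Archive box: demote old files to Nearline, then purge them,
  # with no cron job involved. Name and ages are hypothetical.
  resource "google_storage_bucket" "archive" {
    name     = "my-project-file-transfer-archive"
    location = "US"

    # Move files to cheaper storage after 30 days.
    lifecycle_rule {
      condition {
        age = 30
      }
      action {
        type          = "SetStorageClass"
        storage_class = "NEARLINE"
      }
    }

    # Purge files after a year.
    lifecycle_rule {
      condition {
        age = 365
      }
      action {
        type = "Delete"
      }
    }
  }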
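
For the monitoring tip, an alert on the error box could be provisioned in the same Terraform file along these lines. The bucket name, threshold, and alignment period are all assumptions for illustration; note that "object_count" is sampled only about once a day, so this catches slow accumulation rather than sudden spikes.

  # Hypothetical Stackdriver alert: fire when the error box holds
  # more than 100 objects.
  resource "google_monitoring_alert_policy" "errorbox_growth" {
    display_name = "Error box object count too high"
    combiner     = "OR"

    conditions {
      display_name = "object_count above threshold"
      condition_threshold {
        filter          = "metric.type=\"storage.googleapis.com/storage/object_count\" AND resource.type=\"gcs_bucket\" AND resource.label.bucket_name=\"my-project-file-transfer-errorbox\""
        comparison      = "COMPARISON_GT"
        threshold_value = 100
        duration        = "0s"

        aggregations {
          alignment_period   = "3600s"
          per_series_aligner = "ALIGN_MEAN"
        }
      }
    }

    notification_channels = []  # add email/SMS channels here
  }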
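
Finally, for the retry tip, here is a minimal sketch of the age check in Python (the recipe does not dictate a subscriber language, and the 24-hour threshold, subscription path, and handle_file() helper are all hypothetical):

  from datetime import datetime, timedelta, timezone

  from google.cloud import pubsub_v1

  MAX_MESSAGE_AGE = timedelta(hours=24)  # illustrative retry window

  def handle_file(message):
      """Hypothetical processing: parse the GCS event, fetch the file, etc."""
      ...

  def callback(message):
      age = datetime.now(timezone.utc) - message.publish_time
      if age > MAX_MESSAGE_AGE:
          # Too old to retry: ACK so Pub/Sub stops redelivering
          # (after, say, moving the file to the error box).
          message.ack()
          return
      try:
          handle_file(message)
          message.ack()    # success: stop redelivery
      except Exception:
          message.nack()   # failure: Pub/Sub redelivers the message

  subscriber = pubsub_v1.SubscriberClient()
  subscriber.subscribe(
      "projects/my-project/subscriptions/inbox-sub", callback=callback
  ).result()  # block and process messages until interrupted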