b07-Cloud Dataflow

Data Processing

Solution:
Apache Beam + Cloud Dataflow

Cloud Dataflow

  • Autoscaling, no-ops (fully managed, no cluster to provision), stream and batch processing
  • Built on Apache Beam
  • Pipelines are regional: each job runs in a region you choose

Data Transformation

Cloud Dataproc vs Cloud Dataflow

Key Terms

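  • Element: a single entry of data (e.g., a table row)
  • PCollection: a distributed dataset; the input and output of each step in a pipeline
  • Transform: a data processing step in the pipeline; it takes PCollections as input and produces PCollections as output
  • ParDo: a type of transform for parallel processing; it applies a function to each element and emits zero, one, or more output elements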
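To make these terms concrete, here is a minimal plain-Python analogy. It is not the Apache Beam API: `ToyPCollection`, `toy_pardo` and `parse_row` are hypothetical names made up for illustration. A real PCollection is distributed across workers and may be unbounded, and a real ParDo runs user code on many machines. This sketch only mirrors the semantics: a transform reads a collection of elements and produces a new collection, and a ParDo applies a per-element function that can emit zero, one, or more outputs.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class ToyPCollection(Generic[T]):
    """Hypothetical local stand-in for a PCollection: an immutable tuple of elements.
    (A real PCollection is distributed and can be unbounded.)"""
    elements: Tuple[T, ...]


def toy_pardo(pcoll: ToyPCollection[T], fn: Callable[[T], Iterable[U]]) -> ToyPCollection[U]:
    """Hypothetical ParDo: call fn on each element independently and collect
    all of its outputs (zero, one, or many) into a new collection.
    The input collection is left unchanged."""
    out = []
    for element in pcoll.elements:
        out.extend(fn(element))
    return ToyPCollection(tuple(out))


def parse_row(line: str) -> Iterable[Tuple[str, int]]:
    """Per-element function: emit one (name, age) pair, or nothing for a malformed row."""
    name, _, age = line.partition(",")
    if age.strip().isdigit():
        yield (name, int(age))


# Each element is a single row of a table, stored as a comma-separated line.
rows = ToyPCollection(("alice,42", "bob,", "carol,17"))

# One-to-zero-or-one: the malformed "bob," row produces no output.
parsed = toy_pardo(rows, parse_row)

# One-to-many: each line is split into several word elements.
words = toy_pardo(ToyPCollection(("a b", "c")), str.split)

print(parsed.elements)  # (('alice', 42), ('carol', 17))
print(words.elements)   # ('a', 'b', 'c')
```

In the actual Beam Python SDK, the same step is written with `beam.ParDo` and a `DoFn` subclass, or with `beam.FlatMap` when a plain function is enough; Cloud Dataflow then runs that pipeline on managed workers.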