Data Processing
Solution:
Apache Beam + Cloud Dataflow
Cloud Dataflow
- Autoscaling, NoOps (fully managed), stream and batch processing
- Built on Apache Beam
- Pipelines are regional
Data Transformation
Cloud Dataproc vs Cloud Dataflow
Key Terms
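- Element: a single entry of data (e.g. a table row)
- PCollection: a distributed data set; the input and output of each transform
- Transform: a data processing operation (step) in a pipeline
- ParDo: a type of Transform that applies user-defined processing to each element in parallel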
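
The key terms above can be sketched in plain Python. This is a toy model only, not the Apache Beam SDK: the names `par_do`, `split_words`, and `rows` are hypothetical, and a list stands in for a PCollection, which in a real pipeline is distributed across workers. In the Beam Python SDK the corresponding pieces are `beam.DoFn` and `beam.ParDo`.

```python
from typing import Callable, Iterable, List


def par_do(pcoll: List[str], fn: Callable[[str], Iterable[str]]) -> List[str]:
    """Toy ParDo: call fn on each element independently and collect its outputs.

    fn may emit zero, one, or many outputs per input element. Because no
    element depends on another, a real runner can process them in parallel.
    """
    out: List[str] = []
    for element in pcoll:
        out.extend(fn(element))
    return out


def split_words(row: str) -> Iterable[str]:
    """User-defined per-element logic (plays the role of a DoFn)."""
    for word in row.split():
        yield word


# Input "PCollection": each string is one element (think: one table row).
rows = ["stream and batch", "", "built on beam"]

# Applying the transform yields a new "PCollection"; the input is unchanged.
words = par_do(rows, split_words)
print(words)  # ['stream', 'and', 'batch', 'built', 'on', 'beam']
```

Note that the empty row produces no output at all, which is what separates ParDo from a strict one-to-one mapping.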