b10-Machine Learning

What is Machine Learning?

Process of combining inputs to produce useful predictions on new, previously unseen data

How it works

Train a model with examples (example = input features + label)
Training = adjust the model's parameters so it learns the relationship between features and labels
Feature = input variables
Inference = apply the trained model to unlabeled examples to make predictions
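A minimal sketch of training vs. inference, assuming a one-feature linear model (label ≈ w * feature + b) trained with plain gradient descent on mean squared error. The data, learning rate, and epoch count are illustrative assumptions, not values from these notes.

```python
# Minimal sketch: learn label ≈ w * feature + b from labeled examples.
# Data, learning rate, and epoch count are illustrative assumptions.

def train(examples, learning_rate=0.05, epochs=2000):
    """Training: adjust w and b with gradient descent to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):  # one epoch = one pass over all examples
        grad_w = grad_b = 0.0
        for feature, label in examples:
            error = (w * feature + b) - label
            grad_w += 2 * error * feature / n
            grad_b += 2 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, feature):
    """Inference: apply the trained model to an unlabeled example."""
    w, b = model
    return w * feature + b


# Labeled examples (feature, label), generated from label = 2 * feature + 1.
training_examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = train(training_examples)
print(round(predict(model, 4.0), 2))  # prints 9.0
```

The model never saw feature 4.0 during training; it predicts 9.0 because it learned w ≈ 2 and b ≈ 1 from the labeled examples.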

Learning types

Supervised learning
- Learns from labeled examples
- Regression - predicts continuous, numeric values
- Classification - predicts categorical values, e.g. yes/no
Unsupervised learning
- Clustering - finding patterns (groups) in the data
- Data is not labeled or categorized
Reinforcement learning
- Uses positive/negative reinforcement (rewards/penalties) to learn to complete a task
  - e.g. complete a maze, learn chess
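Clustering works without labels. A minimal sketch using 1-D k-means, one common clustering algorithm (an assumed choice; the notes do not name one). The points, k = 2, and the "first k points" initialization are illustrative assumptions.

```python
# Minimal sketch of clustering: 1-D k-means on unlabeled points.
# The points, k = 2, and the "first k points" initialization are assumptions.

def k_means(points, k, iterations=10):
    centroids = list(points[:k])  # naive initialization: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 9.0, 1.5, 8.0, 2.0, 10.0]  # no labels
centroids, clusters = k_means(points, k=2)
print(centroids)  # [1.5, 9.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 8.0, 10.0]]
```

No labels are given; the two groups emerge from the distances between points alone.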

Neural network

Neural network - model composed of layers, each consisting of neurons
Neuron - node that combines input values and creates one output value
Feature - input variables used to make predictions
Hidden layer - set of neurons between the input and output layers, all operating on the same set of inputs
Feature engineering - deciding which features to use in a model
Epoch - one full pass through the training dataset
Wide and deep neural networks
- Wide - memorization: many features
- Deep - generalization: many hidden layers
- Wide and deep - both: good for recommendation engines
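How a neuron "combines input values and creates one output value" can be sketched as a weighted sum plus a bias, passed through an activation function. The weights, biases, and the ReLU activation below are illustrative assumptions; this is a forward pass only, with no training.

```python
# Minimal sketch of a neuron and a hidden layer (forward pass only).
# Weights, biases, and the ReLU activation are illustrative assumptions.

def neuron(inputs, weights, bias):
    """Combine input values into one output value: weighted sum + bias, then ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def dense_layer(inputs, weight_rows, biases):
    """A layer of neurons that all operate on the same set of inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


features = [1.0, 2.0]                              # input layer
hidden = dense_layer(features, [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5])
output = neuron(hidden, [1.0, 2.0], 0.0)           # output neuron
print(hidden, output)  # [0.0, 3.5] 7.0
```

Stacking more `dense_layer` calls makes the network deeper; feeding in more features makes it wider.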

What is Overfitting?

Model is ‘overfitted’ to the training data: it fits the training examples very closely but is unable to generalize to new data
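An extreme, illustrative case of overfitting: a hypothetical "model" that simply memorizes its training examples in a lookup table. It scores perfectly on the training data and badly on held-out data. The data and function names are assumptions made up for this sketch.

```python
# Illustrative extreme case of overfitting: a hypothetical "model" that
# memorizes its training examples in a lookup table.

def fit_lookup_table(examples):
    table = dict(examples)
    # For unseen inputs, fall back to the mean training label.
    fallback = sum(label for _, label in examples) / len(examples)
    return lambda x: table.get(x, fallback)


def mean_squared_error(model, examples):
    return sum((model(x) - y) ** 2 for x, y in examples) / len(examples)


train_set = [(1, 2.1), (2, 3.9), (3, 6.2)]
validation_set = [(4, 8.0), (5, 9.9)]  # new data, held out from training

model = fit_lookup_table(train_set)
train_error = mean_squared_error(model, train_set)            # 0.0: perfect on training data
validation_error = mean_squared_error(model, validation_set)  # large: fails to generalize
print(train_error, round(validation_error, 2))
```

A large gap between training error and validation error is the usual signal that a model is overfitted.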

Causes of Overfitting

Not enough training data
Too many features
Model fitted to unnecessary features unique to the training data (“noise”)

Solving Overfitting

More training data
Make the model less complex (e.g. fewer features)
Remove “noise”
- Increase “regularization” parameters
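Increasing regularization penalizes large weights, so the model leans less on any single feature. A minimal sketch assuming L2 (ridge) regularization, one common form (the notes do not say which), on a one-weight model y ≈ w * x with no bias. Minimizing sum((w*x - y)^2) + l2_strength * w^2 gives the closed form w = sum(x*y) / (sum(x*x) + l2_strength). Data and strengths are illustrative assumptions.

```python
# Minimal sketch of L2 regularization for a one-weight linear model y ≈ w * x.
# Closed form: w = sum(x*y) / (sum(x*x) + l2_strength). Data are assumptions.

def fit_ridge_1d(examples, l2_strength):
    sxy = sum(x * y for x, y in examples)
    sxx = sum(x * x for x, _ in examples)
    return sxy / (sxx + l2_strength)


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for l2_strength in (0.0, 14.0):
    # A larger regularization strength shrinks the weight toward 0.
    print(l2_strength, fit_ridge_1d(examples, l2_strength))
# prints:
# 0.0 2.0
# 14.0 1.0
```

With no regularization the weight is 2.0; a strong penalty halves it. Too much regularization underfits, so the strength is itself a hyperparameter to tune.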

AI Platform

Fully managed TensorFlow platform
Distributed training and predictions
Hyperparameter tuning with Hypertune
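The idea behind hyperparameter tuning (train with several hyperparameter values, keep the one with the best validation score) can be sketched with a plain grid search. This is a swapped-in, simplified technique, not how Hypertune actually searches; the L2-strength hyperparameter, data, candidate values, and function names are all illustrative assumptions.

```python
# Simplified hyperparameter tuning: plain grid search over the L2 strength,
# scored on held-out validation data. Not how Hypertune searches; data,
# candidates, and function names are illustrative assumptions.

def fit_ridge_1d(examples, l2_strength):
    """Train a one-weight model y ≈ w * x with L2 regularization (closed form)."""
    sxy = sum(x * y for x, y in examples)
    sxx = sum(x * x for x, _ in examples)
    return sxy / (sxx + l2_strength)


def validation_mse(w, examples):
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)


def grid_search(train_set, validation_set, candidates):
    """Train once per candidate value; keep the one with the lowest validation error."""
    scores = {c: validation_mse(fit_ridge_1d(train_set, c), validation_set)
              for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores


train_set = [(1.0, 3.0), (2.0, 4.0)]
validation_set = [(3.0, 6.0)]
best, scores = grid_search(train_set, validation_set, candidates=[0.0, 0.5, 2.0])
print(best)  # 0.5
```

Grid search tries every candidate; a managed tuning service can explore the space more efficiently and run trials in parallel.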

How AI Platform works

Master - manages the other nodes
Workers - each works on a portion of the training job
Parameter servers - coordinate shared model state between workers
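The roles above can be sketched as a hypothetical, single-process simulation of synchronous, data-parallel training. Workers compute gradients on their own data shards, the parameter server averages them and updates the shared weight, and a master loop drives the steps. Synchronous gradient averaging is an assumed scheme, and all names here are made up: this is not the AI Platform or TensorFlow API.

```python
# Hypothetical single-process simulation of synchronous, data-parallel training
# with a parameter server. These names are made up; this is not the AI Platform
# or TensorFlow API. Model: one weight w, with y ≈ w * x.

class ParameterServer:
    """Holds the shared model state (the weight w)."""

    def __init__(self, w=0.0):
        self.w = w

    def apply_gradients(self, gradients, learning_rate):
        # Average the workers' gradients, then update the shared weight.
        self.w -= learning_rate * sum(gradients) / len(gradients)


def worker_gradient(w, shard):
    """Worker: mean-squared-error gradient on its own portion of the data."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def master(shards, steps=100, learning_rate=0.05):
    """Master: drives the training loop across workers and the parameter server."""
    server = ParameterServer()
    for _ in range(steps):
        # Each worker reads the current shared w and computes a gradient on its shard.
        gradients = [worker_gradient(server.w, shard) for shard in shards]
        server.apply_gradients(gradients, learning_rate)
    return server.w


# Two workers, each with its own shard; labels follow y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = master(shards)
print(round(w, 6))  # 3.0
```

In a real cluster the workers run on separate machines in parallel; the list comprehension here only stands in for that step.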