Python-a04-List,Tuple,Set

This article shows examples of List, Tuple, and Set in Python.

## List 
# List basics
courses = ['History', 'Math', 'Physics', 'CompSci']
print(courses)
print(len(courses))
print(courses[0])
print(courses[-1]) # negative indexes count from the end
# print(courses[4]) # IndexError: list index out of range
print(courses[0:2]) # the start index is included, the end index is not
print(courses[:2]) # from the start up to (not including) index 2
print(courses[2:]) # from index 2 to the end

# Add elements to List
courses1 = ['History', 'Math', 'Physics', 'CompSci']
courses1.append('Art')
print(courses1) # ['History', 'Math', 'Physics', 'CompSci', 'Art']

courses2 = ['History', 'Math', 'Physics', 'CompSci']
courses2.insert(0, 'Art')
print(courses2) # ['Art', 'History', 'Math', 'Physics', 'CompSci']


courses3 = ['History', 'Math', 'Physics', 'CompSci']
courses4 = ['Art', 'Education']
courses3.insert(0, courses4)
print(courses3) # [['Art', 'Education'], 'History', 'Math', 'Physics', 'CompSci']

courses5 = ['History', 'Math', 'Physics', 'CompSci']
courses6 = ['Art', 'Education']
courses5.extend(courses6)
print(courses5) # ['History', 'Math', 'Physics', 'CompSci', 'Art', 'Education']

# Remove elements from List
courses = ['History', 'Math', 'Physics', 'CompSci']
courses.remove('Math')
print(courses) # ['History', 'Physics', 'CompSci']

courses1 = ['History', 'Math', 'Physics', 'CompSci']
courses1.pop() # removes and returns the last element
print(courses1) # ['History', 'Math', 'Physics']


courses2 = ['History', 'Math', 'Physics', 'CompSci']
popped = courses2.pop()
print(popped) # CompSci

# Order in List
courses = ['History', 'Math', 'Physics', 'CompSci']
courses.reverse()
print(courses) #['CompSci', 'Physics', 'Math', 'History']

courses1 = ['History', 'Math', 'Physics', 'CompSci']
courses1.sort()
print(courses1) #['CompSci', 'History', 'Math', 'Physics']

courses2 = ['History', 'Math', 'Physics', 'CompSci']
courses2.sort(reverse=True)
print(courses2) #['Physics', 'Math', 'History', 'CompSci']


courses3 = ['History', 'Math', 'Physics', 'CompSci']
courses4 = sorted(courses3)
print(courses4) #['CompSci', 'History', 'Math', 'Physics']

# Min, Max, Sum in List
nums = [1,5,2,4,3]
print(min(nums)) #1
print(max(nums)) #5
print(sum(nums)) #15

# index in List
courses = ['History', 'Math', 'Physics', 'CompSci']
print(courses.index('CompSci')) # 3
# print(courses.index('Art')) # ValueError: 'Art' is not in list
print('Art' in courses) # False
print('Math' in courses) # True

for item in courses:
    print(item) # History Math Physics CompSci (one per line)

for index, course in enumerate(courses):
    print(index, course) # 0 History 1 Math 2 Physics 3 CompSci (one pair per line)

for index, course in enumerate(courses, start=1):
    print(index, course) # 1 History 2 Math 3 Physics 4 CompSci (one pair per line)

# join in List
courses = ['History', 'Math', 'Physics', 'CompSci']
course_str = ', '.join(courses)
new_courses = course_str.split(', ')
print(course_str) # History, Math, Physics, CompSci
print(new_courses) # ['History', 'Math', 'Physics', 'CompSci']

## Tuple
# List is mutable, Tuple is immutable: we can't add, remove, or modify elements in a Tuple
list1 = ['History', 'Math', 'Physics', 'CompSci']
list2 = list1
print(list1) #['History', 'Math', 'Physics', 'CompSci']
print(list2) #['History', 'Math', 'Physics', 'CompSci']

list1[0] = 'Art'
print(list1) #['Art', 'Math', 'Physics', 'CompSci']
print(list2) #['Art', 'Math', 'Physics', 'CompSci']

tuple1 = ('History', 'Math', 'Physics', 'CompSci')
tuple2 = tuple1
print(tuple1) #('History', 'Math', 'Physics', 'CompSci')
print(tuple2) #('History', 'Math', 'Physics', 'CompSci')

# tuple1[0] = 'Art' # TypeError: 'tuple' object does not support item assignment


## Set
# Set values are unordered and contain no duplicates
set1 = {'History', 'Math', 'Physics', 'CompSci'}
print(set1) #{'Math', 'History', 'Physics', 'CompSci'} order may change

set2 = {'History', 'Math', 'Physics', 'CompSci', 'Math'}
print(set2) #{'Math', 'History', 'Physics', 'CompSci'} order may change

set3 = {'History', 'Math', 'Physics', 'CompSci'}
print('Math' in set3) # True; a Set is optimized for membership checks

# Sets make it easy to find the common and differing elements of two sets
set4 = {'History', 'Math', 'Physics', 'CompSci'}
set5 = {'History', 'Math', 'Art', 'Design'}
print(set4.intersection(set5)) #{'History', 'Math'}
print(set4.difference(set5)) #{'Physics', 'CompSci'}
print(set4.union(set5)) #{'History', 'Math', 'Physics', 'CompSci', 'Art', 'Design'}

## How to create empty List, Tuple, Set
empty_list = []
empty_list = list()

empty_tuple = ()
empty_tuple = tuple()

empty_set = {} # Wrong: {} creates an empty dict, not a set
empty_set = set()
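
The Tuple section above shows that `list2 = list1` makes both names point at the same list. Here is a minimal sketch of how to get an independent copy instead, plus one way to drop duplicates while keeping order. The variable names are illustrative only and do not come from the examples above.

```python
# Aliasing: both names refer to the same list object
original = ['History', 'Math', 'Physics', 'CompSci']
alias = original
alias[0] = 'Art'
print(original)     # ['Art', 'Math', 'Physics', 'CompSci']

# Shallow copy: a new list, so changes do not leak back
independent = original.copy()  # same effect as list(original) or original[:]
independent[0] = 'Design'
print(original)     # ['Art', 'Math', 'Physics', 'CompSci']
print(independent)  # ['Design', 'Math', 'Physics', 'CompSci']

# Remove duplicates while keeping first-seen order
# (dict keys preserve insertion order since Python 3.7)
with_dupes = ['Math', 'Art', 'Math', 'Physics', 'Art']
unique = list(dict.fromkeys(with_dupes))
print(unique)       # ['Math', 'Art', 'Physics']
```

A plain `set(with_dupes)` also removes duplicates, but it does not keep the original order.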

Python-a03-Numeric

This article shows examples of Integer and Float in Python.

# different types
num1 = 4
print(type(num1))

num2 = 2.1
print(type(num2))

# arithmetic operators
print(5 + 2)
print(5 - 2)
print(5 * 2)
print(5 / 2)
print(5 // 2) # floor division
print(3 ** 2) # exponent
print(3 % 2) # modulus
print(abs(-3)) # absolute value: 3
print(round(3.6)) # rounds to the nearest integer: 4
print(round(3.75, 1)) # rounds to 1 decimal place: 3.8

# calculation order
print(3 + 2 * 2)
print((3+2)*2)

# increment
num1 = 1
num1 = num1 + 1
print(num1)

num2 = 1
num2 += 1
print(num2)

# comparisons
print(3 == 2) # equal
print(3 != 2) # not equal
print(3 > 2) # greater than
print(3 < 2) # less than
print(3 >= 2) # greater or equal
print(3 <= 2) # less or equal

# cast
num1 = '10'
num2 = '20'
print(num1 + num2) # 1020

num3 = int(num1)
num4 = int(num2)
print(num3 + num4) # 30
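
A few edge cases of the operators above are easy to get wrong, so here is a small sketch of them. The helper `to_int` is a hypothetical name introduced only for this illustration.

```python
import math

# Floor division and modulus with negative numbers
print(-5 // 2)  # -3: floor division rounds toward negative infinity
print(-5 % 2)   # 1: the result takes the sign of the divisor

# round() sends ties to the nearest even integer
print(round(2.5))  # 2
print(round(3.5))  # 4

# Floats are binary approximations, so compare them with a tolerance
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True


def to_int(text, default=None):
    """Cast a string to int, returning `default` when it is not a number."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int('10') + to_int('20'))  # 30
print(to_int('ten'))                # None
```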

Python-a02-String

This article shows examples of String in Python.

# 1
message1 = 'Hello World'
print(message1)

# 2
message2 = "hello 'Jimmy'"
print(message2)

# 3
message3 = 'Hello World'
print(len(message3))

# 4
message4 = 'Hello World'
print(message4[1])
print(message4[-1])
print(message4[0:5]) # the start index is included, the end index is not
print(message4[:5]) # from the start up to (not including) index 5
print(message4[6:]) # from index 6 to the end
# print(message4[11]) # IndexError: string index out of range

# 5
message5 = 'Hello World'
print(message5.lower())
print(message5.upper())
print(message5.count('l')) # number of times 'l' occurs in the message: 3
print(message5.find('World')) # returns 6
print(message5.find('Unknown')) # returns -1 when the substring is not found

# 6
message6 = 'Hello World'
# replace() returns a new string; to update the same variable, use message6 = message6.replace('World', 'You')
new_message6 = message6.replace('World', 'You')
print(new_message6)

# 7
greeting='Hello'
name='Bob'
message7 = greeting + ', ' + name
print(message7)

message8 = '{}, {}. Welcome!'.format(greeting, name)
print(message8)

message9 = f'{greeting}, {name}. Welcome!' # f-strings require Python 3.6+
print(message9)

message10 = f'{greeting}, {name.upper()}. Welcome!' # f-strings require Python 3.6+
print(message10)

# 8
help(str) # help() prints the documentation itself and returns None
help(str.lower)
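
f-strings also accept format specifiers after a colon. Here is a short sketch of a few common ones; the values are made up for the illustration.

```python
name = 'Bob'
price = 1234.5
ratio = 0.256

aligned = f'{name:>10}|'   # right-align in a field of 10 characters
money = f'{price:,.2f}'    # thousands separator and 2 decimal places
percent = f'{ratio:.1%}'   # multiply by 100 and append a percent sign
quoted = f'{name!r}'       # use repr() instead of str()

print(aligned)  #        Bob|
print(money)    # 1,234.50
print(percent)  # 25.6%
print(quoted)   # 'Bob'
```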

Python-a01-How to install embedded python on windows

Install an embedded Python on Windows

  1. download the embeddable zip file from https://www.python.org/downloads/windows/, unzip it and add the unzipped folder to the PATH environment variable

  2. download get-pip.py from https://bootstrap.pypa.io/get-pip.py

  3. run command

    python get-pip.py
  4. add the pip path to the PATH environment variable

  5. running pip -V may fail with this error:
    ModuleNotFoundError: No module named 'pip'

    To fix this, open python38._pth and add the following paths to it:

    C:\Dev\softwares\python-3.8.0-embed-amd64\Scripts
    C:\Dev\softwares\python-3.8.0-embed-amd64\Lib
    C:\Dev\softwares\python-3.8.0-embed-amd64\Lib\site-packages
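
To check whether the change worked, one option is to ask the interpreter what it can see. The sketch below prints the module search path and reports whether `pip` can be found. `module_available` is a hypothetical helper name, and the script assumes it is run with the embedded interpreter.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the module `name` can be found on the current sys.path."""
    return importlib.util.find_spec(name) is not None


# The directories listed in python38._pth should show up here
for entry in sys.path:
    print(entry)

print('pip importable:', module_available('pip'))
```

If `Lib\site-packages` is missing from the printed paths, the `._pth` file is probably not being picked up.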

b06-01-Cloud-PubSub-HelloWorld

In this article I will show you how to publish and receive messages in Pub/Sub with Java.

  1. create a topic

    gcloud pubsub topics create my-topic
  2. create a subscription to this topic

    gcloud pubsub subscriptions create my-sub --topic my-topic
  3. clone the sample project in Cloud Shell

    git clone https://github.com/googleapis/java-pubsub.git
  4. go to the samples directory

    cd java-pubsub/samples/snippets/
  5. modify PublisherExample.java and SubscribeAsyncExample.java to set the right project id, topic id and subscription id

  6. compile the project

    mvn clean install -DskipTests
  7. run the subscriber

    mvn exec:java -Dexec.mainClass="pubsub.SubscribeAsyncExample"
  8. run the publisher in another terminal and observe the subscriber

    mvn exec:java -Dexec.mainClass="pubsub.PublisherExample"
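
The project id, topic id and subscription id set in step 5 are combined into fully qualified resource names. As a rough illustration of that naming scheme (written in Python rather than Java, and not part of the sample project), the two helpers below build those names. The function names and `my-project` are hypothetical; `my-topic` and `my-sub` are the ids created in steps 1 and 2.

```python
def topic_path(project_id, topic_id):
    """Fully qualified Pub/Sub topic name."""
    return f'projects/{project_id}/topics/{topic_id}'


def subscription_path(project_id, subscription_id):
    """Fully qualified Pub/Sub subscription name."""
    return f'projects/{project_id}/subscriptions/{subscription_id}'


# 'my-project' is a placeholder; use your own project id
print(topic_path('my-project', 'my-topic'))
# projects/my-project/topics/my-topic
print(subscription_path('my-project', 'my-sub'))
# projects/my-project/subscriptions/my-sub
```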

GCP-Kubernetes-Manually

In this article, we will show you how to deploy a web application with Kubernetes on GCP.

  1. run nginx as a detached container
    docker run -d -p 8080:80 nginx:latest
  2. replace index.html in the nginx container
    docker cp index.html 607de9f58775:/usr/share/nginx/html/
  3. create a docker image from the modified container
    docker commit 607de9f58775 daccfrance:version1
  4. tag the docker image with the project id
    docker tag daccfrance:version1 eu.gcr.io/kube-test-286917/daccfrance:version1
  5. push the docker image to the GCP container registry
    docker push eu.gcr.io/kube-test-286917/daccfrance:version1
  6. kill the docker container
    docker container kill <container_id>
  7. set the default compute zone
    gcloud config set compute/zone europe-west1-b
  8. create a kubernetes cluster
    gcloud container clusters create gk-cluster --num-nodes=1
  9. get authentication credentials for the cluster
    gcloud container clusters get-credentials gk-cluster
  10. create a kubernetes deployment
    kubectl create deployment web-server --image=eu.gcr.io/kube-test-286917/daccfrance:version1
  11. create a kubernetes service
    kubectl expose deployment web-server --type LoadBalancer --port 80 --target-port 80
  12. list the kubernetes pods
    kubectl get pods
  13. get the kubernetes service
    kubectl get service web-server

spark skewness

Here is an example of key salting to resolve the problem of data skew in Spark joins.

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object SparkSkewnessExample extends App {

  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("SparkSkewnessExample")

  val spark = SparkSession
    .builder()
    .config(conf)
    .getOrCreate()

  import spark.implicits._

  // DataFrame 1: the key "a" is heavily over-represented (skewed)
  val df1 = Seq(
    ("a", "12"),
    ("a", "31"),
    ("a", "24"),
    ("a", "0"),
    ("a", "24"),
    ("b", "45"),
    ("c", "24")
  ).toDF("id", "value")
  df1.show(10, false)

  // DataFrame 2
  val df2 = Seq(
    ("a", "45"),
    ("b", "575"),
    ("c", "54")
  ).toDF("id", "value")
  df2.show(10, false)

  // eliminate skewness:
  // - left side: append a random salt in 0..9 to the join key
  // - right side: replicate every row once per possible salt value
  def eliminateSkewness(leftDf: DataFrame, leftCol: String, rightDf: DataFrame) = {
    val df1 = leftDf
      .withColumn(leftCol, concat(
        leftDf.col(leftCol), lit("_"), lit(floor(rand(123456) * 10))))

    // floor(rand * 10) only produces 0..9, so 10 salt values are enough
    val df2 = rightDf
      .withColumn("saltCol",
        explode(
          array((0 until 10).map(lit(_)): _*)
        ))

    (df1, df2)
  }

  val (df3, df4) = eliminateSkewness(df1, "id", df2)

  df3.show(100, false)
  df4.show(100, false)

  // join after eliminating data skewness
  df3.join(
    df4,
    df3.col("id") <=> concat(df4.col("id"), lit("_"), df4.col("saltCol"))
  ).drop("saltCol")
    .show(100, false)
}
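
To see why salting does not change the join result, here is a rough pure-Python sketch of the same idea, without Spark. It reuses the sample data above; `hash_join` and the other names are hypothetical helpers written for this illustration, and the random salts will differ from the ones Spark generates.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 10  # salts 0..9, as in the Spark code

left = [('a', '12'), ('a', '31'), ('a', '24'), ('a', '0'),
        ('a', '24'), ('b', '45'), ('c', '24')]
right = [('a', '45'), ('b', '575'), ('c', '54')]

rng = random.Random(123456)

# Left side: append a random salt to every key
salted_left = [(f'{key}_{rng.randrange(NUM_SALTS)}', value)
               for key, value in left]

# Right side: replicate every row once per salt value
salted_right = [(f'{key}_{salt}', value)
                for key, value in right
                for salt in range(NUM_SALTS)]


def hash_join(left_rows, right_rows):
    """Inner join on the first element of each tuple."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    return [(key, left_value, right_value)
            for key, left_value in left_rows
            for right_value in index.get(key, [])]


plain = hash_join(left, right)
salted = hash_join(salted_left, salted_right)

# Strip the salt again to compare both results
unsalted = [(key.rsplit('_', 1)[0], lv, rv) for key, lv, rv in salted]
print(sorted(unsalted) == sorted(plain))  # True

# Key distribution before and after salting
print(Counter(key for key, _ in left))
print(Counter(key for key, _ in salted_left))
```

The rows of the hot key "a" are now spread over up to 10 different keys, at the cost of a right side that is 10 times larger.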

b10-Machine Learning

What is Machine Learning ?

Process of combining inputs to produce useful predictions

How it works

  • Train a model with examples (example = input + label)
  • Training = adjust model to learn relationship between features and labels
  • Feature = input variables
  • Inference = apply trained model to unlabeled examples
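
The terms above can be sketched with a very small model. The code below assumes a linear model trained by gradient descent on made-up data (labels follow y = 2x + 1); none of this comes from a specific library, it is only meant to show where features, labels, training and inference fit.

```python
# Labeled examples: (feature, label), generated from the made-up rule y = 2x + 1
examples = [(x, 2 * x + 1) for x in range(5)]

# Model parameters, adjusted during training
weight, bias = 0.0, 0.0
learning_rate = 0.05

# Training: repeatedly adjust the model to reduce the prediction error
for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, y in examples:
        error = (weight * x + bias) - y
        grad_w += 2 * error * x / len(examples)
        grad_b += 2 * error / len(examples)
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b


def predict(x):
    """Inference: apply the trained model to an unlabeled input."""
    return weight * x + bias


print(weight, bias)  # close to 2 and 1
print(predict(10))   # close to 21
```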

Learning types

  • Supervised learning
    • Regression - Continuous, numeric variables
    • Classification - categorical variables, e.g. yes/no
  • Unsupervised Learning
    • Clustering - finding patterns
    • Data is not labeled or categorized
  • Reinforcement learning
    • Use positive/negative reinforcement to complete a task
      • Complete a maze, learn chess

Neural network

  • Neural network - model composed of layers, consisting of neurons
  • Neuron - node, combines input values and creates one output value
  • Feature - input variables used to make predictions
  • Hidden layer - set of neurons operating from same input set
  • Feature engineering - deciding which features to use in a model
  • Epoch - single pass through training dataset
  • Deep and Wide in neural network
    • Wide - memorization: many features
    • Deep - generalization: many hidden layers
    • Deep and Wide - both: good for recommendation engines
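
As a minimal sketch of the neuron and hidden layer definitions above: a neuron combines its inputs into one output, and all neurons of a hidden layer read the same input set. The sigmoid activation and all weights here are assumptions chosen for the illustration.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, passed through a sigmoid activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))


def hidden_layer(inputs, layer):
    """Every neuron in the layer operates on the same input set."""
    return [neuron(inputs, weights, bias) for weights, bias in layer]


features = [0.5, -1.0]            # input variables
layer = [([1.0, 2.0], 0.0),       # (weights, bias) of neuron 1
         ([-1.0, 0.5], 1.0)]      # (weights, bias) of neuron 2

print(hidden_layer(features, layer))  # one output value per neuron
```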

What is Overfitting?

A model is ‘overfitted’ when it fits the training data too closely and is unable to generalize to new data

Cause of Overfitting

  • Not enough training data
  • Too many features
  • Model fitted to unnecessary features unique to training data: “noise”

Solutions to Overfitting

  • more data
  • make model less complex
  • remove “noise”
    • increase “regularization” parameters
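
To illustrate what a regularization parameter does, here is a small sketch assuming an L2 penalty on a one-weight model y = w * x. The data is made up and `fit_weight` is a hypothetical helper; the point is only that a larger penalty pulls the weight toward zero, which makes the model less sensitive to noise.

```python
def fit_weight(xs, ys, l2=0.0):
    """Weight minimizing sum((y - w * x)**2) + l2 * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # made-up noisy samples of roughly y = 2x

# The fitted weight shrinks as the regularization parameter grows
for l2 in (0.0, 1.0, 10.0):
    print(l2, fit_weight(xs, ys, l2))
```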

AI platform

  • Fully managed TensorFlow platform
  • Distributed training and predictions
  • Hyperparameter tuning with HyperTune

How AI Platform works

  • Master - manages other nodes
  • Workers - each works on a portion of the training job
  • Parameter servers - coordinate shared model state between workers

b09-BigQuery

What is BigQuery ?

  • Fully Managed Data warehousing
    • Near real time analysis of petabyte scale databases
  • Serverless(no ops)
  • Auto scaling
  • Both storage and analysis
  • Interact with SQL

How BigQuery works

  • Columnar data store
  • It does not update existing records
  • Not transactional
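
As a rough illustration of what "columnar" means (this is not how BigQuery is implemented, just the general idea), the sketch below stores the same made-up records row by row and column by column. An aggregate over one column only has to read that column.

```python
# Row-oriented layout: one record per entry
rows = [
    {'name': 'alice', 'country': 'FR', 'amount': 10},
    {'name': 'bob', 'country': 'US', 'amount': 25},
    {'name': 'carol', 'country': 'FR', 'amount': 7},
]

# Column-oriented layout: one list of values per column
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns['amount'])  # [10, 25, 7]

# An analytical query such as SUM(amount) touches a single column
total = sum(columns['amount'])
print(total)  # 42
```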

Structure

  • Dataset: contains tables/views
  • Table: collections of columns
  • Job: long running action/query

IAM

  • can control by project, dataset, view
  • cannot control at table level

b08-Cloud Dataproc

Dataproc

  • Hadoop, Spark, Hive, Pig
  • Lift and shift to GCP

Map Reduce

Converting from HDFS to Google Cloud Storage

  • Copy data to GCS
    • Install connector or copy manually
  • Update file prefix in scripts
    • From hdfs:// to gs://
  • Use Dataproc and run against/output to GCS
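
The prefix update can be sketched as a small helper that rewrites an HDFS path to a Cloud Storage path. `hdfs_to_gcs` and the bucket name `my-bucket` are hypothetical, and the sketch assumes the directory layout in the bucket mirrors the one in HDFS.

```python
import re


def hdfs_to_gcs(path, bucket):
    """Rewrite hdfs://<namenode>/<path> as gs://<bucket>/<path>."""
    return re.sub(r'^hdfs://[^/]*/', f'gs://{bucket}/', path)


print(hdfs_to_gcs('hdfs://namenode:8020/data/input.csv', 'my-bucket'))
# gs://my-bucket/data/input.csv
print(hdfs_to_gcs('hdfs:///data/input.csv', 'my-bucket'))
# gs://my-bucket/data/input.csv
```

Paths that do not start with `hdfs://` are returned unchanged.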

Dataproc performance optimization

  • Keep your data close to your cluster
    • Place Dataproc cluster in same region as storage bucket
  • Larger persistent disk = better performance
    • Using SSD over HDD
  • Allocate more VMs
    • Use preemptible VM to save on costs