[Hands-On Series #2] Become a Self-Taught Botanist with TensorFlow

#GCP Step-by-Step Tutorial

Text: Allen | Editor: Quen

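Last time, we introduced the basic concepts behind TensorFlow and touched on how machine learning can tell beer from red wine based on an item's attributes. But beer and red wine are things a person can tell apart at a glance, so that example doesn't really show off the power of TensorFlow (Machine Learning Engine). Let's see what data we'll use to play with TensorFlow today!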

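How TensorFlow Thinks

Readers who are machine learning veterans have probably seen this more times than they care to count, but before getting hands-on, here is a rough breakdown of the stages.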

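1. Load the data: in this exercise we read in ready-made CSV files.
They are iris_training.csv, which is fed in for training, and iris_test.csv, which is used to check the results:
iris_training.csv (120 samples): http://download.tensorflow.org/data/iris_training.csv
iris_test.csv (30 samples): http://download.tensorflow.org/data/iris_test.csv

2. Build a neural network classifier:
Configure one of TensorFlow's built-in, pre-built models (the architecture is ready-made, but it still has to be trained) to suit our botanist task.

3. Train the model with the iris_training.csv loaded in step 1.

4. Check whether the model is any good.

5. Try it out on some wild irises!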

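Installing TensorFlow

A craftsman who wants to do good work must first sharpen his tools, as wise elders already knew 2,500 years ago. Installing TensorFlow is simple, but covering it here would turn into a separate article, so please refer to the official installation guide provided by Google.

If English isn't your strong suit, you can also Google for one of the many tutorials out there!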

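Getting Hands-On with TensorFlow, the Deep Learning Powerhouse

All right, after all that build-up, let's go through TensorFlow step by step!

– Demo Code

First, here's some Python code. Don't be scared! It will be explained in a moment; for now, copy it and save it as a .py file.

Full code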

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

import numpy as np
import tensorflow as tf

# urlopen lives in different modules on Python 3 and Python 2.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"


def main():
    # If the training and test sets aren't stored locally, download them.
    # The response is raw bytes, so the files are written in binary mode.
    if not os.path.exists(IRIS_TRAINING):
        raw = urlopen(IRIS_TRAINING_URL).read()
        with open(IRIS_TRAINING, "wb") as f:
            f.write(raw)

    if not os.path.exists(IRIS_TEST):
        raw = urlopen(IRIS_TEST_URL).read()
        with open(IRIS_TEST, "wb") as f:
            f.write(raw)

    # Load datasets.
    training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TRAINING,
        target_dtype=np.int,
        features_dtype=np.float32)
    test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=IRIS_TEST,
        target_dtype=np.int,
        features_dtype=np.float32)

    # Specify that all features have real-value data
    feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            model_dir="/tmp/iris_model")

    # Define the training inputs
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": np.array(training_set.data)},
        y=np.array(training_set.target),
        num_epochs=None,
        shuffle=True)

    # Train model.
    classifier.train(input_fn=train_input_fn, steps=2000)

    # Define the test inputs
    test_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": np.array(test_set.data)},
        y=np.array(test_set.target),
        num_epochs=1,
        shuffle=False)

    # Evaluate accuracy.
    accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]

    print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

    # Classify two new flower samples.
    new_samples = np.array(
        [[6.4, 3.2, 4.5, 1.5],
         [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
    predict_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": new_samples},
        num_epochs=1,
        shuffle=False)

    predictions = list(classifier.predict(input_fn=predict_input_fn))
    predicted_classes = [p["classes"] for p in predictions]

    print(
        "New Samples, Class Predictions: {}\n"
        .format(predicted_classes))


if __name__ == "__main__":
    main()

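– Section-by-Section Explanation

If you're a Python and TensorFlow veteran, you surely know what the code above does without breaking a sweat. For beginners like this editor, let's go through it piece by piece!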

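Declaring the Packages Used

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

import numpy as np
import tensorflow as tf

# urlopen lives in different modules on Python 3 and Python 2.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

These are the basics needed to run TensorFlow from Python: you first have to tell Python to import the TensorFlow package. The try/except around urlopen only picks the right import for whichever Python version you are running.

Reading In the Data

This part breaks down into two steps:

# Set up the data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

def main():
    # If the training and test data don't exist locally, download them from the given URLs.
    # The response is raw bytes, so the files are written in binary mode.
    if not os.path.exists(IRIS_TRAINING):
        raw = urlopen(IRIS_TRAINING_URL).read()
        with open(IRIS_TRAINING, "wb") as f:
            f.write(raw)

    if not os.path.exists(IRIS_TEST):
        raw = urlopen(IRIS_TEST_URL).read()
        with open(IRIS_TEST, "wb") as f:
            f.write(raw)

First, we name the iris training data file IRIS_TRAINING and the test data file IRIS_TEST. If the corresponding .csv files aren't found on the local machine, they are downloaded from the preset URLs.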

# Load the data and specify the data types
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)

# Specify that all features are real-valued
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

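In the second step, we use TensorFlow's built-in helper tf.contrib.learn.datasets.base.load_csv_with_header to read the .csv files, passing 3 arguments:

"Use the data file named earlier": filename=IRIS_TRAINING or IRIS_TEST
"The target (the species label) is an integer": target_dtype=np.int
"The feature values are floating-point numbers": features_dtype=np.float32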
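To make the loading step more concrete, here is a minimal pure-Python sketch of what a loader of this kind does. It assumes a file layout in which the header row starts with the sample count and the feature count, and every following row holds the feature values with the integer label in the last column. The name load_csv_sketch and the three inline rows are made up for illustration; this is not TensorFlow's actual implementation.

```python
import csv
import io

# Tiny illustrative stand-in for an iris CSV file (not the real data set).
SAMPLE_CSV = """3,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
"""


def load_csv_sketch(text, target_dtype=int, features_dtype=float):
    """Split CSV text into (features, targets); the last column is the target."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    data, target = [], []
    for row in reader:
        data.append([features_dtype(v) for v in row[:-1]])
        target.append(target_dtype(row[-1]))
    # Sanity-check the counts announced in the header row.
    assert len(data) == n_samples
    assert all(len(r) == n_features for r in data)
    return data, target


features, labels = load_csv_sketch(SAMPLE_CSV)
print(features[0], labels)  # [6.4, 2.8, 5.6, 2.2] [2, 1, 0]
```

The two dtype arguments play the same role as target_dtype and features_dtype in the call above: one converts the label column, the other converts the measurements.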

Building the Neural Network Classifier

# Build a neural network classifier with 3 hidden layers of 10, 20, and 10 units respectively
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 20, 10],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")
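To get a feel for how small this network is, the following sketch counts its trainable weights and biases. It assumes plain fully connected layers going from the 4 input features through the hidden layers of 10, 20, and 10 units to the 3 output classes. The helper count_parameters is a hypothetical name used only here.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output unit
    return total


# 4 features -> hidden_units=[10, 20, 10] -> n_classes=3
sizes = [4, 10, 20, 10, 3]
print(count_parameters(sizes))  # 513
```

Under these assumptions the whole model has only a few hundred numbers to learn, which is why it trains quickly on a 120-sample data set.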

Training the Model

# Define the training inputs
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(training_set.data)},
    y=np.array(training_set.target),
    num_epochs=None,
    shuffle=True)

# Train the model
classifier.train(input_fn=train_input_fn, steps=2000)

Here we use the input function that ships with TensorFlow's estimator API to feed in the data, with x set to the feature data and y set to the target labels.
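The num_epochs and shuffle arguments are easier to picture with a toy version. The sketch below is a simplified pure-Python stand-in, not how numpy_input_fn is actually implemented: the generator name batches, the batch size, and the fake samples are all assumptions made for illustration.

```python
import itertools
import random


def batches(x, y, batch_size, num_epochs=1, shuffle=False, seed=0):
    """Yield (features, labels) batches; num_epochs=None repeats forever."""
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(range(len(x)))
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield [x[i] for i in idx], [y[i] for i in idx]


x = [[float(i)] * 4 for i in range(5)]  # 5 fake samples with 4 features each
y = [0, 1, 2, 0, 1]

# A single ordered pass over the data yields batches of size 2, 2, 1.
one_epoch = list(batches(x, y, batch_size=2, num_epochs=1))
print(len(one_epoch))  # 3

# An endless shuffled stream; we decide how many batches to take.
stream = batches(x, y, batch_size=2, num_epochs=None, shuffle=True)
first_four = list(itertools.islice(stream, 4))
print(len(first_four))  # 4
```

With num_epochs=None the stream never runs dry, so something else has to decide when training stops.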

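Next, we call classifier.train() to train our model!

One thing worth mentioning: steps=2000 here is the number of training steps (number of steps to train). The same result can be reached with two calls of 1000 steps each, because the second call picks up from the state saved in model_dir.

# Train the model
classifier.train(input_fn=train_input_fn, steps=1000)
classifier.train(input_fn=train_input_fn, steps=1000)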
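Why the two forms end up in the same place can be shown with a toy object that remembers how far it has trained. ToyEstimator below is a hypothetical stand-in that tracks only a step counter, much as a saved checkpoint records the global step; it does no learning at all.

```python
class ToyEstimator:
    """Hypothetical stand-in that only tracks the global step."""

    def __init__(self):
        self.global_step = 0

    def train(self, steps):
        # Each call continues from wherever the previous call stopped.
        for _ in range(steps):
            self.global_step += 1
        return self


once = ToyEstimator().train(steps=2000)
twice = ToyEstimator().train(steps=1000).train(steps=1000)
print(once.global_step, twice.global_step)  # 2000 2000
```

Splitting training into several calls is handy when you want to pause in between, for example to evaluate the model partway through.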

Evaluating the Trained Model

# Define the test inputs
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(test_set.data)},
    y=np.array(test_set.target),
    num_epochs=1,
    shuffle=False)

# Evaluate the model's accuracy
accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]

print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

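Just as we did for the training data, we define an input function for the test data and call classifier.evaluate() to validate the model. It returns a set of metrics; the "accuracy" entry is a value between 0 and 1, that is, the fraction of test samples classified correctly.

# Accuracy
Test Accuracy: 0.966667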
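As a quick arithmetic check, 0.966667 on a 30-sample test set corresponds to 29 correct predictions out of 30. The sketch below computes accuracy by hand; the label lists are invented for illustration and are not the model's real output.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for 30 test flowers, with exactly one wrong prediction.
actual = [0, 1, 2] * 10
predicted = list(actual)
predicted[7] = (actual[7] + 1) % 3

print("Test Accuracy: {0:f}".format(accuracy(predicted, actual)))
# Test Accuracy: 0.966667
```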

Classifying New Data with the Model

# Feed new data to the model for classification
# Try to classify 2 new samples
new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5],
     [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": new_samples},
    num_epochs=1,
    shuffle=False)

predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]

print(
    "New Samples, Class Predictions: {}\n"
    .format(predicted_classes))

Here we pass 2 samples to classifier.predict(), and the species the model assigns to each iris is printed on screen.

# The first sample is Iris versicolor, the second is Iris virginica
New Samples, Class Predictions: [1 2]
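The printed class ids are easier to read once they are mapped back to species names. This is a minimal sketch, assuming the ids follow the order 0 = setosa, 1 = versicolor, 2 = virginica, and that ids may arrive either as integers or as byte strings depending on the TensorFlow version. The helper species_names is a hypothetical name.

```python
SPECIES = ["Iris setosa", "Iris versicolor", "Iris virginica"]


def species_names(predicted_classes):
    """Map class ids (ints, or strings/bytes such as b"1") to species names."""
    names = []
    for c in predicted_classes:
        if isinstance(c, bytes):
            c = c.decode("ascii")
        names.append(SPECIES[int(c)])
    return names


print(species_names([1, 2]))        # ['Iris versicolor', 'Iris virginica']
print(species_names([b"1", b"2"]))  # ['Iris versicolor', 'Iris virginica']
```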