C++ is TensorFlow's underlying language: can you build a deep neural network with it?

2017-12-29 15:49:00.0

From Matrices.io

Author: Florian Courtial

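TensorFlow, currently a popular deep learning framework (its official Chinese-language WeChat account launched earlier this month), is built on C++, yet most people develop their models with TensorFlow from Python. As the C++ API matures, building neural networks directly in C++ has become possible, and this article introduces a simple way to do it.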

Many people know that TensorFlow's core is built in C++, but most of the framework's functionality is only convenient to use through the Python API.

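When I wrote my previous post, my goal was to implement a basic deep neural network (DNN) using only TensorFlow's C++ API and CuDNN. In practice, I realized that we had overlooked a lot along the way.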

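Note that training a neural network that uses less common (exotic) operations is not possible yet; the error you are most likely to hit is a missing gradient operation. I am currently working on porting the Python gradient operations to C++.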

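In this article, I will show how to build a deep neural network in C++ with TensorFlow and use it to estimate the price of a BMW 1 Series car from its age, mileage and fuel type. There is currently no C++ optimizer available, so the training code will look less elegant, but one will be added in the future.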

This article follows the official TensorFlow 1.4 C++ API guide: https://www.tensorflow.org/api_guides/cc/guide
Code on GitHub: https://github.com/theflofly/dnn_tensorflow_cpp

Installation

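We are going to run TensorFlow from C++. We could try to use a precompiled library, but some people would certainly run into trouble because of the specifics of their environment. Building TensorFlow from source avoids these problems and also guarantees that we use the latest version of the API.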

First, you need to install the bazel build tool; installation instructions are here: https://docs.bazel.build/versions/master/install.html

On OSX, brew is enough:

brew install bazel

You need to build TensorFlow from source, so start by cloning the repository:

git clone https://github.com/tensorflow/tensorflow.git /path/tensorflow
cd /path/tensorflow

Next, configure the build (for example, choose whether to use a GPU) by running the configure script:

cd /path/tensorflow
./configure

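Now let's create the files that will hold our TensorFlow model code. Note that the first build takes a long time (10-15 minutes). Non-core C++ TensorFlow code lives in tensorflow/cc, so that is where we create our model files; we also need a BUILD file so that bazel can build the model.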

mkdir /path/tensorflow/tensorflow/cc/models
cd /path/tensorflow/tensorflow/cc/models
touch model.cc
touch BUILD

We add the bazel instructions to the BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")
tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow"
    ],
)

Basically, this builds a binary from model.cc. Now we can start writing our model.

Reading the data

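The data was scraped from the French website leboncoin.fr, then cleaned, normalized and stored in a CSV file. Our goal is to read it. The first line of the CSV file holds the metadata used for normalization (mean and standard deviation of mileage and age, minimum and maximum price); we need it to turn the price output by the network back into euros. To keep model.cc clean, we create data_set.h and data_set.cc. They read the CSV file into flat vectors of floats that are then fed to the network.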
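Reading DataSet::input and DataSet::output below, the normalization can be summarized as follows. Here μ and σ are the means and standard deviations stored in the metadata line, k is the mileage, a the age, f the fuel type and p the price; the forward price mapping is inferred as the inverse of output(), not stated in the original.

```latex
\tilde{k} = \frac{k - \mu_{km}}{\sigma_{km}}, \qquad
\tilde{a} = \frac{a - \mu_{age}}{\sigma_{age}}, \qquad
\tilde{f} = \begin{cases} -1 & \text{diesel} \\ +1 & \text{gasoline} \end{cases}, \qquad
\tilde{p} = \frac{p - p_{\min}}{p_{\max} - p_{\min}}

p = \tilde{p}\,\left(p_{\max} - p_{\min}\right) + p_{\min}
```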

data_set.h

#pragma once
#include <initializer_list>
#include <string>
#include <vector>
using namespace std;
// Metadata used to normalize the data set. Useful to
// go back and forth between normalized data.
class DataSetMetaData {
  friend class DataSet;
 private:
  float mean_km;
  float std_km;
  float mean_age;
  float std_age;
  float min_price;
  float max_price;
};
enum class Fuel {
  DIESEL,
  GAZOLINE
};
class DataSet {
 public:
  // Construct a data set from the given csv file path.
  DataSet(string path) {
    ReadCSVFile(path);
  }
  // getters
  vector<float>& x() { return x_; }
  vector<float>& y() { return y_; }
  // read the given csv file and complete x_ and y_
  void ReadCSVFile(string path);
  // convert one csv line to a vector of float
  vector<float> ReadCSVLine(string line);
  // normalize a human input using the data set metadata
  // caution: the backing array of a returned initializer_list does not
  // outlive this call; returning vector<float> would be safer
  initializer_list<float> input(float km, Fuel fuel, float age);
  // convert a price outputted by the DNN to a human price
  float output(float price);
 private:
  DataSetMetaData data_set_metadata;
  vector<float> x_;
  vector<float> y_;
};

data_set.cc

#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include "data_set.h"
using namespace std;
void DataSet::ReadCSVFile(string path) {
  ifstream file(path);
  stringstream buffer;
  buffer << file.rdbuf();
  string line;
  vector<string> lines;
  while (getline(buffer, line, '\n')) {
    lines.push_back(line);
  }
  // the first line contains the metadata
  vector<float> metadata = ReadCSVLine(lines[0]);
  data_set_metadata.mean_km = metadata[0];
  data_set_metadata.std_km = metadata[1];
  data_set_metadata.mean_age = metadata[2];
  data_set_metadata.std_age = metadata[3];
  data_set_metadata.min_price = metadata[4];
  data_set_metadata.max_price = metadata[5];
  // the other lines contain the features for each car
  // (lines[1] is skipped; it presumably holds the column names)
  for (size_t i = 2; i < lines.size(); ++i) {
    vector<float> features = ReadCSVLine(lines[i]);
    x_.insert(x_.end(), features.begin(), features.begin() + 3);
    y_.push_back(features[3]);
  }
}
vector<float> DataSet::ReadCSVLine(string line) {
  vector<float> line_data;
  std::stringstream lineStream(line);
  std::string cell;
  while (std::getline(lineStream, cell, ',')) {
    line_data.push_back(stod(cell));
  }
  return line_data;
}
initializer_list<float> DataSet::input(float km, Fuel fuel, float age) {
  km = (km - data_set_metadata.mean_km) / data_set_metadata.std_km;
  age = (age - data_set_metadata.mean_age) / data_set_metadata.std_age;
  float f = fuel == Fuel::DIESEL ? -1.f : 1.f;
  return {km, f, age};
}
float DataSet::output(float price) {
  return price * (data_set_metadata.max_price - data_set_metadata.min_price) + data_set_metadata.min_price;
}

We must add these two files to the bazel BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")
tf_cc_binary(
    name = "model",
    srcs = [
        "model.cc",
        "data_set.h",
        "data_set.cc"
    ],
    deps = [
        "//tensorflow/cc:gradients",
        "//tensorflow/cc:grad_ops",
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow"
    ],
)

Building the model

The first step is to read the CSV file and extract two tensors: x, the input, and y, the expected true result. We use the DataSet class defined earlier.

CSV data set download link: https://github.com/theflofly/dnn_tensorflow_cpp/blob/master/normalized_car_features.csv

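// model.cc: includes at file scope; the statements in this section go inside main()
#include <algorithm>
#include <iostream>
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/framework/gradients.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "data_set.h"
using namespace tensorflow;
using namespace tensorflow::ops;
DataSet data_set("/path/normalized_car_features.csv");
Tensor x_data(DataTypeToEnum<float>::v(),
              TensorShape{static_cast<int>(data_set.x().size()) / 3, 3});
copy_n(data_set.x().begin(), data_set.x().size(),
       x_data.flat<float>().data());
Tensor y_data(DataTypeToEnum<float>::v(),
              TensorShape{static_cast<int>(data_set.y().size()), 1});
copy_n(data_set.y().begin(), data_set.y().size(),
       y_data.flat<float>().data());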

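To define a tensor, we need to know its type and shape. In the data_set object, the x data is stored as a flat vector, so we divide its size by 3 to get the number of rows (each car has three features). We then use std::copy_n to copy the data from the data_set object into the Tensor's underlying buffer, obtained through flat<float>(), which returns an Eigen::TensorMap. Now that the data sits in TensorFlow data structures, it is time to build the model.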
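Dividing by 3 works because x_ stores the features row-major: car i's three features sit at indices 3*i, 3*i+1 and 3*i+2, which is exactly the layout a TensorShape{N, 3} tensor expects. The plain C++ sketch below (no TensorFlow; At and CopyRows are hypothetical helper names) illustrates that indexing and the std::copy_n step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element (row, col) of a row-major matrix with `cols` columns stored in a flat vector.
float At(const std::vector<float>& flat, std::size_t cols, std::size_t row, std::size_t col) {
  assert(col < cols && row * cols + col < flat.size());
  return flat[row * cols + col];
}

// Copies a flat feature vector into a destination buffer (as std::copy_n does
// with x_data.flat<float>().data() in the article) and returns the number of rows.
std::size_t CopyRows(const std::vector<float>& flat, std::size_t cols, float* dst) {
  assert(flat.size() % cols == 0);  // every row must be complete
  std::copy_n(flat.begin(), flat.size(), dst);
  return flat.size() / cols;
}
```

For two cars with features {1, 2, 3} and {4, 5, 6}, At(x, 3, 1, 0) is the first feature of the second car.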

You can easily debug a tensor with:

LOG(INFO) << x_data.DebugString();

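What is specific to the C++ API is that you need a Scope object to hold the state of the static computation graph being built, and you pass this object to every operation.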

Scope scope = Scope::NewRootScope();

We need two placeholders: x holds the features and y the corresponding price of each car.

auto x = Placeholder(scope, DT_FLOAT);
auto y = Placeholder(scope, DT_FLOAT);

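Our network has two hidden layers (layer sizes 3, 3, 2 and 1), so we have three weight matrices and three bias vectors. In Python this is done under the hood; in C++ you must define a variable and then an Assign node that gives the variable its initial value. We use RandomNormal to initialize our variables, which gives us values drawn from a normal distribution.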

// weights init
auto w1 = Variable(scope, {3, 3}, DT_FLOAT);
auto assign_w1 = Assign(scope, w1, RandomNormal(scope, {3, 3}, DT_FLOAT));
auto w2 = Variable(scope, {3, 2}, DT_FLOAT);
auto assign_w2 = Assign(scope, w2, RandomNormal(scope, {3, 2}, DT_FLOAT));
auto w3 = Variable(scope, {2, 1}, DT_FLOAT);
auto assign_w3 = Assign(scope, w3, RandomNormal(scope, {2, 1}, DT_FLOAT));
// bias init
auto b1 = Variable(scope, {1, 3}, DT_FLOAT);
auto assign_b1 = Assign(scope, b1, RandomNormal(scope, {1, 3}, DT_FLOAT));
auto b2 = Variable(scope, {1, 2}, DT_FLOAT);
auto assign_b2 = Assign(scope, b2, RandomNormal(scope, {1, 2}, DT_FLOAT));
auto b3 = Variable(scope, {1, 1}, DT_FLOAT);
auto assign_b3 = Assign(scope, b3, RandomNormal(scope, {1, 1}, DT_FLOAT));
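RandomNormal draws values with mean 0 and standard deviation 1. A plain C++ analogue of each Variable + Assign(RandomNormal(...)) pair, using std::normal_distribution, might look like the sketch below. Matrix and RandomNormalMatrix are hypothetical names, and passing in a seeded generator is an assumption added for reproducibility (the graph above sets no seed).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Minimal row-major matrix (illustration only).
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
};

// Fills a rows x cols matrix with samples from N(0, 1), like RandomNormal.
Matrix RandomNormalMatrix(std::size_t rows, std::size_t cols, std::mt19937& gen) {
  std::normal_distribution<float> dist(0.f, 1.f);
  Matrix m{rows, cols, std::vector<float>(rows * cols)};
  for (float& v : m.data) v = dist(gen);
  return m;
}
```

The weight shapes used above would be {3, 3}, {3, 2} and {2, 1}, and the bias shapes {1, 3}, {1, 2} and {1, 1}.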

We then build the three layers, using Tanh as the activation function.

// layers
auto layer_1 = Tanh(scope, Add(scope, MatMul(scope, x, w1), b1));
auto layer_2 = Tanh(scope, Add(scope, MatMul(scope, layer_1, w2), b2));
auto layer_3 = Tanh(scope, Add(scope, MatMul(scope, layer_2, w3), b3));
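Each layer computes tanh(x · W + b), where Add broadcasts the [1, n] bias row to every row of the batch. A minimal plain C++ sketch of one such dense layer (row-major storage, no TensorFlow; DenseTanh is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// out[r][j] = tanh(sum_k x[r][k] * w[k][j] + b[j]), all matrices row-major.
// x: batch x in_dim, w: in_dim x out_dim, b: out_dim (broadcast over the batch).
std::vector<float> DenseTanh(const std::vector<float>& x, const std::vector<float>& w,
                             const std::vector<float>& b, std::size_t in_dim,
                             std::size_t out_dim) {
  assert(x.size() % in_dim == 0 && w.size() == in_dim * out_dim && b.size() == out_dim);
  const std::size_t batch = x.size() / in_dim;
  std::vector<float> out(batch * out_dim);
  for (std::size_t r = 0; r < batch; ++r) {
    for (std::size_t j = 0; j < out_dim; ++j) {
      float z = b[j];  // bias broadcast to every row
      for (std::size_t k = 0; k < in_dim; ++k) z += x[r * in_dim + k] * w[k * out_dim + j];
      out[r * out_dim + j] = std::tanh(z);
    }
  }
  return out;
}
```

Chaining three calls with (in_dim, out_dim) = (3, 3), (3, 2) and (2, 1) reproduces the shape flow of layer_1, layer_2 and layer_3.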

Add L2 regularization:

// regularization
auto regularization = AddN(scope,
initializer_list<Input>{L2Loss(scope, w1),
L2Loss(scope, w2),
L2Loss(scope, w3)});
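Note that L2Loss is not the Euclidean norm: it computes sum(t²) / 2, with no square root. A plain C++ equivalent of the regularization term above (L2Loss and Regularization here are hypothetical helpers, not the TensorFlow ops):

```cpp
#include <cassert>
#include <vector>

// Same quantity as TensorFlow's L2Loss: half the sum of squared elements.
float L2Loss(const std::vector<float>& t) {
  float sum = 0.f;
  for (float v : t) sum += v * v;
  return sum / 2.f;
}

// Like AddN over the L2Loss of each weight matrix (biases are not regularized).
float Regularization(const std::vector<std::vector<float>>& weights) {
  float total = 0.f;
  for (const auto& w : weights) total += L2Loss(w);
  return total;
}
```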

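Finally, we compute the loss: the mean squared difference between the predicted price and the actual price y, plus the regularization term weighted by 0.01.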

// loss calculation
auto loss = Add(scope,
ReduceMean(scope, Square(scope, Sub(scope, layer_3, y)), {0, 1}),
Mul(scope, Cast(scope, 0.01, DT_FLOAT), regularization));
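Written out, the loss node above corresponds to the formula below, where N is the number of cars, ŷ_i the output of layer_3 for car i, and W_1, W_2, W_3 the weight matrices. ReduceMean over axes {0, 1} of the [N, 1] tensor is simply the mean over the N cars.

```latex
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2
  + 0.01 \sum_{k=1}^{3} \frac{1}{2} \sum_{j,l} \left(W_k\right)_{jl}^2
```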

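This completes the forward pass; now comes backpropagation. The first step is to call a function that adds the gradient operations to the graph built for the forward pass.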

// add the gradients operations to the graph
std::vector<Output> grad_outputs;
TF_CHECK_OK(AddSymbolicGradients(scope, {loss}, {w1, w2, w3, b1, b2, b3}, &grad_outputs));

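AddSymbolicGradients adds to the graph all the operations needed to compute the derivative of the loss with respect to each variable. We initialize grad_outputs as an empty vector; the function fills it with the nodes that yield these gradients when evaluated in a TensorFlow session. grad_outputs[0] gives the derivative of the loss with respect to w1, grad_outputs[1] with respect to w2, and so on, following the order {w1, w2, w3, b1, b2, b3} in which the variables were passed to AddSymbolicGradients.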
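As a worked example of what these nodes evaluate to, take the last layer and the loss node defined above. With h₂ the output of layer_2 (shape [N, 2]), z₃ = h₂W₃ + b₃ and ŷ = tanh(z₃), the chain rule (using tanh' = 1 − tanh²) gives the values of grad_outputs[2] (w3) and grad_outputs[5] (b3). The 0.01·W₃ term comes from the regularization; b₃ has no such term, and its gradient is summed over rows because b₃ is broadcast across the batch.

```latex
\delta = \frac{2}{N}\,(\hat{y} - y) \odot \left(1 - \hat{y} \odot \hat{y}\right) \in \mathbb{R}^{N \times 1}

\frac{\partial \mathcal{L}}{\partial W_3} = h_2^{\top}\,\delta + 0.01\,W_3, \qquad
\frac{\partial \mathcal{L}}{\partial b_3} = \sum_{i=1}^{N} \delta_i
```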

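grad_outputs now holds a list of nodes, each of which computes the gradient of the loss with respect to one variable when run in a session. We use them to update the variables, one line per variable, with gradient descent, the simplest method.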

// update the weights and bias using gradient descent
auto apply_w1 = ApplyGradientDescent(scope, w1, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[0]});
auto apply_w2 = ApplyGradientDescent(scope, w2, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[1]});
auto apply_w3 = ApplyGradientDescent(scope, w3, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[2]});
auto apply_b1 = ApplyGradientDescent(scope, b1, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[3]});
auto apply_b2 = ApplyGradientDescent(scope, b2, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[4]});
auto apply_b3 = ApplyGradientDescent(scope, b3, Cast(scope, 0.01, DT_FLOAT), {grad_outputs[5]});

The Cast operation provides the learning rate parameter (alpha), here 0.01.
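ApplyGradientDescent updates the variable in place as var ← var − alpha · delta. The same update applied to a plain std::vector looks like this (ApplyGradientDescentStep is a hypothetical helper, not a TensorFlow function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place gradient descent step: var[i] -= alpha * delta[i],
// the update ApplyGradientDescent performs on a TensorFlow variable.
void ApplyGradientDescentStep(std::vector<float>& var, float alpha,
                              const std::vector<float>& delta) {
  assert(var.size() == delta.size());
  for (std::size_t i = 0; i < var.size(); ++i) var[i] -= alpha * delta[i];
}
```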

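The graph of our neural network is now complete, so we can open a session and run it. In Python, the Optimizer API essentially wraps computing the gradients and applying them to minimize the loss; once an Optimizer API is available in C++, we will be able to use it here.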

We create a ClientSession, plus a vector of Tensor named outputs that will receive the network's output.

ClientSession session(scope);
std::vector<Tensor> outputs;

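In Python, calling tf.global_variables_initializer() is enough to initialize the variables, because a list of all variables is kept while the graph is built. In C++ we have to list them ourselves: running each Assign node writes the output of its RandomNormal into the corresponding variable.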

// init the weights and biases by running the Assign nodes once
TF_CHECK_OK(session.Run({assign_w1, assign_w2, assign_w3, assign_b1, assign_b2, assign_b3}, nullptr));

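At this point we can loop over a number of training steps, 5000 in our case, updating the parameters. Each step first runs the forward pass through the loss node, whose output is the network's loss. Every 100 steps we log the loss; a decreasing loss is a sign that the network is learning. Then we compute the gradient nodes and update the variables. Because the gradient nodes are inputs of the ApplyGradientDescent nodes, running the apply_* nodes first computes the gradients and then applies them to the right variables.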

// training steps
for (int i = 0; i < 5000; ++i) {
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {loss}, &outputs));
  if (i % 100 == 0) {
    std::cout << "Loss after " << i << " steps " << outputs[0].scalar<float>()() << std::endl;
  }
  // nullptr because the output from the run is useless
  TF_CHECK_OK(session.Run({{x, x_data}, {y, y_data}}, {apply_w1, apply_w2, apply_w3, apply_b1, apply_b2, apply_b3, layer_3}, nullptr));
}
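To make the loop structure concrete without TensorFlow, here is a deliberately tiny, hypothetical analogue: a single tanh neuron ŷ = tanh(w·x + b) fitted by gradient descent on the mean squared error, following the same pattern (forward pass for the loss, log every 100 steps, then apply the gradients). Neuron, Loss, Step and Train are illustrative names, and the toy data in the usage note is made up; this is not the article's network.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

struct Neuron { float w = 0.f, b = 0.f; };

// Mean squared error of tanh(w*x + b) against y.
float Loss(const Neuron& n, const std::vector<float>& x, const std::vector<float>& y) {
  float sum = 0.f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    float d = std::tanh(n.w * x[i] + n.b) - y[i];
    sum += d * d;
  }
  return sum / x.size();
}

// One gradient descent step on the MSE, chain rule through tanh.
void Step(Neuron& n, const std::vector<float>& x, const std::vector<float>& y, float alpha) {
  float gw = 0.f, gb = 0.f;
  const float N = static_cast<float>(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    float p = std::tanh(n.w * x[i] + n.b);
    float delta = 2.f / N * (p - y[i]) * (1.f - p * p);
    gw += delta * x[i];
    gb += delta;
  }
  n.w -= alpha * gw;
  n.b -= alpha * gb;
}

// Training loop mirroring the structure above; returns the final loss.
float Train(Neuron& n, const std::vector<float>& x, const std::vector<float>& y,
            int steps, float alpha) {
  for (int i = 0; i < steps; ++i) {
    if (i % 100 == 0) std::cout << "Loss after " << i << " steps " << Loss(n, x, y) << "\n";
    Step(n, x, y, alpha);
  }
  return Loss(n, x, y);
}
```

For example, with x = {-1, 0, 1}, y = {-0.5, 0, 0.5} and alpha = 0.5, the logged loss should decrease from its initial value as w approaches atanh(0.5).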

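With the network trained this far, we can try to predict the price of a car, that is, run inference. Let's predict the price of a 7-year-old diesel BMW 1 Series with 110,000 km. To do so, we run the layer_3 node while feeding the car's data into x; this is a single forward pass. Since we ran 5000 training steps beforehand, the weights have been learned, so the output will not be random.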

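We cannot feed the car's raw attributes directly, because the network learned from normalized attributes; the data must go through the same normalization. The DataSet class has an input method that does this using the data set metadata read from the CSV file.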

// prediction using the trained neural net
TF_CHECK_OK(session.Run({{x, {data_set.input(110000.f, Fuel::DIESEL, 7.f)}}}, {layer_3}, &outputs));
// layer_3 has shape [1, 1]; read its single element
float dnn_output = outputs[0].flat<float>()(0);
std::cout << "DNN output: " << dnn_output << std::endl;
std::cout << "Price predicted " << data_set.output(dnn_output) << " euros" << std::endl;
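To check the normalization by hand, here is a standalone sketch of the same formulas as DataSet::input and DataSet::output, with no TensorFlow and no CSV reading. Metadata, NormalizeInput, DenormalizePrice and NormalizePrice are hypothetical names; NormalizePrice is the inverse of output() and is assumed to be how prices in the CSV were scaled. It declares its own Fuel enum, so compile it separately from data_set.h.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical mirror of DataSetMetaData (illustration only).
struct Metadata {
  float mean_km, std_km, mean_age, std_age, min_price, max_price;
};

enum class Fuel { DIESEL, GAZOLINE };

// Same transformation as DataSet::input: z-score for km and age,
// -1/+1 encoding for the fuel type.
std::array<float, 3> NormalizeInput(const Metadata& m, float km, Fuel fuel, float age) {
  return {(km - m.mean_km) / m.std_km,
          fuel == Fuel::DIESEL ? -1.f : 1.f,
          (age - m.mean_age) / m.std_age};
}

// Same transformation as DataSet::output: undo the min-max scaling of the price.
float DenormalizePrice(const Metadata& m, float p) {
  return p * (m.max_price - m.min_price) + m.min_price;
}

// Assumed inverse of DenormalizePrice (min-max scaling to [0, 1]).
float NormalizePrice(const Metadata& m, float price) {
  return (price - m.min_price) / (m.max_price - m.min_price);
}
```

Returning std::array by value also sidesteps the lifetime issue of returning an initializer_list.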

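The network is trained to output a normalized price in [0, 1] (prices were min-max normalized), and the output method of data_set uses the metadata to convert it back into a human-readable price. The model can be run with bazel run -c opt //tensorflow/cc/models:model; if TensorFlow was compiled recently, you should quickly see output like this: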

Loss after 0 steps 0.317394
Loss after 100 steps 0.0503757
Loss after 200 steps 0.0487724
Loss after 300 steps 0.047366
Loss after 400 steps 0.0460944
Loss after 500 steps 0.0449263
Loss after 600 steps 0.0438395
Loss after 700 steps 0.0428183
Loss after 800 steps 0.041851
Loss after 900 steps 0.040929
Loss after 1000 steps 0.0400459
Loss after 1100 steps 0.0391964
Loss after 1200 steps 0.0383768
Loss after 1300 steps 0.0375839
Loss after 1400 steps 0.0368152
Loss after 1500 steps 0.0360687
Loss after 1600 steps 0.0353427
Loss after 1700 steps 0.0346358
Loss after 1800 steps 0.0339468
Loss after 1900 steps 0.0332748
Loss after 2000 steps 0.0326189
Loss after 2100 steps 0.0319783
Loss after 2200 steps 0.0313524
Loss after 2300 steps 0.0307407
Loss after 2400 steps 0.0301426
Loss after 2500 steps 0.0295577
Loss after 2600 steps 0.0289855
Loss after 2700 steps 0.0284258
Loss after 2800 steps 0.0278781
Loss after 2900 steps 0.0273422
Loss after 3000 steps 0.0268178
Loss after 3100 steps 0.0263046
Loss after 3200 steps 0.0258023
Loss after 3300 steps 0.0253108
Loss after 3400 steps 0.0248298
Loss after 3500 steps 0.0243591
Loss after 3600 steps 0.0238985
Loss after 3700 steps 0.0234478
Loss after 3800 steps 0.0230068
Loss after 3900 steps 0.0225755
Loss after 4000 steps 0.0221534
Loss after 4100 steps 0.0217407
Loss after 4200 steps 0.0213369
Loss after 4300 steps 0.0209421
Loss after 4400 steps 0.020556
Loss after 4500 steps 0.0201784
Loss after 4600 steps 0.0198093
Loss after 4700 steps 0.0194484
Loss after 4800 steps 0.0190956
Loss after 4900 steps 0.0187508
DNN output: 0.0969611
Price predicted 13377.7 euros

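Here the predicted price is 13377.7 euros. The prediction changes from run to run and can range from 8000 to 17000 euros. This is because we describe each car with only three attributes and our model architecture is fairly simple; in addition, the weights are randomly initialized with RandomNormal and no fixed seed, so every training run ends with different weights.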

As mentioned earlier, the C++ API is still under development, and we hope more features will be added in the near future.

Original article: https://matrices.io/training-a-deep-neural-network-using-only-tensorflow-c/

Source: 機器之心 (Synced)
