Reproducing Fast.ai/DIUx imagenet18 with a Titan RTX Server

2019-01-31 07:22:23.0

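Last year, Fast.ai won the first ImageNet training cost challenge as part of the DAWN benchmark. Their customized ResNet50 took 3.27 hours to reach 93% Top-5 accuracy on an AWS p3.16xlarge (8 x V100 GPUs). This year, Fast.ai teamed up with DIUx and cut the training time to 18 minutes using 16 p3.16xlarge machines. This was the fastest solution at the time it was announced (September 2018).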

In this post, we reproduce the latest Fast.ai/DIUx ImageNet result on a single server with 8 Turing GPUs (Titan RTX). It takes 2.36 hours to reach 93% Top-5 accuracy.

The detailed statistics are as follows:

Epoch	Cumulative Training Time (hours)	Top-1 Acc (%)	Top-5 Acc (%)
1	0.0539	7.2800	19.1979
2	0.0921	18.3619	39.6699
3	0.1306	26.1700	50.2779
4	0.1691	29.9260	54.5460
5	0.2078	32.3339	58.0260
6	0.2465	27.2560	50.7680
7	0.2852	30.0799	54.3160
8	0.3240	39.1959	65.4260
9	0.3627	42.8860	69.2040
10	0.4014	45.1940	70.9100
11	0.4402	49.3839	74.9639
12	0.4788	54.9459	79.5660
13	0.5174	58.5820	81.8980
14	0.6433	57.3959	81.5960
15	0.7569	53.0480	77.7799
16	0.8703	58.9599	82.7979
17	0.9845	60.4039	83.8259
18	1.0982	62.3779	84.8720
19	1.2124	64.9540	86.6080
20	1.3258	65.9520	87.2919
21	1.4390	68.3700	88.7060
22	1.5529	71.4420	90.4820
23	1.6673	72.0479	90.6679
24	1.7826	72.8679	91.1559
25	1.8957	73.5739	91.4960
26	2.1455	75.8519	92.9879
27	2.3657	75.9179	93.0179
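
Differencing the cumulative times in the table gives the cost of each individual epoch. The short sketch below does that and averages over three epoch ranges. The range boundaries (after epochs 13 and 25) are an assumption read off the jumps in the table; they appear to line up with the resolution changes in the --phases argument of the training command in the Demo section.

```python
# Cumulative training time (hours) after each epoch, copied from the table above.
cumulative_hours = [
    0.0539, 0.0921, 0.1306, 0.1691, 0.2078, 0.2465, 0.2852, 0.3240, 0.3627,
    0.4014, 0.4402, 0.4788, 0.5174, 0.6433, 0.7569, 0.8703, 0.9845, 1.0982,
    1.2124, 1.3258, 1.4390, 1.5529, 1.6673, 1.7826, 1.8957, 2.1455, 2.3657,
]


def per_epoch_hours(cumulative):
    """Return the duration of each epoch given cumulative timestamps."""
    previous = 0.0
    durations = []
    for t in cumulative:
        durations.append(t - previous)
        previous = t
    return durations


def mean(values):
    return sum(values) / len(values)


durations = per_epoch_hours(cumulative_hours)

# Epoch ranges (1-indexed, inclusive). The boundaries are an assumption,
# chosen by eye from where the per-epoch time jumps in the table.
groups = {"epochs 1-13": (1, 13), "epochs 14-25": (14, 25), "epochs 26-27": (26, 27)}
for name, (first, last) in groups.items():
    chunk = durations[first - 1:last]
    print(f"{name}: {mean(chunk) * 60:.1f} min per epoch on average")
```

Under these assumptions the averages come out at roughly 2.4, 6.9, and 14.1 minutes per epoch, so the last two high-resolution epochs cost about as much as the first dozen low-resolution ones combined.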

The code and instructions are in the Demo section at the end of this post.

Notes
Jeremy Howard wrote a great blog post about the technical details behind their approach. Our takeaways are:
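Inference with dynamic-size images: The key idea of the latest Fast.ai/DIUx entry is to reduce the information lost in the inference preprocessing stage. Consider first the common practice for image classification inference: because the network uses fully connected layers, each image is converted to a fixed size before the network processes it. This usually means either distorting the image's aspect ratio (through resizing) or losing important parts of the image content (through cropping). In the second case, people often use multiple random crops to improve accuracy.
Clearly, this preprocessing has a large impact on network performance: the network needs to be powerful enough to recognize "distorted" or "cropped" objects, which usually requires more training epochs.
The key observation of the Fast.ai/DIUx team is that this unnecessary preprocessing can be removed by dropping the fixed-size constraint. This is done by replacing the fully connected layers with a global pooling layer, which places no restriction on the input size. Inference then becomes an "easier" task, because the input images are neither heavily distorted nor cropped. As a result, fewer training epochs are needed to reach a given test accuracy. According to the authors, this makes reaching the target accuracy 23% faster.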
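Progressive training: Another interesting technique the Fast.ai/DIUx team used is progressive training with images at multiple resolutions. Training starts with low-resolution (128 x 128) input images and a larger batch size to reach a certain accuracy quickly; the resolution is then increased (first to 224 x 224, then to 288 x 288) for the more expensive fine-tuning. Overall, this allows the target test accuracy to be reached in fewer epochs. Note that this is only possible because the fully connected layers were replaced by a global pooling layer, so a network trained on low-resolution images can take higher-resolution images without modification. In addition, the batch size and learning rate are carefully scheduled for each resolution to reach the desired performance as quickly as possible.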
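
To illustrate why global pooling removes the fixed-size constraint, here is a minimal NumPy sketch. It is not the Fast.ai/DIUx code: the channel count, class count, random "backbone" features, and function names are all hypothetical, and the only thing taken from ResNet-style networks is the overall downsampling factor of 32. In this sketch a single linear classifier follows the pooling step. In PyTorch the pooling role is typically played by nn.AdaptiveAvgPool2d.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CHANNELS = 8  # hypothetical feature-map depth
NUM_CLASSES = 5   # hypothetical number of classes

# A single classifier weight matrix, shared by every input resolution.
classifier_weights = rng.normal(size=(NUM_CLASSES, NUM_CHANNELS))


def feature_map(height, width):
    """Hypothetical backbone output for an image of the given size (stride 32)."""
    return rng.normal(size=(NUM_CHANNELS, height // 32, width // 32))


def classify_with_global_pool(features):
    """Average over the spatial axes, then apply the shared linear classifier."""
    pooled = features.mean(axis=(1, 2))   # (C, H, W) -> (C,)
    return classifier_weights @ pooled    # (NUM_CLASSES,)


for size in (128, 224, 288):
    features = feature_map(size, size)
    logits = classify_with_global_pool(features)
    flattened = features.reshape(-1)  # what a flatten + fixed-size FC layer would see
    print(size, features.shape, logits.shape, flattened.shape)
```

The pooled vector always has NUM_CHANNELS entries, so the same weights work at 128, 224, and 288 pixels, whereas the flattened vector changes length with the input size and would need a different fully connected layer for each resolution.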

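Conclusion
In this post, we reproduced state-of-the-art ImageNet training performance on a single Turing GPU server. Excitingly, it takes only about 2.4 hours to reach 93% Top-5 accuracy. Next time, we will reproduce the training in a multi-node, distributed fashion on a local network.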

Demo

You can reproduce the results with this repo.

First, clone the repo and setup a Python 3 virtual environment:

git clone https://github.com/lambdal/imagenet18.git

cd imagenet18

virtualenv -p python3 env

source env/bin/activate

pip install -r requirements_local.txt

Then download the data to your local machine (be aware that the tar files are about 200 GB in total):

wget https://s3.amazonaws.com/yaroslavvb/imagenet-data-sorted.tar

wget https://s3.amazonaws.com/yaroslavvb/imagenet-sz.tar

tar -xvf imagenet-data-sorted.tar -C /mnt/data/data

tar -xvf imagenet-sz.tar -C /mnt/data/data

Finally, run the following command to reproduce the results on an 8-GPU server. Set "nproc_per_node" to match the number of GPUs on your machine.

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 \
training/train_imagenet_nv.py /mnt/data/data/imagenet \
--fp16 --logdir ./ncluster/runs/lambda-blade --distributed --init-bn0 --no-bn-wd \
--phases "[{'ep': 0, 'sz': 128, 'bs': 512, 'trndir': '-sz/160'}, {'ep': (0, 7), 'lr': (1.0, 2.0)}, {'ep': (7, 13), 'lr': (2.0, 0.25)}, {'ep': 13, 'sz': 224, 'bs': 224, 'trndir': '-sz/320', 'min_scale': 0.087}, {'ep': (13, 22), 'lr': (0.4375, 0.043750000000000004)}, {'ep': (22, 25), 'lr': (0.043750000000000004, 0.004375)}, {'ep': 25, 'sz': 288, 'bs': 128, 'min_scale': 0.5, 'rect_val': True}, {'ep': (25, 28), 'lr': (0.0025, 0.00025)}]"
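
The --phases value is written as a Python literal, so it can be inspected with the standard library. The sketch below parses the string from the command above and separates the data phases (image size and batch size) from the learning-rate phases. The helper names are hypothetical, and the assumption that the learning rate is interpolated linearly between the two values of each 'lr' tuple is ours; check training/train_imagenet_nv.py for the actual behavior.

```python
import ast

# The --phases string, copied from the command above.
PHASES = (
    "[{'ep': 0, 'sz': 128, 'bs': 512, 'trndir': '-sz/160'}, "
    "{'ep': (0, 7), 'lr': (1.0, 2.0)}, "
    "{'ep': (7, 13), 'lr': (2.0, 0.25)}, "
    "{'ep': 13, 'sz': 224, 'bs': 224, 'trndir': '-sz/320', 'min_scale': 0.087}, "
    "{'ep': (13, 22), 'lr': (0.4375, 0.043750000000000004)}, "
    "{'ep': (22, 25), 'lr': (0.043750000000000004, 0.004375)}, "
    "{'ep': 25, 'sz': 288, 'bs': 128, 'min_scale': 0.5, 'rect_val': True}, "
    "{'ep': (25, 28), 'lr': (0.0025, 0.00025)}]"
)

phases = ast.literal_eval(PHASES)
data_phases = [p for p in phases if "sz" in p]  # resolution / batch-size changes
lr_phases = [p for p in phases if "lr" in p]    # learning-rate segments


def lr_at(epoch, lr_phases):
    """Learning rate at a (possibly fractional) epoch.

    Assumption: linear interpolation between the two 'lr' values over the
    half-open epoch range [start, end) of each phase.
    """
    for phase in lr_phases:
        start, end = phase["ep"]
        if start <= epoch < end:
            lr_start, lr_end = phase["lr"]
            fraction = (epoch - start) / (end - start)
            return lr_start + fraction * (lr_end - lr_start)
    raise ValueError(f"epoch {epoch} is outside every learning-rate phase")


for p in data_phases:
    print(f"from epoch {p['ep']}: image size {p['sz']}, batch size {p['bs']}")
for epoch in (0, 3.5, 7, 13, 25):
    print(f"epoch {epoch}: lr = {lr_at(epoch, lr_phases):.4f}")
```

Read this way, the schedule has three resolution stages (128, 224, 288) that start at epochs 0, 13, and 25, with the batch size shrinking as the images grow.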

To print the statistics, locate the events.out file in the "logdir" folder and run this command:

python dawn/prepare_dawn_tsv.py --events_path=<logdir>/<events.out>

Source: lambdalabs.com