Problem

Last time, I built two dlib binaries: one with CUDA enabled and one with CUDA disabled.

This time, I want to measure how much of a performance difference enabling CUDA actually makes.

Preparation

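First, prepare an image to use for the experiment.
I used the following one.

Source image: https://upload.wikimedia.org/wikipedia/commons/b/b7/G7_summit_at_Shimakan.jpg
It is a large image, 4368x2912 pixels.

Next is the source code used for the measurement.
dlib ships a face detection sample, examples\face_detection_ex.cpp.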

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_io.h>
#include <iostream>

using namespace dlib;
using namespace std;

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv)
{
    try
    {
        if (argc == 1)
        {
            cout << "Give some image files as arguments to this program." << endl;
            return 0;
        }

        frontal_face_detector detector = get_frontal_face_detector();
        image_window win;

        // Loop over all the images provided on the command line.
        for (int i = 1; i < argc; ++i)
        {
            cout << "processing image " << argv[i] << endl;
            array2d<unsigned char> img;
            load_image(img, argv[i]);
            // Make the image bigger by a factor of two.  This is useful since
            // the face detector looks for faces that are about 80 by 80 pixels
            // or larger.  Therefore, if you want to find faces that are smaller
            // than that then you need to upsample the image as we do here by
            // calling pyramid_up().  So this will allow it to detect faces that
            // are at least 40 by 40 pixels in size.  We could call pyramid_up()
            // again to find even smaller faces, but note that every time we
            // upsample the image we make the detector run slower since it must
            // process a larger image.
            pyramid_up(img);

            // Now tell the face detector to give us a list of bounding boxes
            // around all the faces it can find in the image.
            std::vector<rectangle> dets = detector(img);

            cout << "Number of faces detected: " << dets.size() << endl;
            // Now we show the image on the screen and the face detections as
            // red overlay boxes.
            win.clear_overlay();
            win.set_image(img);
            win.add_overlay(dets, rgb_pixel(255,0,0));

            cout << "Hit enter to process the next image..." << endl;
            cin.get();
        }
    }
    catch (exception& e)
    {
        cout << "\nexception thrown!" << endl;
        cout << e.what() << endl;
    }
}
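
The comment in the sample says that upsampling makes the detector slower because it has to process a larger image. As a rough illustration for the 4368x2912 image used here, the sketch below computes the pixel count before and after one upsampling step. It assumes each step exactly doubles both dimensions (the "factor of two" from the comment); `ImageSize`, `upsampled`, and `pixel_count` are hypothetical helper names, not part of dlib.

```cpp
#include <cassert>
#include <cstdint>

// Width and height of an image in pixels.
struct ImageSize {
    std::int64_t width;
    std::int64_t height;
};

// Size after `levels` upsampling steps.
// Assumption: every step exactly doubles both dimensions.
ImageSize upsampled(ImageSize size, int levels)
{
    assert(levels >= 0);
    for (int i = 0; i < levels; ++i) {
        size.width *= 2;
        size.height *= 2;
    }
    return size;
}

// Total number of pixels the detector would have to scan.
std::int64_t pixel_count(ImageSize size)
{
    return size.width * size.height;
}
```

Under that assumption, `upsampled(ImageSize{4368, 2912}, 1)` gives 8736x5824, so the detector scans about 50.9 million pixels instead of about 12.7 million, four times the work per image.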

Running this sample on the image above gives the following result.

Try

I modified this source as follows.

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_io.h>
#include <iostream>
#include <iomanip>   // std::setprecision
#include <chrono>

using namespace dlib;
using namespace std;

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv)
{
    try
    {
        if (argc == 1)
        {
            cout << "Give some image files as arguments to this program." << endl;
            return 0;
        }

        frontal_face_detector detector = get_frontal_face_detector();
        image_window win;

        std::chrono::system_clock::time_point start, end;
        start = std::chrono::system_clock::now();

        // Process every image provided on the command line, 100 times over.
        for (int c = 0; c < 100; c++)
            for (int i = 1; i < argc; ++i)
            {
                cout << "processing image " << argv[i] << endl;
                array2d<unsigned char> img;
                load_image(img, argv[i]);
                // Make the image bigger by a factor of two.  This is useful since
                // the face detector looks for faces that are about 80 by 80 pixels
                // or larger.  Therefore, if you want to find faces that are smaller
                // than that then you need to upsample the image as we do here by
                // calling pyramid_up().  So this will allow it to detect faces that
                // are at least 40 by 40 pixels in size.  We could call pyramid_up()
                // again to find even smaller faces, but note that every time we
                // upsample the image we make the detector run slower since it must
                // process a larger image.
                pyramid_up(img);

                // Now tell the face detector to give us a list of bounding boxes
                // around all the faces it can find in the image.
                std::vector<rectangle> dets = detector(img);

                //cout << "Number of faces detected: " << dets.size() << endl;
                // Now we show the image on the screen and the face detections as
                // red overlay boxes.
                //win.clear_overlay();
                //win.set_image(img);
                //win.add_overlay(dets, rgb_pixel(255,0,0));

                //cout << "Hit enter to process the next image..." << endl;
                //cin.get();
            }

        end = std::chrono::system_clock::now();
        // Convert the elapsed time to milliseconds.
        double elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();

        std::cout << std::fixed;
        cout << "Total: " << std::setprecision(4) << elapsed << endl;
        // Average per pass (per image when a single file is given).
        cout << "Average: " << std::setprecision(4) << elapsed / 100 << endl;
    }
    catch (exception& e)
    {
        cout << "\nexception thrown!" << endl;
        cout << e.what() << endl;
    }
}

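In short, the program now runs detection on the input image file 100 times and reports the total and average time that processing took. Note that the measured time includes loading and upsampling the image on every pass, not just the detection itself.

Here are the results.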
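
The "repeat N times, then report total and average" pattern can also be pulled out into a small reusable helper. The following is only a minimal sketch, not what I actually ran: `TimingResult` and `time_repeated` are hypothetical names, and it uses `std::chrono::steady_clock` instead of the `system_clock` in the listing above, on the assumption that a monotonic clock is preferable for measuring intervals.

```cpp
#include <cassert>
#include <chrono>

// Result of a repeated measurement; both values are in milliseconds.
struct TimingResult {
    double total_ms;
    double average_ms;
};

// Calls `work` exactly `runs` times and returns the total and per-call
// wall-clock time. steady_clock is monotonic, so the interval is not
// affected if the system time is adjusted during the run.
template <typename Work>
TimingResult time_repeated(Work&& work, int runs)
{
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        work();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return TimingResult{elapsed.count(), elapsed.count() / runs};
}
```

With dlib, this could be called as `time_repeated([&] { detector(img); }, 100)` after loading and upsampling the image once, so that only the detection step is timed.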

Build       Total (ms)      Average (ms)
w/o CUDA    1007778.0000    10077.7800
w/ CUDA      997522.0000     9975.2200

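Honestly, the result is underwhelming: the difference is only about 1% (10077.78 ms vs. 9975.22 ms per run). A likely reason is that get_frontal_face_detector() returns dlib's HOG-based detector, which runs on the CPU, whereas dlib's CUDA support is used by its DNN module.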
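
The size of the gap can be checked directly from the totals in the table. The sketch below is just that arithmetic; `percent_faster` and `speedup` are hypothetical helper names introduced for illustration.

```cpp
#include <cassert>

// Percentage by which `candidate_ms` is shorter than `baseline_ms`.
double percent_faster(double baseline_ms, double candidate_ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0;
}

// Speedup factor: how many times faster the candidate is than the baseline.
double speedup(double baseline_ms, double candidate_ms)
{
    assert(candidate_ms > 0.0);
    return baseline_ms / candidate_ms;
}
```

Calling `percent_faster(1007778.0, 997522.0)` gives roughly 1.02, and `speedup(1007778.0, 997522.0)` gives roughly 1.01, so the CUDA build is about 1% faster in this measurement.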