Really, I just want to see how it works. I don't want to fuss with hardware right now. So, the fact that I have a 24-core embedded GPU that's part of my Broadwell CPU is convenient. Running Debian, I was able to easily download the requisite packages and get started. I did quickly discover that consumer Intel hardware, at present, does *not* support `double` anything, making this a less-than-ideal test bed for scientific programming.
The next question is whether I've gained anything. How do I know it's actually working? My test case was inspired by a performance issue that I ran into in my work. In a scientific simulation program that I use and help develop, Valgrind revealed that `pow(double, double)` was taking fully half of the total computational time. I poked around a bit, and realized that `pow` and `log` really are rather complex operations, particularly for doubles. With this in mind, I set up a simple example using both OpenCL and straight C++, and compared the timings.
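For a feel of what the CPU side of such a comparison can look like, here is a minimal sketch. It is not the code from the repo: the workload in `heavy_op`, the function names, and the parameters `n` and `reps` are all made up for illustration. The only idea it borrows from the post is "call `pow` a lot and time it".

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical workload (not the author's kernel): call pow() `reps` times
// per element so that the transcendental call dominates the run time.
double heavy_op(double x, int reps) {
    double acc = 0.0;
    for (int r = 0; r < reps; ++r) {
        acc += std::pow(x + 1.0, 1.0 / (r + 1.0));
    }
    return acc;
}

// Run the workload over `n` elements and return the elapsed wall time in
// seconds. Results are stored in `out` so the loop has an observable effect.
double time_cpu(std::size_t n, int reps, std::vector<double>& out) {
    assert(reps > 0);
    out.assign(n, 0.0);
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = heavy_op(static_cast<double>(i), reps);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_cpu(100000, 1000, out)` from `main` and printing the return value gives a number to hold against the GPU run. Keeping the results in `out` and printing a few of them also makes it possible to check that both versions computed the same thing.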
The C++ is clean and easy to read, but is ~20x slower than the OpenCL version. Worried about the possibility of "unintended optimizations", I tried using a different kernel function. The speedup held, but this revealed that the results of precision-sensitive operations apparently differ between OpenCL and stdc++. So, this is something to keep an eye on.
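One plausible contributor to that kind of discrepancy (an assumption on my part, not something I verified against the driver) is simply `float` versus `double`, since this GPU has no `double` support. The sketch below stays entirely on the CPU and compares `pow` evaluated in single and double precision; the function name `pow_rel_diff` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative difference between pow() evaluated in float and in double.
// Illustrative stand-in for "GPU float result vs CPU double result".
double pow_rel_diff(double base, double exponent) {
    assert(base > 0.0);
    // Rounding the inputs to float already loses digits; pow() then
    // amplifies that input error roughly by a factor of `exponent`.
    float single = std::pow(static_cast<float>(base),
                            static_cast<float>(exponent));
    double full = std::pow(base, exponent);
    return std::fabs(static_cast<double>(single) - full) / std::fabs(full);
}
```

For something like `pow_rel_diff(1.1, 100.0)` the relative difference comes out around 1e-6, which is tiny in absolute terms but enormous compared to what `double` can resolve. If the two implementations also use different algorithms for `pow`, the gap can differ again.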
EDIT: I added an example using Boost.Compute today, which brings the best of both worlds. Boost.Compute has straightforward docs, and includes a nice closure macro that allows the direct incorporation of C++ code in kernel functions. The resulting code is *way* less verbose than vanilla OpenCL. The only downside is the extra dependency.
Here's a full example as a GitHub repo (Boost not shown, timings comparable with the OpenCL example):
    $ make; time ./opencl; time ./straight
    g++ -Wall -std=c++11 -lOpenCL -o opencl opencl.cpp
    g++ -Wall -std=c++11 -o straight straight.cpp
    Using platform: Intel Gen OCL Driver
    Using device: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
    result:
    200 201 202 203 204 205 206 207 208 209
    99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
    ./opencl  0.20s user 0.09s system 97% cpu 0.295 total
    result:
    200 201 202 203 204 205 206 207 208 209
    99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
    ./straight  6.03s user 0.00s system 99% cpu 6.032 total