
Question:
I am trying to find the bottleneck in my OpenCL kernel. Is it possible to profile OpenCL programs on Mac OS X? I found gDebugger at <a href="http://www.gremedy.com/" rel="nofollow">http://www.gremedy.com/</a>, but it requires 10.5 or 10.6 to run, and the AMD SDK supports only Linux and Windows.
Is there a profiler for Mountain Lion?
Answer 1: How detailed does your profiling information need to be? Would OpenCL's built-in profiling be enough?<br /> OpenCL command queues can be created with the CL_QUEUE_PROFILING_ENABLE flag.
This way you can see, for each kernel you executed, when it was:
<ul><li>Enqueued</li> <li>Submitted to your OpenCL device</li> <li>Started</li> <li>Ended</li> </ul>Each value is a 64-bit device timestamp in nanoseconds. With the <strong>C++ bindings</strong>, creating the queue can look like this:
<pre class="lang-cpp prettyprint-override">_queue = new cl::CommandQueue(_context, _device, CL_QUEUE_PROFILING_ENABLE);</pre>
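The four timestamps listed above are most useful as differences. As a minimal sketch in plain C++ (no OpenCL dependency; the struct `CommandTimes`, its methods and `toMs` are hypothetical names, not part of the OpenCL bindings), assuming all four values were read from the same event and are in nanoseconds:

```cpp
#include <cstdint>

// Hypothetical holder for the four profiling timestamps of one command,
// in nanoseconds of device time.
struct CommandTimes {
    std::uint64_t queued;  // CL_PROFILING_COMMAND_QUEUED
    std::uint64_t submit;  // CL_PROFILING_COMMAND_SUBMIT
    std::uint64_t start;   // CL_PROFILING_COMMAND_START
    std::uint64_t end;     // CL_PROFILING_COMMAND_END

    // Time the command sat in the command queue before the host submitted it.
    std::uint64_t queueWaitNs() const { return submit - queued; }
    // Time between submission and the device actually starting the command.
    std::uint64_t launchDelayNs() const { return start - submit; }
    // Device execution time of the kernel itself.
    std::uint64_t executionNs() const { return end - start; }
    // Total latency from enqueue to completion.
    std::uint64_t totalNs() const { return end - queued; }
};

// Convert nanoseconds to milliseconds for readable output.
inline double toMs(std::uint64_t ns) { return static_cast<double>(ns) / 1.0e6; }
```

If `executionNs()` is small but `queueWaitNs()` or `launchDelayNs()` is large, the kernel itself is probably not the bottleneck.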
Extracting the profiling information works like this:
1) Keep the event object filled in by each kernel enqueue you want to profile (e.g. in a vector).
<pre class="lang-cpp prettyprint-override">cl::Event evt;
// The last argument receives the event associated with this kernel launch.
_queue->enqueueNDRangeKernel( _kernel, cl::NullRange, _range, cl::NullRange, NULL, &evt);</pre>
2) Once the queued commands have finished executing (profiling information is not available before a command completes), extract the profiling information:
<pre class="lang-cpp prettyprint-override">std::vector<cl::Event> evts;
//add all events to this vector here, e.g.:
//cl::Event evt;
//_queue->enqueueNDRangeKernel( _kernel, cl::NullRange, _range, cl::NullRange, NULL, &evt);
//evts.push_back(evt);

// Block until every enqueued command has completed; otherwise the
// profiling queries below may fail.
_queue->finish();

uint64_t param;
// One line per event: index, then QUEUED, SUBMIT, START and END timestamps (ns).
for (unsigned int i=0; i<evts.size(); i++)
{
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_QUEUED, &param);
    printf("%u: %llu", i, (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_SUBMIT, &param);
    printf(" %llu", (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_START, &param);
    printf(" %llu", (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_END, &param);
    printf(" %llu\n", (unsigned long long)param);
}</pre>
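To locate the bottleneck, the raw START/END values can be aggregated per kernel. This is a minimal sketch under assumptions: `KernelSample`, `totalTimePerKernel` and `slowestKernel` are hypothetical helpers, and each sample is assumed to hold a label you choose for the kernel plus the START and END timestamps already read from its `cl::Event`, as in the loop above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one profiled kernel launch (timestamps in ns).
struct KernelSample {
    std::string name;     // label chosen by the host code
    std::uint64_t start;  // CL_PROFILING_COMMAND_START
    std::uint64_t end;    // CL_PROFILING_COMMAND_END
};

// Sum the device execution time (end - start) for each kernel label.
std::map<std::string, std::uint64_t> totalTimePerKernel(const std::vector<KernelSample>& samples) {
    std::map<std::string, std::uint64_t> totals;
    for (const auto& s : samples) {
        totals[s.name] += s.end - s.start;
    }
    return totals;
}

// Return the label with the largest accumulated execution time,
// or an empty string if there are no entries.
std::string slowestKernel(const std::map<std::string, std::uint64_t>& totals) {
    std::string best;
    std::uint64_t bestTime = 0;
    bool found = false;
    for (const auto& kv : totals) {
        if (!found || kv.second > bestTime) {
            best = kv.first;
            bestTime = kv.second;
            found = true;
        }
    }
    return best;
}
```

This only measures kernel execution time on the device. Other commands, such as buffer reads and writes, also accept an event pointer and can be profiled the same way if the gaps between kernels turn out to dominate.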