Is there an OpenCL profiler for Mac OS X 10.8?

Question:

I am trying to find the bottleneck in my OpenCL kernel. Is it possible to profile OpenCL programs on Mac OS X? I found gDebugger on <a href="http://www.gremedy.com/" rel="nofollow">http://www.gremedy.com/</a>, but it requires 10.5 or 10.6 to run. The AMD SDK supports only Linux and Windows.

Is there a profiler for Mountain Lion?

Answer1:

How detailed does your profiling information need to be? Is it okay to use OpenCL's built-in profiler?<br /> OpenCL command queues can be created with the CL_QUEUE_PROFILING_ENABLE flag.

With that flag set, you can see, for each kernel you executed, the time at which it was:

<ul><li>Enqueued</li> <li>Submitted to your OpenCL device</li> <li>Started</li> <li>Ended</li> </ul>

With the <strong>C++ bindings</strong>, creating the queue can look like this:

<pre class="lang-cpp prettyprint-override">_queue = new cl::CommandQueue(_context, _device, CL_QUEUE_PROFILING_ENABLE);</pre>

Extracting the profiling information looks like this:

1) Save the event object (for example in an array) returned when you enqueue the kernel you want to profile.

<pre class="lang-cpp prettyprint-override">cl::Event evt;
_queue->enqueueNDRangeKernel(_kernel, cl::NullRange, _range, cl::NullRange, NULL, &evt);</pre>

2) Once the queued commands have finished executing (for example, after calling _queue->finish()), extract the profiling information:

<pre class="lang-cpp prettyprint-override">std::vector<cl::Event> evts;
// Add all events to this vector here, e.g.:
//   cl::Event evt;
//   _queue->enqueueNDRangeKernel(_kernel, cl::NullRange, _range, cl::NullRange, NULL, &evt);
//   evts.push_back(evt);

cl_ulong param; // profiling timestamps are reported as cl_ulong, in nanoseconds
for (unsigned int i = 0; i < evts.size(); i++)
{
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_QUEUED, &param);
    printf("%u: %llu", i, (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_SUBMIT, &param);
    printf(" %llu", (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_START, &param);
    printf(" %llu", (unsigned long long)param);
    evts[i].getProfilingInfo(CL_PROFILING_COMMAND_END, &param);
    printf(" %llu\n", (unsigned long long)param);
}</pre>
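The four values printed per event are raw device timestamps, which OpenCL reports in nanoseconds, so the useful numbers are the differences between them. The following is only a rough sketch of that conversion: ProfileTimes, ProfileDurationsMs, elapsedMs, toDurations and printDurations are hypothetical helper names, not part of OpenCL, and the struct is assumed to be filled by hand from the four getProfilingInfo results so that the snippet compiles without the OpenCL headers.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper types (not part of OpenCL). Each field is assumed to
// hold one value returned by getProfilingInfo(), in nanoseconds.
struct ProfileTimes {
    std::uint64_t queued;     // CL_PROFILING_COMMAND_QUEUED
    std::uint64_t submitted;  // CL_PROFILING_COMMAND_SUBMIT
    std::uint64_t started;    // CL_PROFILING_COMMAND_START
    std::uint64_t ended;      // CL_PROFILING_COMMAND_END
};

struct ProfileDurationsMs {
    double queueWait;   // queued    -> submitted
    double submitWait;  // submitted -> started
    double execution;   // started   -> ended
};

// Difference between two nanosecond timestamps, in milliseconds.
// Assumes 'to' is not earlier than 'from'.
inline double elapsedMs(std::uint64_t from, std::uint64_t to) {
    return static_cast<double>(to - from) / 1.0e6;
}

// Turn the four raw timestamps into three durations.
inline ProfileDurationsMs toDurations(const ProfileTimes& t) {
    ProfileDurationsMs d;
    d.queueWait  = elapsedMs(t.queued, t.submitted);
    d.submitWait = elapsedMs(t.submitted, t.started);
    d.execution  = elapsedMs(t.started, t.ended);
    return d;
}

// Print one line per event with the three durations.
inline void printDurations(unsigned int index, const ProfileTimes& t) {
    const ProfileDurationsMs d = toDurations(t);
    std::printf("%u: queue wait %.3f ms, submit wait %.3f ms, execution %.3f ms\n",
                index, d.queueWait, d.submitWait, d.execution);
}
```

Inside the loop, you would fill one ProfileTimes per event and pass it to printDurations. When hunting for a slow kernel, the execution value (started to ended) is probably the first number to look at, while the two wait values hint at host-side or scheduling overhead.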
