Are there any instruction sets that support MIMD architectures?


I already know that SIMD instruction sets include SSE1 to SSE5.<br /> But I have not found much discussion of instruction sets that support MIMD architectures.<br /> In C++ code, we can use intrinsics to write "SIMD running" code.<br /> Is there any way to write "MIMD running" code?<br /> If MIMD is more powerful than SIMD, it is better to write C++ code that supports MIMD.<br /> Is my thought correct?


The Wikipedia page <a href="https://en.wikipedia.org/wiki/Flynn%27s_taxonomy#Multiple_instruction_streams,_multiple_data_streams_(MIMD)" rel="nofollow">Flynn's taxonomy</a> describes MIMD as:


Multiple autonomous processors simultaneously executing different instructions on different data. MIMD architectures include multi-core superscalar processors, and distributed systems, using either one shared memory space or a distributed memory space.


Any time you divide an algorithm into concurrent parts (into threads using OpenMP, for example), you may be using MIMD. Generally, you don't need a special "MIMD instruction set": the ISA is the same as for SISD, because each instruction stream operates independently of the others, on its own data. EPIC (explicitly parallel instruction computing) is an alternative approach in which the functional units operate in lockstep, but with independent(ish) instructions and data.

As to which is "more powerful" (or more energy-efficient, or lower-latency, or whatever matters in your use case), there's no single answer. As with many complex issues, "it depends".



Is my thought correct?


It is certainly naive, and implementation specific. Remember the following facts:


<ul> <li>

<a href="https://en.wikipedia.org/wiki/Optimizing_compiler" rel="nofollow">Optimizing compilers</a> generate very clever code (when you <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html" rel="nofollow">enable</a> optimizations). Try, for example, a recent <a href="http://gcc.gnu.org/" rel="nofollow">GCC</a> invoked as g++ -march=native -O3 -Wall (and perhaps also -fverbose-asm -S if you want to look at the generated assembler code); see Matt Godbolt's CppCon 2017 <a href="https://youtu.be/bSkpMdDe4g4" rel="nofollow">talk</a> <em>“What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”</em>.

</li> <li>

There are language extensions, expressed through standardized pragmas, that help the compiler parallelize code for MIMD hardware; look into <a href="https://en.wikipedia.org/wiki/OpenMP" rel="nofollow">OpenMP</a> and <a href="https://en.wikipedia.org/wiki/OpenACC" rel="nofollow">OpenACC</a>.

</li> <li>

Consider explicit parallelization approaches: multi-threading (read some <a href="https://computing.llnl.gov/tutorials/pthreads/" rel="nofollow">pthread programming</a> tutorial), <a href="https://en.wikipedia.org/wiki/Message_Passing_Interface" rel="nofollow">MPI</a>...

</li> <li>

Look also into dialects for GPGPU computing like <a href="https://en.wikipedia.org/wiki/OpenCL" rel="nofollow">OpenCL</a> and <a href="https://en.wikipedia.org/wiki/CUDA" rel="nofollow">CUDA</a>.

</li> </ul>

See also <a href="https://stackoverflow.com/a/47528068/841108" rel="nofollow">this answer</a> to a related question.


If MIMD is more powerful than SIMD, it is better to write C++ code that supports MIMD.


Certainly not always, if you just care about performance. As usual, it depends, and you need to benchmark.

