Second MPI_Send is hanging if buffer size is over 256

    int n, j, i, i2, i3, rank, size, rowChunk, **cells, **cellChunk;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if(!rank){
        printf("\nEnter board size:\n");
        fflush(stdout);
        scanf("%d", &n);
        printf("\nEnter the total iterations to play:\n");
        fflush(stdout);
        scanf("%d", &j);
        srand(3);

        rowChunk = n/size; //how many rows each process will get

        for(i=1; i<size; i++){
            MPI_Send(&n,1, MPI_INT, i, 0, MPI_COMM_WORLD);
            MPI_Send(&j,1, MPI_INT, i, 7, MPI_COMM_WORLD);
        }

        cells = (int**) malloc(n*sizeof(int*)); //create main 2D array
        for(i=0; i<n; i++){
            cells[i] = (int*) malloc(n*sizeof(int));
        }

        for(i=0; i<n; i++){
            for(i2=0; i2<n; i2++){ //fill array with random data
                cells[i][i2] = rand() % 2;
            }
        }

        for(i=1; i<size; i++){ //send blocks of rows to each process
            for(i2=0; i2<rowChunk; i2++){ //this works for all n
                MPI_Send(cells[i2+(rowChunk*i)], n, MPI_INT, i, i2, MPI_COMM_WORLD);
            }
        }

        cellChunk = (int**) malloc(rowChunk*sizeof(int*));
        for(i=0; i<rowChunk; i++){ //declare 2D array for process zero's array chunk
            cellChunk[i] = (int*) malloc(n*sizeof(int));
        }

        for(i=0; i<rowChunk; i++){ //give process zero it's proper chunk of the array
            for(i2=0; i2<n; i2++){
                cellChunk[i][i2] = cells[i][i2];
            }
        }

        for(i3=1; i3<=j; i3++){
            MPI_Send(cellChunk[0], n, MPI_INT, size-1,1,MPI_COMM_WORLD); //Hangs here if n >256
            MPI_Send(cellChunk[rowChunk-1], n, MPI_INT, 1,2,MPI_COMM_WORLD); //also hangs if n > 256

            ... //Leaving out code that works

This code works perfectly if n (the array size) is less than or equal to 256. With anything greater, it hangs on the first MPI_Send inside the iteration loop. The earlier MPI_Send calls that distribute the row chunks to the other processes do work, and those processes receive their data correctly even when n > 256. What would cause just this MPI_Send to hang once the buffer size exceeds 256?


You are never receiving any messages, and so the code will fill the local MPI buffer space and then deadlock waiting for an MPI_Recv (or similar) call to be run. You will need to insert receive operations so that your messages will actually be sent and processed on the receivers.


MPI_Send is a blocking call. The standard allows MPI_Send to return control as soon as the message buffer can be safely modified. Alternatively, MPI_Send may wait to return until some time after the matching MPI_Recv has started or completed.

The implementation of MPI you are using is likely using an "eager" protocol when the message is at most 256 elements (with an MPI_INT datatype, typically 4 bytes per element, that is a 1 KiB message). The message is copied into another buffer and control is returned "early." For larger messages, the MPI_Send call does not return until (at least) the matching MPI_Recv call is executed.

If you post a complete reproducer, you will likely get a better answer.


MPI_Send "may block until the message is received.", so it is most likely that the matching receive is not reached. You need to make sure the MPI_Recvs are placed in the right order. Since you did not post your receive part, it is impossible to tell the details.

You could restructure your application to make sure the matching receives are posted in the right order. It may also be convenient to use the combined MPI_Sendrecv, or the nonblocking MPI_Isend and MPI_Irecv together with MPI_Wait.

