
How to bind a public IP to Spark nodes in Amazon EC2?

Question:

I am trying to create a Spark cluster from two instances in two different regions. Because they are not in the same VPC or security group, I am having trouble connecting the master in one region to the slave in the other region (and vice versa). So far I have done the following:

<ol><li>

Edited the /etc/hosts file to add the public IPs of both the master and the slave:

54.208.204.190 master
13.113.105.113 slave01

</li> <li>

Added slave01 to the $SPARK_HOME/conf/slaves file.

</li> <li>

In $SPARK_HOME/conf/spark-env.sh added the following:

export JAVA_HOME=/home/ubuntu/jdk1.8.0_151
export SPARK_WORKER_CORES=8
export SPARK_MASTER_HOST=ec2-54-208-204-190.compute-1.amazonaws.com

I assigned the public DNS name of the master to SPARK_MASTER_HOST because assigning the master's public IP did not work. It produced the following error:

'MasterUI' could not bind on port 8080.

</li> </ol>

The above configuration worked: slave01 registered successfully with the master, and the Spark web UI showed one worker as intended. But when I tried to run the SparkPi example, it could not add an executor. In the logs on slave01 I found the following:

`Caused by: java.io.IOException: Failed to connect to /172-31-23-69:48441`

172-31-23-69 corresponds to the private IP of the master (172.31.23.69). As I understand it, slave01 tried to connect to the master using this private IP, and because the two instances are not in the same VPC, the connection failed. I am not sure why slave01 would use the master's private IP in the first place, because I have given both the public DNS name and the public IP of the master in spark-env.sh and the hosts file. How slave01 learned the master's private IP at all is another interesting question.

I have also tried setting the SPARK_LOCAL_IP variable to the public IP on each instance, but that does not work either. If anyone can point me in the right direction, I will be very grateful. Thanks in advance.

Answer1:

When an EC2 instance has a public IPv4 address associated with it, you can't bind a socket to the public IP address, because of the way public IP addresses are handled in EC2.

The public IP is statically NAT-ed to the private IP by the Internet Gateway -- the instance itself is not aware of the public IP address.

(See the output from ifconfig -- the public IP is not there, and isn't supposed to be there -- only the private IP).
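As a rough illustration of why binding to the public address fails, the sketch below tries to bind a TCP socket to a given address and reports whether the operating system accepts it. The helper name `can_bind` is made up for this example, and 192.0.2.1 is a documentation-only address used here as a stand-in for an IP that is not configured on any local interface.

```python
import errno
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to `address` on this host.

    `can_bind` is a hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the OS to pick any free port.
            sock.bind((address, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not assigned to a local interface.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise


if __name__ == "__main__":
    # 0.0.0.0 means "all local interfaces"; 192.0.2.1 is a stand-in for an
    # address the host does not own (assumption for this sketch).
    for addr in ("0.0.0.0", "192.0.2.1"):
        print(addr, can_bind(addr))
```

On an EC2 instance, one would expect the same check to succeed for the private IP and fail for the public IP, which would be consistent with the MasterUI bind error in the question.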

VPC peering allows you to interconnect the networks of multiple VPCs together, giving instances access to each other across account boundaries and even AWS region boundaries.

There may be an alternate solution specific to what you're doing, but keeping the traffic all within the bounds of private IP space seems like a good workaround and best practice.

Note that interconnected VPCs must have unique, non-overlapping CIDR blocks. Peering isn't transitive, so peering VPC A to B and then peering VPC B to C does not allow VPCs A and C to communicate. Any two VPCs that have instances needing to communicate must be directly peered.
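The two constraints above can be checked before setting anything up. This is a minimal sketch using Python's standard `ipaddress` module; the VPC names, the function names, and the CIDR values are all hypothetical. The 172.31.0.0/16 block is what AWS default VPCs use, and the master's private address 172.31.23.69 falls inside it, so if both instances sit in default VPCs (an assumption, the question does not say), the ranges would collide and one side would need a different VPC.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(vpc_cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


def directly_peered(vpc_a: str, vpc_b: str, peerings: set[frozenset[str]]) -> bool:
    """Peering is not transitive, so only a direct peering counts."""
    return frozenset((vpc_a, vpc_b)) in peerings


if __name__ == "__main__":
    # Hypothetical values: both VPCs using the default-VPC range.
    vpcs = {"master-vpc": "172.31.0.0/16", "slave-vpc": "172.31.0.0/16"}
    print(overlapping_pairs(vpcs))

    # A-B and B-C are peered; A-C is not, so A and C cannot talk.
    peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
    print(directly_peered("A", "B", peerings), directly_peered("A", "C", peerings))
```

Any pair reported by `overlapping_pairs` cannot be peered as-is, and any pair for which `directly_peered` is false needs its own peering connection.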

<a href="https://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html" rel="nofollow">https://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html</a>
