
Cannot connect to cassandra from Spark

I have some test data in my Cassandra instance. I am trying to fetch this data from Spark, but I get an error like:

py4j.protocol.Py4JJavaError: An error occurred while calling o25.load. java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042

This is what I've done till now:

1. Started ./bin/cassandra.
2. Created test data using CQL, with keyspace "testkeyspace2", table "emp", and some keys and corresponding values.
3. Wrote <strong>standalone.py</strong>.
4. Ran the following spark-submit command:

   sudo ./bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar \
        --packages TargetHolding:pyspark-cassandra:0.2.4 \
        examples/src/main/python/standalone.py

5. Got the mentioned error.

<hr>

<strong>standalone.py:</strong>

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("Stand Alone Python Script")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

loading = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(table="emp", keyspace="testkeyspace2") \
    .load() \
    .show()

I also tried with --packages datastax:spark-cassandra-connector:1.5.0-RC1-s_2.11, but I get the same error.

<hr>

<strong>Debug:</strong>

I checked

netstat -tulpn | grep -i listen | grep <cassandra_pid>

and saw that it is listening on port 9042.
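One more thing worth checking is whether the exact address from the error (127.0.1.1) is actually reachable, as opposed to 127.0.0.1. This is a minimal sketch, independent of Spark, that just attempts a TCP connection the way the Cassandra driver would:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Compare the address from the error with the one Cassandra
# is actually bound to:
#   can_connect("127.0.1.1", 9042)  # the address the connector tried
#   can_connect("127.0.0.1", 9042)  # where Cassandra may be listening
```

If the first call fails while the second succeeds, Cassandra is listening only on 127.0.0.1 while Spark is trying 127.0.1.1.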

<hr>

<strong>Full log trace:</strong>

Traceback (most recent call last):
  File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/standalone.py", line 8, in <module>
    .options(table="emp", keyspace = "testkeyspace2")\
  File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
  File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
  File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
: java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
    at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:203)
    at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.TransportException: [/127.0.1.1:9042] Cannot connect))
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
    at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:339)
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:157)
    ... 22 more

Am I doing something wrong?

I'm really new to all this so I could use some advice. Thanks!

Answer1:

Based on our conversation in the question comments, the issue is that 'localhost' was used for rpc_address in your cassandra.yaml file. Cassandra used the OS to resolve 'localhost' to 127.0.0.1 and listened on that interface explicitly.

To fix this, you either need to update rpc_address to 127.0.1.1 in cassandra.yaml and restart Cassandra, or update your SparkConf to reference 127.0.0.1, i.e.:

conf = SparkConf().setAppName("Stand Alone Python Script") \
                  .set("spark.cassandra.connection.host", "127.0.0.1")


One thing that seems odd to me, though, is that spark.cassandra.connection.host also defaults to 'localhost', so it is strange that the Spark Cassandra connector resolved 'localhost' to '127.0.1.1' while Cassandra resolved it to '127.0.0.1'.
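One possible explanation for the mismatch is that the connector resolved the machine's own hostname rather than the literal string 'localhost'; on Debian/Ubuntu installs the hostname is often mapped to 127.0.1.1 in /etc/hosts while 'localhost' maps to 127.0.0.1. A quick, illustrative way to see what each name resolves to (not part of the original answer):

```python
import socket

# What does "localhost" resolve to? Typically 127.0.0.1.
print(socket.gethostbyname("localhost"))

# What does the machine's own hostname resolve to? On Debian/Ubuntu
# this is often 127.0.1.1 via /etc/hosts, which matches the address
# in the connector's error message.
print(socket.gethostbyname(socket.gethostname()))
```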

Answer2:

I checked my Linux hosts file, /etc/hosts, and the content was like:

127.0.0.1    localhost
127.0.1.1    <my hostname>

I changed it to:

127.0.0.1    localhost
127.0.0.1    <my hostname>

and it worked fine.

As you can see at line 58 of your own log file, it mentions Your hostname, ganguly resolves to a loopback address: 127.0.1.1; using 192.168.1.32 instead (on interface wlan0), which I guess applies to your case as well.
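To double-check which address a given name maps to before and after editing /etc/hosts, a small helper like this can be used (an illustrative sketch; the function name is made up for this example):

```python
def hosts_lookup(hosts_text, name):
    """Return the first IP that maps to `name` in hosts-file text."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        if name in names:
            return ip
    return None

before = "127.0.0.1 localhost\n127.0.1.1 myhost\n"
after = "127.0.0.1 localhost\n127.0.0.1 myhost\n"

print(hosts_lookup(before, "myhost"))  # 127.0.1.1 -- the address Spark saw
print(hosts_lookup(after, "myhost"))   # 127.0.0.1 -- after the fix
```

In practice you would read the real file with `open("/etc/hosts").read()` instead of the sample strings.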

Answer3:

Add this next to your --packages dependency; it worked perfectly fine for me:

--conf spark.cassandra.connection.host="127.0.0.1"
