I am trying to integrate Spark 2.1 job's metrics to Ganglia.
My spark-default.conf looks like
*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink *.sink.ganglia.name Name *.sink.ganglia.host $MASTERIP *.sink.ganglia.port $PORT *.sink.ganglia.mode unicast *.sink.ganglia.period 10 *.sink.ganglia.unit seconds
When i submit my job i can see the warn
Warning: Ignoring non-spark config property: *.sink.ganglia.host=host Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink Warning: Ignoring non-spark config property: *.sink.ganglia.period=10 Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649 Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds
My environment details are
Hadoop : Amazon 2.7.3 - emr-5.7.0 Spark : Spark 2.1.1, Ganglia: 3.7.2
If you have any inputs or any other alternative of Ganglia please reply.
according to the spark docs
The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.
so instead of having these confs in
spark-default.conf, move them to
For EMR specifically, you'll need to put these settings in
/etc/spark/conf/metrics.properties on the master node.
Spark on EMR does include the Ganglia library:
$ ls -l /usr/lib/spark/external/lib/spark-ganglia-lgpl_* -rw-r--r-- 1 root root 28376 Mar 22 00:43 /usr/lib/spark/external/lib/spark-ganglia-lgpl_2.11-2.3.0.jar
In addition, your example is missing the equals sign (
=) between the config names and values - unsure if that's an issue. Below is an example config that worked successfully for me.
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink *.sink.ganglia.name=AMZN-EMR *.sink.ganglia.host=$MASTERIP *.sink.ganglia.port=8649 *.sink.ganglia.mode=unicast *.sink.ganglia.period=10 *.sink.ganglia.unit=seconds
From this page: https://spark.apache.org/docs/latest/monitoring.html
Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions: GangliaSink: Sends metrics to a Ganglia node or multicast group. **To install the GangliaSink you’ll need to perform a custom build of Spark**. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build user