How to integrate Ganglia for Spark 2.1 Job metrics, Spark ignoring Ganglia metrics

I am trying to send my Spark 2.1 job's metrics to Ganglia.

My spark-defaults.conf looks like this:

*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name Name
*.sink.ganglia.host $MASTERIP
*.sink.ganglia.port $PORT
*.sink.ganglia.mode unicast
*.sink.ganglia.period 10
*.sink.ganglia.unit seconds

When I submit my job, I see these warnings:

Warning: Ignoring non-spark config property: *.sink.ganglia.host=host
Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name
Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast
Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
Warning: Ignoring non-spark config property: *.sink.ganglia.period=10
Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649
Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds

My environment details are:

Hadoop: Amazon 2.7.3 (emr-5.7.0)
Spark: 2.1.1
Ganglia: 3.7.2

Any input, or suggestions for an alternative to Ganglia, would be appreciated.

Answer1:

According to the Spark docs:

The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.

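So instead of putting these settings in spark-defaults.conf, move them to $SPARK_HOME/conf/metrics.properties. The warnings appear because spark-submit only keeps properties whose names start with "spark." from spark-defaults.conf, so the *.sink.ganglia.* entries are dropped.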
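As a rough illustration of this suggestion, here is a minimal Python sketch that writes the same Ganglia sink settings, in key=value form, to a metrics.properties file and builds a spark-submit argument list pointing spark.metrics.conf at it. The host 10.0.0.1, port 8649, app name my_job.py and the temporary directory are placeholders assumed for the example (the $MASTERIP and $PORT in the question look like placeholders too, so real values are needed). On a real cluster the file would normally live in $SPARK_HOME/conf.

```python
import os
import tempfile


def ganglia_sink_properties(host: str, port: int, name: str = "Name") -> dict:
    """Ganglia sink settings from the question, as a key -> value mapping.

    host/port are placeholders for the real Ganglia (gmond) address.
    """
    return {
        "*.sink.ganglia.class": "org.apache.spark.metrics.sink.GangliaSink",
        "*.sink.ganglia.name": name,
        "*.sink.ganglia.host": host,
        "*.sink.ganglia.port": str(port),
        "*.sink.ganglia.mode": "unicast",
        "*.sink.ganglia.period": "10",
        "*.sink.ganglia.unit": "seconds",
    }


def write_metrics_properties(path: str, props: dict) -> None:
    """Write one key=value pair per line, the usual metrics.properties layout."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in props.items():
            f.write(f"{key}={value}\n")


def spark_submit_args(metrics_conf_path: str, app: str) -> list:
    """spark-submit arguments that point spark.metrics.conf at a custom file."""
    return ["spark-submit", "--conf", f"spark.metrics.conf={metrics_conf_path}", app]


if __name__ == "__main__":
    # Temporary directory used purely for illustration.
    conf_dir = tempfile.mkdtemp()
    path = os.path.join(conf_dir, "metrics.properties")
    write_metrics_properties(path, ganglia_sink_properties("10.0.0.1", 8649))
    with open(path, encoding="utf-8") as f:
        print(f.read())
    print(" ".join(spark_submit_args(path, "my_job.py")))
```

The sketch only prints the command so it has no side effects; passing the list to subprocess.run would actually submit the job.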

Answer2:

For EMR specifically, you'll need to put these settings in /etc/spark/conf/metrics.properties on the master node.

Spark on EMR does include the Ganglia library:

$ ls -l /usr/lib/spark/external/lib/spark-ganglia-lgpl_*
-rw-r--r-- 1 root root 28376 Mar 22 00:43 /usr/lib/spark/external/lib/spark-ganglia-lgpl_2.11-2.3.0.jar
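The same check can be scripted, for example as part of a bootstrap step. This is a minimal Python sketch, assuming the EMR directory layout from the ls output above; on other distributions the library directory will differ, so the path is a parameter.

```python
import glob
import os

# Library directory taken from the EMR ls output; adjust for other layouts.
EMR_SPARK_EXTERNAL_LIB = "/usr/lib/spark/external/lib"


def find_ganglia_jars(lib_dir: str = EMR_SPARK_EXTERNAL_LIB) -> list:
    """Return the sorted paths of spark-ganglia-lgpl jars found in lib_dir."""
    pattern = os.path.join(lib_dir, "spark-ganglia-lgpl_*.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    jars = find_ganglia_jars()
    if jars:
        for jar in jars:
            print("found:", jar)
    else:
        print("no spark-ganglia-lgpl jar in", EMR_SPARK_EXTERNAL_LIB)
```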

In addition, your example is missing the equals sign (=) between the config names and values; I'm not sure whether that's an issue. Below is an example config that worked for me:

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name=AMZN-EMR
*.sink.ganglia.host=$MASTERIP
*.sink.ganglia.port=8649
*.sink.ganglia.mode=unicast
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds

Answer3:

From this page: https://spark.apache.org/docs/latest/monitoring.html

Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions: GangliaSink: Sends metrics to a Ganglia node or multicast group. **To install the GangliaSink you’ll need to perform a custom build of Spark**. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build user
