
Apache Spark: what am I persisting here?

Question:

In this line, which RDD is being persisted: dropResultsN or dataSetN?

dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());

This question arises as a side issue from Apache Spark timing forEach operation on JavaRDD (https://stackoverflow.com/questions/38296950/apache-spark-timing-foreach-operation-on-javardd), where I am still looking for a good answer to the core question of how best to time RDD creation.

Answer1:

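dropResultsN is the persisted RDD. It is the new RDD produced by applying standin.call() to each element of dataSetN. persist() is called on the result of map(), not on dataSetN, and it returns that persisted RDD, which is then assigned to dropResultsN. dataSetN itself is not persisted.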
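To make this concrete without needing a Spark cluster, here is a minimal pure-Java sketch. ToyRdd is a hypothetical stand-in, not Spark's JavaRDD: it assumes only the two behaviours that matter for the question, namely that map() returns a new dataset and that persist() flags the dataset it is called on and returns it so the call can be chained (Spark's persist() likewise returns the RDD it was called on, which is why the one-liner in the question works).

```java
import java.util.function.UnaryOperator;

public class PersistTarget {

    // Hypothetical toy class, NOT Spark's JavaRDD. It models only lineage and
    // the persistence flag: map() records the function and returns a NEW
    // dataset (nothing is computed), and persist() flags the dataset it is
    // called on and returns that same object.
    static final class ToyRdd {
        final ToyRdd parent;              // null for a source dataset
        final UnaryOperator<Integer> fn;  // recorded, never evaluated in this toy
        String storageLevel = "NONE";     // a plain string stands in for StorageLevel

        ToyRdd(ToyRdd parent, UnaryOperator<Integer> fn) {
            this.parent = parent;
            this.fn = fn;
        }

        ToyRdd map(UnaryOperator<Integer> f) {
            return new ToyRdd(this, f);   // a brand-new dataset derived from this one
        }

        ToyRdd persist(String level) {
            this.storageLevel = level;    // marks only the receiver
            return this;                  // returning the receiver enables chaining
        }
    }

    public static void main(String[] args) {
        ToyRdd dataSetN = new ToyRdd(null, null);

        // Same shape as the line in the question: persist() applies to the
        // result of map(), and that result is what gets assigned.
        ToyRdd dropResultsN = dataSetN.map(s -> s * 10).persist("MEMORY_ONLY");

        System.out.println("dataSetN:     " + dataSetN.storageLevel);              // NONE
        System.out.println("dropResultsN: " + dropResultsN.storageLevel);          // MEMORY_ONLY
        System.out.println("parent link:  " + (dropResultsN.parent == dataSetN));  // true
    }
}
```

Splitting the call into `ToyRdd r = dataSetN.map(...); r.persist("MEMORY_ONLY");` flags exactly the same object; the one-liner is just shorthand for it.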

Answer2:

I found a good example of this in Learning Spark (O'Reilly).

It's Example 3-40, persist() in Scala (I'm assuming Java behaves the same):

    import org.apache.spark.storage.StorageLevel

    val result = input.map(x => x * x)
    result.persist(StorageLevel.<your choice>)

Note in Learning Spark: "Notice that we called persist() on the RDD before the first action. The persist() call on its own doesn't force evaluation."
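The book's point (persist() is lazy; only an action computes, and afterwards the persisted result is reused) can be sketched in plain Java. As before, ToyRdd is a hypothetical toy class, not Spark's API, and it assumes a simplified cache that never evicts anything. The map function counts its own invocations so you can see when work actually happens; it squares its input to mirror `x => x * x` from the book's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class LazyPersist {

    // Hypothetical toy class, NOT Spark's RDD. map() and persist() only record
    // intent; collect()/count() are the "actions" that compute. A persisted
    // dataset stores its result on the first action and reuses it afterwards.
    static final class ToyRdd {
        private final ToyRdd parent;            // null for a source dataset
        private final List<Integer> source;     // set only for a source dataset
        private final UnaryOperator<Integer> fn;
        private boolean persisted = false;
        private List<Integer> cached = null;    // filled by the first action if persisted

        static ToyRdd of(List<Integer> data) {
            return new ToyRdd(null, data, null);
        }

        private ToyRdd(ToyRdd parent, List<Integer> source, UnaryOperator<Integer> fn) {
            this.parent = parent;
            this.source = source;
            this.fn = fn;
        }

        ToyRdd map(UnaryOperator<Integer> f) {
            return new ToyRdd(this, null, f);   // lazy: nothing runs here
        }

        ToyRdd persist() {
            persisted = true;                   // only marks; computes nothing
            return this;
        }

        List<Integer> collect() {
            if (cached != null) return cached;  // reuse the persisted result
            List<Integer> out;
            if (parent == null) {
                out = new ArrayList<>(source);
            } else {
                out = new ArrayList<>();
                for (Integer x : parent.collect()) out.add(fn.apply(x));
            }
            if (persisted) cached = out;
            return out;
        }

        long count() {
            return collect().size();
        }
    }

    static int calls = 0; // how many times the map function has run

    public static void main(String[] args) {
        ToyRdd input = ToyRdd.of(List.of(1, 2, 3));
        ToyRdd result = input.map(x -> { calls++; return x * x; });
        result.persist();

        System.out.println("after persist(): calls = " + calls);                 // 0
        System.out.println("count = " + result.count() + ", calls = " + calls);  // 3, 3
        System.out.println(result.collect() + ", calls = " + calls);             // [1, 4, 9], 3
    }
}
```

In real Spark, how much of a persisted RDD stays cached depends on the storage level and available memory (with MEMORY_ONLY, partitions that don't fit are recomputed when needed); the toy ignores that. It also bears on the timing question linked in the question: since neither map() nor persist() does any work, timing that line alone measures almost nothing, and the computation only happens, and can only be timed, around the first action such as count().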

My note: in this example persist() is called on its own line, which I think makes it much clearer than the code in my question which RDD is being persisted.

Recommend

  • C Pthreads - issues with thread-safe queue implementation
  • How to have a blendable project using MVVM-Light and WCF RIA Services
  • What is the preferred way to compose a set from multiple lists in Python
  • How do js animations work?
  • new spark.sql.shuffle.partitions value not used after checkpointing
  • detecting connection lost in spark streaming
  • What is the default HTTP verb in WebApi ? GET or POST?
  • File extension of zlib zipped html page?
  • iOS Cordova first plugin - plugin.xml to inject a feature
  • Extract All Possible Paths from Expression-Tree and evaluate them to hold TRUE
  • XSLT foreach repeating nodes to flat
  • TFS 2015 - Waiting for an agent to be requested
  • How to synchronize jQuery dialog box to act like alert() of Javascript
  • Ember.js model to be organised as a tree structure
  • Angular2 component view does not update on value change via method
  • List images(01.png) and descriptions(01.txt) from directory
  • Object and struct member access and address offset calculation
  • CakePHP ACL tutorial initDB function warnings
  • Parsing a CSV string while ignoring commas inside the individual columns
  • Jackson Parser: ignore deserializing for type mismatch
  • Problem deserializing objects from cache on MyBatis 3/Java
  • Content-Length header not returned from Pylons response
  • D3 nodes and links from JSON with nested arrays of children
  • Spark fat jar to run multiple versions on YARN
  • OpenGL ES texture problem, 4 duplicate columns and horizontal lines (Android)
  • Ajax Loaded meta Tags
  • Xamarin Forms - UWP Fonts
  • Cannot connect to cassandra from Spark
  • Display issues when we change from one jquery mobile page to another in firefox
  • Different response to non-authenticated users and AJAX calls
  • Does CUDA 5 support STL or THRUST inside the device code?
  • Arrow is showed instead of the material design version hamburger icon. Why doesn't syncState in
  • How can I estimate amount of memory left with calling System.gc()?
  • Arrays break string types in Julia
  • Data Validation Drop Down Box Arrow Disappearing
  • SetUp method failed while running tests from teamcity
  • Understanding cpu registers
  • Can Visual Studio XAML designer handle font family names with spaces as a resource?
  • How can I remove ASP.NET Designer.cs files?
  • Are Kotlin's Float, Int etc optimised to built-in types in the JVM? [duplicate]