Apache Spark: what am I persisting here?


In this line, which RDD is being persisted: dropResultsN or dataSetN?

dropResultsN = dataSetN.map(s -> standin.call(s)).persist(StorageLevel.MEMORY_ONLY());

This question arose as a side issue from "Apache Spark timing forEach operation on JavaRDD" (https://stackoverflow.com/questions/38296950/apache-spark-timing-foreach-operation-on-javardd), where I am still looking for a good answer to the core question of how best to time RDD creation.


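dropResultsN is the persisted RDD, that is, the RDD produced by mapping standin.call() over dataSetN. In the chained expression, persist() is called on the RDD returned by map(), not on dataSetN, so dataSetN itself is not persisted.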


I found a good example of this in Learning Spark by O'Reilly:

It's Example 3-40, persist() in Scala (I assume Java behaves the same way):

    import org.apache.spark.storage.StorageLevel
    val result = input.map(x => x * x)
    result.persist(StorageLevel.<your choice>)

NOTE in Learning Spark: Notice that we called persist() on the RDD before the first action. The persist() call on its own doesn't force evaluation.


MY NOTE: in this example persist() is called on its own line, which I think is much clearer than the chained call in my question.
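To make the two points above concrete without needing a Spark installation, here is a minimal sketch in plain Java. It is not Spark code: ToyRdd, of(), isPersistRequested(), isCached() and computeCount() are hypothetical names invented for this illustration, and collect() stands in for any action. The toy only mimics two behaviours that the real API does have: map() returns a new dataset, and persist() marks the dataset it is called on (returning that same dataset) without computing anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical toy stand-in for an RDD. NOT Spark's API; illustration only. */
final class ToyRdd<T> {
    private final Supplier<List<T>> compute; // how to (re)build the elements
    private boolean persistRequested = false;
    private List<T> cache = null;
    private int computeCount = 0;

    private ToyRdd(Supplier<List<T>> compute) {
        this.compute = compute;
    }

    static <T> ToyRdd<T> of(List<T> data) {
        return new ToyRdd<>(() -> data);
    }

    /** Transformation: returns a NEW dataset; nothing is computed yet. */
    <R> ToyRdd<R> map(Function<T, R> f) {
        return new ToyRdd<>(() -> {
            List<R> out = new ArrayList<>();
            for (T t : collect()) {   // pulls from the parent only when evaluated
                out.add(f.apply(t));
            }
            return out;
        });
    }

    /** Marks THIS dataset for caching and returns it; does not compute. */
    ToyRdd<T> persist() {
        persistRequested = true;
        return this;
    }

    /** Action: forces evaluation and fills the cache if persist() was requested. */
    List<T> collect() {
        if (cache != null) {
            return cache;
        }
        computeCount++;
        List<T> result = compute.get();
        if (persistRequested) {
            cache = result;
        }
        return result;
    }

    boolean isPersistRequested() { return persistRequested; }
    boolean isCached() { return cache != null; }
    int computeCount() { return computeCount; }
}

class PersistDemo {
    public static void main(String[] args) {
        ToyRdd<Integer> dataSetN = ToyRdd.of(List.of(1, 2, 3));
        // Same shape as the line in the question: persist() is applied to
        // whatever map() returned, and that object is assigned to dropResultsN.
        ToyRdd<Integer> dropResultsN = dataSetN.map(x -> x * x).persist();

        System.out.println("dataSetN marked:     " + dataSetN.isPersistRequested());
        System.out.println("dropResultsN marked: " + dropResultsN.isPersistRequested());
        System.out.println("cached before action: " + dropResultsN.isCached());
        dropResultsN.collect(); // first action computes and caches
        dropResultsN.collect(); // second action is served from the cache
        System.out.println("cached after action:  " + dropResultsN.isCached());
        System.out.println("dataSetN computations: " + dataSetN.computeCount());
    }
}
```

In this toy, only dropResultsN ends up marked and cached, and dataSetN is evaluated once even though the action runs twice. If the persist() call were removed, the second collect() would walk back through dataSetN again.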

