85008

How to define the concept of capacity in ArrayLists?

Question:

I understand that capacity is the number of elements or available spaces in an ArrayList that may or may not hold a value referencing an object. I am trying to understand more about the concept of capacity.

So I have three questions:

1) What are some good ways to define what capacity represents from a memory standpoint?

...the (contiguous?) memory allocated to the ArrayList?

...the ArrayLists’s memory footprint on the (heap?)?

2) Then if the above is true, changing capacity requires some manner of memory management overhead?

3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?

Answer1:

<ol><li>The class is called ArrayList because it's based on an array. The capacity is the size of the array, which requires a block of contiguous heap memory. However, note that the array itself contains only <em>references</em> to the elements, which are separate objects on the heap.</li> <li>Increasing the capacity requires allocating a new, larger array and copying all the references from the old array to the new one, after which the old one becomes eligible for garbage collection.</li> <li>You've cited the main case where performance could be a concern. In practice, I've never seen it actually become a problem, since the element objects usually take up much more memory (and possibly CPU time) than the list.</li> </ol>

Answer2:

ArrayList is implemented like this:

class ArrayList { private Object[] elements; }

the capacity is the size of that array.

Now, if your capacity is 10, and you're adding 11-th element, ArrayList will do this:

Object[] newElements = new Object[capacity * 1.5]; System.arraycopy(this.elements, newElements); this.elements = newElements;

So if you start off with a small capacity, ArrayList will end up creating a bunch of arrays and copying stuff around for you as you keep adding elements, which isn't good.

On the other hand, if you specify a capacity of 1,000,000 and add only 3 elements to ArrayList, that also is kinda bad.

Rule of thumb: if you know the capacity, specify it. If you aren't sure but know the upper bound, specify that. If you just aren't sure, use the defaults.

Answer3:

Capacity is as you described it -- the contiguous memory allocated to an ArrayList for storage of values. ArrayList stores all values in an array, and automatically resizes the array for you. This incurs memory management overhead when resizing.

If I remember correctly, Java increases the size of an ArrayList's backing array from size N to size 2N + 2 when you try to add one more element than the capacity can take. I do not know what size it increases to when you use the insert method (or similar) to insert at a specific position beyond the end of the capacity, or even whether it allows this.

Here is an example to help you think about how it works. Picture each space between the |s as a cell in the backing array:

| | |

size = 0 (contains no elements), capacity = 2 (can contain 2 elements).

|1| |

size = 1 (contains 1 element), capacity = 2 (can contain 2 elements).

|1|2|

size = 2, capacity = 2. Adding another element:

|1|2|3| | | |

size increased by 1, capacity increased to 6 (2 * 2 + 2). This can be expensive with large arrays, as allocating a large contiguous memory region can require a bit of work (as opposed to a LinkedList, which allocates many small pieces of memory) because the JVM needs to search for an appropriate location, and may need to ask the OS for more memory. It is also expensive to copy a large number of values from one place to another, which would be done once such a region was found.

My rule of thumb is this: If you know the capacity you will require, use an ArrayList because there will only be one allocation and access is very fast. If you do not know your required capacity, use a LinkedList because adding a new value always takes the same amount of work, and there is no copying involved.

Answer4:

<blockquote>

<em>1) What are some good ways to define what capacity represents from a memory standpoint?</em>

<em>...the (contiguous?) memory allocated to the ArrayList?</em>

</blockquote>

Yes, an ArrayList is backed up by an array, to that represents the internal array size.

<blockquote>

<em>...the ArrayLists’s memory footprint on the (heap?)?</em>

</blockquote>

Yes, the larget the array capacity, the more footprint used by the arraylist.

<blockquote>

<em>2) Then if the above is true, changing capacity requires some manner of memory management overhead?</em>

</blockquote>

It is. When the list grows large enough, a larger array is allocated and the contents copied. The previous array maybe discarded and marked for garbage collection.

<blockquote>

<em>3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?</em>

</blockquote>

Yes, if you create the ArrayList with initial capacity of 1 ( for instance ) and your list grows way beyond that. If you know upfront the number of elements to store, you better request an initial capacity of that size.

<strong>However</strong> I think this should be low in your list of priorities, while array copy may happen very often, it is optimized since the early stages of Java, and should not be a concern. Better would be to choose a right algorithm, I think. Remember: <a href="http://c2.com/cgi/wiki?PrematureOptimization" rel="nofollow">Premature optimization is the root of all evil</a>

See also: <a href="https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist/322742#322742" rel="nofollow">When to use LinkedList over ArrayList</a>

Recommend

  • PHP functions that has direct output to browser
  • Map Job Performance on cluster
  • How to generate multi workers and bind them to non-web Java applications on Heroku
  • Numerical Triple Integration in R
  • What is wrong in my MVC implementation?
  • TensorFlow C++, runtime issue
  • Allocating a 2D contiguous array within a function
  • Deployments not visible in Kubernetes Dashboard
  • android-support-v7-appcompat has same attrs as actionbarsherlock library
  • Refactoring advice: maps to POJOs
  • Which browser have this strange user agent? (IOS device)
  • Unique Permutations - with exceptions
  • Where these are stored?
  • abstracting over a collection
  • How can I tell a form not to dispose a particular control when it closes?
  • C: Incompatible pointer type initializing
  • why xml file does not aligned properly after append the string in beginning and end of the file usin
  • Assign variable to the value in HTML
  • Is my CUDA kernel really runs on device or is being mistekenly executed by host in emulation?
  • output of program is not same as passed argument
  • Does CUDA 5 support STL or THRUST inside the device code?
  • Join two tables and save into third-sql
  • How to model a transition system with SPIN
  • How to make Safari send if-modified-since header?
  • Weird JavaScript statement, what does it mean?
  • ORA-29908: missing primary invocation for ancillary operator
  • Akka Routing: Reply's send to router ends up as dead letters
  • Calling of Constructors in a Java
  • How to pass list parameters for each object using Spring MVC?
  • retrieve vertices with no linked edge in arangodb
  • SQL merge duplicate rows and join values that are different
  • Setting background image for body element in xhtml (for different monitors and resolutions)
  • Can Visual Studio XAML designer handle font family names with spaces as a resource?
  • How can I remove ASP.NET Designer.cs files?
  • Are Kotlin's Float, Int etc optimised to built-in types in the JVM? [duplicate]
  • unknown Exception android
  • JaxB to read class hierarchy
  • Checking variable from a different class in C#
  • Django query for large number of relationships
  • Converting MP3 duration time