
Multiprocessing - using the Managers Namespace to save memory

Question:

I have several processes, each completing tasks that require a single large numpy array. The array is only ever read (the processes search it for appropriate values).

If each process loads the data I receive a memory error.

I am therefore trying to minimise the memory usage by using a Manager to share the same array between the processes.

However, I still receive a memory error. I <strong>can load the array</strong> once in the main process, but the moment I try to make it an <strong>attribute</strong> of the manager namespace I receive a <strong>memory error</strong>. I assumed the Managers acted like pointers and allowed separate processes (which normally only have access to their own memory) to access this shared memory as well. However, the error mentions pickling:

<pre class="lang-none prettyprint-override">Traceback (most recent call last):
  File <PATH>, line 63, in <module>
    ns.pp = something
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\managers.py", line 1021, in __setattr__
    return callmethod('__setattr__', (key, value))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\managers.py", line 716, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Program Files (x86)\Python35-32\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
MemoryError
</pre>

I assume the numpy array is actually being copied when assigned to the manager, but I may be wrong.
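One way to test that assumption is to pickle an array the way the traceback shows the proxy doing it. The traceback uses `ForkingPickler`, a subclass of the standard pickler; the sketch below uses plain `pickle.dumps` as a stand-in, with a hypothetical 8 MB array in place of the real data. If the pickled payload is at least as large as the array, then assigning through the proxy needs a second full-size copy in the sending process while the original is still held.

```python
import pickle

import numpy as np

# Hypothetical stand-in for the real data: 1 million float64 values = 8 MB.
arr = np.zeros(1_000_000, dtype=np.float64)

# Assigning through a manager proxy pickles the value into an in-memory
# buffer before sending it; pickle.dumps approximates that step.
payload = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)

print(f"array: {arr.nbytes / 1e6:.1f} MB, pickled: {len(payload) / 1e6:.1f} MB")
```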

To make matters a little more irritating, I am on a machine with 32GB of memory, and when I watch the memory usage it only increases a little before crashing, maybe by 5-10% at most.

Could someone explain <strong>why making the array an attribute of the namespace takes up even more memory</strong> and <strong>why my program won't use some of the spare memory available</strong>? (I have already read the <a href="https://docs.python.org/3.6/tutorial/classes.html#python-scopes-and-namespaces" rel="nofollow">namespace</a> and <a href="https://docs.python.org/3.6/library/multiprocessing.html#multiprocessing.managers" rel="nofollow">manager</a> docs as well as these <a href="https://stackoverflow.com/questions/22487296/multiprocessing-in-python-sharing-large-object-e-g-pandas-dataframe-between" rel="nofollow">managers</a> and <a href="https://stackoverflow.com/questions/3913217/what-are-python-namespaces-all-about" rel="nofollow">namespace</a> threads on SO.)

I am running Windows Server 2012 R2 and Python 3.5.2 32bit.
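Because the interpreter is a 32-bit build, one likely factor is worth ruling out: a 32-bit process can address at most 4 GiB, and on Windows its default user address space is 2 GiB, no matter how much RAM is installed. For comparison, 5-10% of 32 GB is roughly 1.6-3.2 GB. The sketch below only reports which build is running (it does not measure the limit itself); the function name `interpreter_bits` is hypothetical.

```python
import struct
import sys


def interpreter_bits():
    # A C pointer ("P") is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    return struct.calcsize("P") * 8


bits = interpreter_bits()
is_64bit = sys.maxsize > 2**32  # second, independent check of the build

print(f"{bits}-bit Python {sys.version.split()[0]}")
print("maximum addressable bytes:", 2**bits)
```

If this reports 32, switching to a 64-bit Python (with a matching 64-bit numpy) is the usual way to lift that ceiling.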

Here is some code demonstrating my problem (you will need to substitute your own file for large.txt; this file is ~75MB of tab-delimited strings):

<pre>
import multiprocessing

import numpy as np

if __name__ == '__main__':
    # load Price Paid Data and assign to manager
    mgr = multiprocessing.Manager()
    ns = mgr.Namespace()
    ns.data = np.genfromtxt('large.txt')

    # Alternative proving this works for smaller objects
    # ns.data = 'Test PP data'
</pre>

Answer1:

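Manager types are built for flexibility, not efficiency. They create a server process that holds the values and can return proxy objects to each process that needs them. The server and proxies communicate over a connection (a pipe or a socket, which can be TCP so that the server and proxy can be on different machines), and this necessarily means serializing and copying whatever object is in question. The traceback in the question shows exactly this: <code>ns.pp = something</code> goes through <code>conn.send</code>, which first pickles the whole value with <code>ForkingPickler.dumps</code>, and that in-memory pickle is where the MemoryError is raised. I haven't traced the source all the way, so it's possible the extra copy is garbage collected after use, but at least initially there has to be a copy.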
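To illustrate that a manager holds its own copy rather than pointing at the caller's memory, here is a minimal sketch (the function name `demo` and the tiny array are hypothetical): each read of `ns.data` returns a fresh unpickled copy from the server process, so changing it locally leaves the manager's value untouched.

```python
import multiprocessing

import numpy as np


def demo():
    with multiprocessing.Manager() as mgr:
        ns = mgr.Namespace()
        ns.data = np.arange(5)   # pickled and sent to the manager's server process
        local = ns.data          # unpickled copy sent back from the server
        local[0] = 99            # changes only this process's copy
        return int(ns.data[0]), int(local[0])


if __name__ == "__main__":
    print(demo())  # (0, 99): the server still holds the original values
```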

If you want shared physical memory, I suggest using <a href="https://docs.python.org/3.6/library/multiprocessing.html#shared-ctypes-objects" rel="nofollow">Shared ctypes Objects</a>. These actually point to a common location in memory, and are therefore much faster and lighter on resources. They do not support everything that full-fat Python objects do, but they can be extended by creating <a href="https://docs.python.org/3.6/library/ctypes.html#ctypes.Structure" rel="nofollow">structs</a> to organize your data.
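A minimal sketch of that approach, assuming the data is numeric (float64): `multiprocessing.RawArray` allocates one block of shared memory, `np.frombuffer` wraps it as a NumPy view without copying, and each pool worker receives the block once through the pool `initializer`. The helper names (`make_shared_array`, `_init_worker`, `count_above`) and the small stand-in array are hypothetical. Text data like the question's strings would first need converting to a fixed-width type, and on a 32-bit interpreter the shared block still has to fit in the process's address space.

```python
import multiprocessing as mp

import numpy as np

# Per-worker handle, filled in by the pool initializer.
_shared = {}


def _init_worker(raw, shape):
    # Wrap the shared buffer in a NumPy view: no copy of the data is made.
    _shared["arr"] = np.frombuffer(raw, dtype=np.float64).reshape(shape)


def count_above(threshold):
    # Read-only search over the shared array, as in the question.
    return int((_shared["arr"] > threshold).sum())


def make_shared_array(source):
    # RawArray has no lock, which is fine because the workers only read.
    raw = mp.RawArray("d", source.size)  # "d" = C double = float64
    view = np.frombuffer(raw, dtype=np.float64).reshape(source.shape)
    np.copyto(view, source)  # the single copy into shared memory
    return raw


if __name__ == "__main__":
    data = np.arange(12, dtype=np.float64).reshape(3, 4)  # stand-in for the real file
    raw = make_shared_array(data)
    with mp.Pool(2, initializer=_init_worker, initargs=(raw, data.shape)) as pool:
        counts = pool.map(count_above, [2.0, 5.0, 10.0])
    print(counts)  # [9, 6, 1]
```

`RawArray` is used instead of `Array` because read-only access needs no lock. Shared ctypes objects are handed to workers when the processes are created (here via `initargs`); passing them as ordinary `pool.map` arguments is not supported.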
