Pandas with rpy2 and multiprocessing


I'm trying to speed up a process that uses Pandas and R.

Suppose that I have the following dataframe:

    import pandas as pd
    from random import randint

    df = pd.DataFrame({'mpg': [randint(1, 9) for x in range(10)],
                       'wt': [randint(1, 9)*10 for x in range(10)],
                       'cyl': [randint(1, 9)*100 for x in range(10)]})
    df

       mpg  wt  cyl
    0    3  40  100
    1    6  30  200
    2    7  70  800
    3    3  50  200
    4    7  50  400
    5    4  10  400
    6    3  70  500
    7    8  30  200
    8    3  40  800
    9    6  60  200

Then I use rpy2 to fit a model:

    import rpy2.robjects.packages as rpackages
    import rpy2.robjects as robjects
    from rpy2.robjects import pandas2ri

    pandas2ri.activate()

    base = rpackages.importr('base')
    stats = rpackages.importr('stats')

    formula = 'mpg ~ wt + cyl'
    fit_full = stats.lm(formula, data=df)

After this, I make some predictions:

    rfits = stats.predict(fit_full, newdata=df)

This code runs without problems on a small dataframe, but my real dataframe has millions of rows. I've tried to speed up the prediction step using other rpy2 models, but it still takes a long time to process.

I tried the multiprocessing library for the first time for this task, without success:

    import multiprocessing as mp

    pool = mp.Pool(processes=4)
    rfits = pool.map(predict(fit_full, newdata=df))

but probably I'm doing something wrong since I can't see any speed improvement.

I think the main problem is that I'm applying pool.map to an rpy2 function rather than to a plain Python function. There is probably a workaround that doesn't need the multiprocessing library, but I can't see one.

Any help would be greatly appreciated. Thanks in advance.


Have you tried using StatsModels?


Fitting models using R-style formulas (http://statsmodels.sourceforge.net/devel/example_formulas.html)

"Since version 0.5.0, statsmodels allows users to fit statistical models using R-style formulas. Internally, statsmodels uses the patsy package to convert formulas and data to the matrices that are used in model fitting. The formula framework is quite powerful; this tutorial only scratches the surface. A full description of the formula language can be found in the patsy docs."

    import statsmodels.formula.api as smf

    formula = 'mpg ~ wt + cyl'
    model = smf.ols(formula=formula, data=df)
    results = model.fit()
    params = results.params

    >>> params
    Intercept    5.752803
    wt           0.037770
    cyl         -0.004112

    # Predict from the fitted results so that the formula is applied to df;
    # model.predict(params, exog=df) would multiply the raw columns of df by
    # params without building the design matrix (no intercept column).
    >>> results.predict(df)
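On the multiprocessing attempt: pool.map expects a function and an iterable of arguments, whereas pool.map(predict(fit_full, newdata=df)) calls predict once in the parent process and hands over only its result, so no work is spread across the workers. Below is a minimal sketch of the usual chunked pattern, using only the standard library. The helper names (predict_chunk, split, parallel_predict) are hypothetical, and the coefficients are simply copied from the params output above to stand in for a fitted model. It does not show whether an rpy2 model object can be shipped to worker processes this way.

```python
import multiprocessing as mp

# Assumption: coefficients copied from the params output above, used here
# only as a stand-in for a fitted model.
INTERCEPT, B_WT, B_CYL = 5.752803, 0.037770, -0.004112


def predict_chunk(rows):
    """Apply the linear model to a list of (wt, cyl) pairs."""
    return [INTERCEPT + B_WT * wt + B_CYL * cyl for wt, cyl in rows]


def split(rows, n_chunks):
    """Split rows into at most n_chunks contiguous pieces, keeping order."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_predict(rows, processes=4):
    """Predict every row by mapping predict_chunk over chunks in a pool."""
    chunks = split(rows, processes)
    with mp.Pool(processes=processes) as pool:
        # pool.map receives the function itself plus an iterable of inputs;
        # each worker process handles whole chunks.
        parts = pool.map(predict_chunk, chunks)
    # Flatten the per-chunk results back into one list, in the original order.
    return [y for part in parts for y in part]
```

Call parallel_predict(rows) from under an if __name__ == "__main__": guard, since the worker function has to be importable at module level. For a computation as cheap as a linear predictor, the cost of pickling the chunks may outweigh the gain, so it is worth timing against the single-process version.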

