How to apply different functions to a groupby object?


I have a dataframe like this:

    import pandas as pd

    df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                       'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                                   'max_val', 'max_val', 'min_val', 'min_val'],
                       'value': [1, 20, 20, 10, 12, 3, -10, -5]})

       id  min_max  value
    0   1  max_val      1
    1   2  max_val     20
    2   1  min_val     20
    3   1  min_val     10
    4   2  max_val     12
    5   1  max_val      3
    6   2  min_val    -10
    7   2  min_val     -5

Each id has several maximal and minimal values associated with it. My <strong>desired output</strong> looks like this:

        max  min
    id
    1     3   10
    2    20  -10

It contains the maximal max_val and the minimal min_val for each id.

Currently I implement that as follows:

    gdf = df.groupby(by=['id', 'min_max'])['value']

    max_max = gdf.max().loc[:, 'max_val']
    min_min = gdf.min().loc[:, 'min_val']

    final_df = pd.concat([max_max, min_min], axis=1)
    final_df.columns = ['max', 'min']

What I don't like is that I have to call .max() and .min() separately on the grouped object gdf, and then throw away 50% of the information (since I am not interested in the maximal min_val or the minimal max_val).

Is there a way to do this in a more straightforward manner by e.g. passing the function that should be applied to a group directly to the groupby call?



Simply calling df.groupby('id')['value'].agg(['max','min']) is not sufficient, as a group can have a min_val that is higher than all of its max_val entries, or a max_val that is lower than all of its min_val entries. Thus, one also has to group by the column min_max.

Result for

    df.groupby('id')['value'].agg(['max','min'])

        max  min
    id
    1    20    1
    2    20  -10

Result for the code from above:

        max  min
    id
    1     3   10
    2    20  -10


Here's a slightly tongue-in-cheek solution:

    >>> df.groupby(['id', 'min_max'])['value'].apply(lambda g: getattr(g, g.name[1][:3])()).unstack()
    min_max  max_val  min_val
    id
    1              3       10
    2             20      -10

This applies a function that grabs the name of the real function to apply from the group key.
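To make the lookup step concrete, here is a minimal sketch of what happens for a single group. The names `key`, `values`, and `method_name` are illustrative assumptions; inside the `apply` call above, the key is what pandas exposes as `g.name`.

```python
import pandas as pd

# One group as groupby would hand it to the lambda: the key is the
# (id, min_max) tuple and the values are that group's 'value' entries.
key = (1, 'max_val')
values = pd.Series([1, 3])

# The first three characters of the marker happen to be a Series method name.
method_name = key[1][:3]                 # 'max'
result = getattr(values, method_name)()  # equivalent to values.max()
print(method_name, result)
```

The trick relies entirely on the marker strings starting with a valid method name.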

Obviously this wouldn't work so simply if there weren't such a simple relationship between the string "max_val" and the function name "max". It could be generalized by having a dict mapping column values to functions to apply, something like this:

    func_map = {'min_val': min, 'max_val': max}
    df.groupby(['id', 'min_max'])['value'].apply(lambda g: func_map[g.name[1]](g)).unstack()

Note that this is slightly less efficient than the version above, since it calls the plain Python max/min rather than the optimized pandas versions. But if you want a more generalizable solution, that's what you have to do, because there aren't optimized pandas versions of everything. (This is also more or less why there's no built-in way to do this: for most data, you can't assume a priori that your values can be mapped to meaningful functions, so it doesn't make sense to try to determine the function to apply based on the values themselves.)
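A possible middle ground, sketched here under the assumption that every marker corresponds to a method that exists on a pandas Series, is to map the markers to method names instead of plain Python functions. The name `method_map` is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 1, 1, 2, 1, 2, 2],
    'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                'max_val', 'max_val', 'min_val', 'min_val'],
    'value': [1, 20, 20, 10, 12, 3, -10, -5],
})

# Hypothetical mapping from marker value to the name of a Series method.
method_map = {'min_val': 'min', 'max_val': 'max'}

result = (
    df.groupby(['id', 'min_max'])['value']
      # g.name is the (id, min_max) group key; look up and call the Series method.
      .apply(lambda g: getattr(g, method_map[g.name[1]])())
      .unstack()
)
print(result)
```

This keeps the explicit mapping of the dict version while still calling the Series methods rather than the built-in min/max.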


One option is to do the custom aggregation with groupby.apply, since this case doesn't fit the built-in aggregation functions well:

    (df.groupby('id')
       .apply(lambda g: pd.Series({'max': g.value[g.min_max == "max_val"].max(),
                                   'min': g.value[g.min_max == "min_val"].min()})))

    #     max  min
    # id
    # 1     3   10
    # 2    20  -10
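A related sketch, not taken from the answers here, filters the rows for each marker first and only then groups, so neither aggregation computes values that are discarded afterwards. The names `max_part` and `min_part` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 1, 1, 2, 1, 2, 2],
    'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                'max_val', 'max_val', 'min_val', 'min_val'],
    'value': [1, 20, 20, 10, 12, 3, -10, -5],
})

# Keep only the rows relevant to each aggregation, then group by id.
max_part = df.loc[df['min_max'] == 'max_val'].groupby('id')['value'].max().rename('max')
min_part = df.loc[df['min_max'] == 'min_val'].groupby('id')['value'].min().rename('min')

# Align the two Series on id and place them side by side.
result = pd.concat([max_part, min_part], axis=1)
print(result)
```

This uses only the built-in groupby reductions and avoids a Python-level lambda per group, at the cost of one boolean mask per marker.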


Solution with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html" rel="nofollow">pivot_table</a>:

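    # String aggregator names keep the column labels stable ('min'/'max');
    # np.min/np.max can be labelled 'amin'/'amax' or 'min'/'max' depending on versions.
    df1 = df.pivot_table(index='id', columns='min_max', values='value', aggfunc=['min', 'max'])
    df1 = df1.loc[:, [('min', 'min_val'), ('max', 'max_val')]]
    df1.columns = df1.columns.droplevel(1)
    print (df1)

        min  max
    id
    1    10    3
    2   -10   20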

