
How to apply different functions to a groupby object?

Question:

I have a dataframe like this:

    import pandas as pd

    df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                       'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                                   'max_val', 'max_val', 'min_val', 'min_val'],
                       'value': [1, 20, 20, 10, 12, 3, -10, -5]})

       id  min_max  value
    0   1  max_val      1
    1   2  max_val     20
    2   1  min_val     20
    3   1  min_val     10
    4   2  max_val     12
    5   1  max_val      3
    6   2  min_val    -10
    7   2  min_val     -5

Each id has several maximal and minimal values associated with it. My desired output looks like this:

        max  min
    id
    1     3   10
    2    20  -10

It contains the maximal max_val and the minimal min_val for each id.

Currently I implement that as follows:

    gdf = df.groupby(by=['id', 'min_max'])['value']

    max_max = gdf.max().loc[:, 'max_val']
    min_min = gdf.min().loc[:, 'min_val']

    final_df = pd.concat([max_max, min_min], axis=1)
    final_df.columns = ['max', 'min']

What I don't like is that I have to call .max() and .min() on the grouped dataframe gdf separately, and each time I throw away 50% of the information (since I am not interested in the maximal min_val or the minimal max_val).

Is there a way to do this in a more straightforward manner by e.g. passing the function that should be applied to a group directly to the groupby call?

EDIT:

df.groupby('id')['value'].agg(['max','min'])

is not sufficient as there can be the case that a group has a min_val that is higher than all max_val for that group or a max_val that is lower than all min_val. Thus, one also has to group based on the column min_max.

Result for

    df.groupby('id')['value'].agg(['max','min'])

        max  min
    id
    1    20    1
    2    20  -10

Result for the code from above:

        max  min
    id
    1     3   10
    2    20  -10

Answer1:

Here's a slightly tongue-in-cheek solution:

    >>> df.groupby(['id', 'min_max'])['value'].apply(lambda g: getattr(g, g.name[1][:3])()).unstack()
    min_max  max_val  min_val
    id
    1              3       10
    2             20      -10

This applies a function that grabs the name of the real function to apply from the group key.
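To make the trick above less opaque, here is a minimal sketch that spells out the same lookup with an explicit loop. The names `resolved` and `method_name` are only illustrative; the point is that each group key is an `(id, min_max)` tuple, and the first three characters of its second element happen to name a Series method.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

resolved = {}  # illustrative name: maps each group key to its aggregated value
for key, group in df.groupby(['id', 'min_max'])['value']:
    # key is a tuple such as (1, 'max_val'); key[1][:3] is 'max' or 'min'
    method_name = key[1][:3]
    # look up Series.max or Series.min by name and call it on the group
    resolved[key] = getattr(group, method_name)()

print(resolved)
```

Inside `apply`, `g.name` plays the role of `key` in this loop.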

Obviously this wouldn't work so simply if there weren't such a simple relationship between the string "max_val" and the function name "max". It could be generalized by having a dict mapping column values to functions to apply, something like this:

    func_map = {'min_val': min, 'max_val': max}
    df.groupby(['id', 'min_max'])['value'].apply(lambda g: func_map[g.name[1]](g)).unstack()

Note that this is slightly less efficient than the version above, since it calls the plain Python max/min rather than the optimized pandas versions. But if you want a more generalizable solution, that's what you have to do, because there aren't optimized pandas versions of everything. (This is also more or less why there's no built-in way to do this: for most data, you can't assume a priori that your values can be mapped to meaningful functions, so it doesn't make sense to try to determine the function to apply based on the values themselves.)
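For the cases where pandas does have a built-in reduction, one possible middle ground is to map the column values to method names (strings) instead of Python functions and let `Series.agg` dispatch to the pandas implementation. This is only a sketch of that idea, not part of the original answer; `name_map` is a made-up name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

# illustrative mapping from the values in 'min_max' to pandas reduction names
name_map = {'min_val': 'min', 'max_val': 'max'}

result = (df.groupby(['id', 'min_max'])['value']
            # g.name is the (id, min_max) group key; g.agg('max') calls Series.max
            .apply(lambda g: g.agg(name_map[g.name[1]]))
            .unstack())

print(result)
```

Any value of `min_max` that is missing from the mapping raises a `KeyError`, which is probably what you want here: it flags unexpected labels instead of silently skipping them.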

Answer2:

One option is to do the customized aggregation with groupby.apply, since this case doesn't fit the built-in aggregation patterns well:

    (df.groupby('id')
       .apply(lambda g: pd.Series({'max': g.value[g.min_max == "max_val"].max(),
                                   'min': g.value[g.min_max == "min_val"].min()})))

    #     max  min
    # id
    # 1     3   10
    # 2    20  -10
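If the per-group lambda turns out to be slow on larger data, the same selection can probably be expressed with boolean masks and two plain groupby reductions. This is a sketch under the assumption that `min_max` only ever contains the two labels shown in the question; `is_max` and `is_min` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 1, 2, 1, 2, 2],
                   'min_max': ['max_val', 'max_val', 'min_val', 'min_val',
                               'max_val', 'max_val', 'min_val', 'min_val'],
                   'value': [1, 20, 20, 10, 12, 3, -10, -5]})

# select the rows of each kind first, so nothing is computed and then discarded
is_max = df['min_max'] == 'max_val'
is_min = df['min_max'] == 'min_val'

result = pd.DataFrame({
    'max': df.loc[is_max].groupby('id')['value'].max(),  # largest max_val per id
    'min': df.loc[is_min].groupby('id')['value'].min(),  # smallest min_val per id
})

print(result)
```

The two Series are aligned on `id` when the DataFrame is built, so an id that has only one kind of row would end up with NaN in the other column.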

Answer3:

Solution with pivot_table (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html):

    import numpy as np

    # The 'amin'/'amax' labels come from the NumPy function names; depending on
    # your pandas/NumPy versions they may appear as 'min'/'max' instead.
    df1 = df.pivot_table(index='id', columns='min_max', values='value', aggfunc=[np.min,np.max])
    df1 = df1.loc[:, [('amin','min_val'), ('amax','max_val')]]
    df1.columns = df1.columns.droplevel(1)
    print (df1)

        amin  amax
    id
    1     10     3
    2    -10    20
