
Using pandas .agg to do value_counts() twice

<h3>Question</h3>

I am trying to do a groupby on a dataframe where I apply value_counts(normalize=True) and value_counts(normalize=False) on it at the same time using .agg.

However, I cannot find a way to do this without it throwing an error. I have tried several of the methods from "Multiple aggregations of the same column using pandas GroupBy.agg()", but none seem to work for me. Part of the issue is having to pass normalize to value_counts.

I have created a test example using this:

import pandas as pd
example = pd.DataFrame({'A': ['a','a','a','b','b','c'], 'B':[1,1,2,3,3,4]})

which gives me:

+---+---+---+
|   | A | B |
+---+---+---+
| 0 | a | 1 |
| 1 | a | 1 |
| 2 | a | 2 |
| 3 | b | 3 |
| 4 | b | 3 |
| 5 | c | 4 |
+---+---+---+

and I want to return:

A B  False   True
a 1      2  0.666
  2      1  0.333
b 3      2  1.000
c 4      1  1.000

Doing something like:

example.groupby('A')['B'].value_counts(normalize=True)

gives me half of what I want, but I can never get .agg to work.

Thanks


<h3>Answer1:</h3>

agg isn't a good fit here: pd.Series.value_counts returns a Series for each group rather than a single value, and the normalized result needs an additional level of aggregation (the per-group total). Either concat the two value_counts results or calculate the percentage manually after the first groupby.

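# df is the example DataFrame from the question
# Option 1: run value_counts twice and join the results side by side
pd.concat([df.groupby('A').B.value_counts().rename('N'),
           df.groupby('A').B.value_counts(normalize=True).rename('pct')], axis=1)

# or

# Option 2: count once, then divide each count by its group total
res = df.groupby('A').B.value_counts().rename('N')
res = pd.concat([res, (res/res.groupby(level='A').transform('sum')).rename('pct')], axis=1)

<hr />

     N       pct
A B
a 1  2  0.666667
  2  1  0.333333
b 3  2  1.000000
c 4  1  1.000000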
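For anyone who wants to run this end to end, here is a minimal self-contained sketch of the second approach, using the example frame from the question. The helper name counts_and_pct and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# The example frame from the question.
example = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'c'],
                        'B': [1, 1, 2, 3, 3, 4]})


def counts_and_pct(df, group_col, value_col):
    """Hypothetical helper: per-group counts and within-group proportions."""
    # Count each (group, value) pair; the result is a Series with a
    # two-level index (group_col, value_col).
    counts = df.groupby(group_col)[value_col].value_counts().rename('N')
    # Broadcast each group's total back onto its rows, then divide.
    totals = counts.groupby(level=group_col).transform('sum')
    return pd.concat([counts, (counts / totals).rename('pct')], axis=1)


result = counts_and_pct(example, 'A', 'B')
print(result)
```

If the column labels from the question are preferred, result.columns = [False, True] should relabel them, although descriptive names such as N and pct are probably easier to work with.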

Source: https://stackoverflow.com/questions/59290735/using-pandas-agg-to-do-value-counts-twice
