Find pandas quartiles based on another column

Question:

I have a dataframe:

    Av_Temp   Tot_Precip
    278.001   0
    274       0.0751864
    270.294   0.631634
    271.526   0.229285
    272.246   0.0652201
    273       0.0840059
    270.463   0.0602944
    269.983   0.103563
    268.774   0.0694555
    269.529   0.010908
    270.062   0.043915
    271.982   0.0295718
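For anyone who wants to reproduce this, here is a minimal sketch that loads the values above into a frame. It assumes the data is whitespace-delimited text; the variable name `raw` is only illustrative.

```python
import io

import pandas as pd

# The sample values from the table above, as whitespace-delimited text.
raw = """Av_Temp Tot_Precip
278.001 0
274 0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273 0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718
"""

# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.shape)
```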

I want to find the percentile values (25%, 50%, 75%) of the column 'Tot_Precip' for each decile (top 10%, next 10%, ...) of the values in the column 'Av_Temp'. Currently, I am doing this:

    import numpy, pandas, pdb

    expl_var = 'Av_Temp'
    cname = 'Tot_Precip'
    num_samples = 10  # must be an integer for numpy.linspace

    max_val = df[expl_var].max()
    min_val = df[expl_var].min()
    expl_bins = numpy.linspace(min_val, max_val, num=num_samples)

    for index, val in enumerate(expl_bins):
        print(index)
        if index < (len(expl_bins) - 1):
            cur_val = val
            nxt_val = expl_bins[index+1]

            # Subset dataframe to rows with values of expl_var between
            # cur_val and nxt_val
            sub_ind_df = df[(df[expl_var] >= cur_val) & (df[expl_var] <= nxt_val)]
            sub_ind_df[cname+'_quartiles'] = pandas.qcut(sub_ind_df[cname], 4)
            # Merge with sub_df
            pdb.set_trace()

I am not sure how to proceed after this.

The answer could be something like:

    Av_Temp_decile   Tot_Precip_25   Tot_Precip_50   Tot_Precip_75
    270 - 272        0.03            0.05            0.08

Answer 1:

I'm only splitting your data into halves rather than deciles here because the example dataset is so small, but everything should work the same if you just increase the number of bins in the initial cut:

    import pandas as pd

    # Change this to 10 to get deciles
    df['Temp_Halves'] = pd.qcut(df['Av_Temp'], 2)

    def get_quartiles(group):
        # Add retbins=True to get the bin edges
        qs, bins = pd.qcut(group['Tot_Precip'], [.25, .5, .75], retbins=True)
        # Returning a series from a function means groupby.apply() will
        # expand it into separate columns
        return pd.Series(bins, index=['Precip_25', 'Precip_50', 'Precip_75'])

    df.groupby('Temp_Halves').apply(get_quartiles)

    Out[21]:
                        Precip_25  Precip_50  Precip_75
    Temp_Halves
    [268.774, 270.995]   0.048010   0.064875   0.095036
    (270.995, 278.001]   0.038484   0.070203   0.081801
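As a possible alternative to the `apply` approach, the same per-bin quartiles can be sketched with `groupby(...).quantile(...)` followed by `unstack()`. This is not the method from the answer above, just another way to get a similar table. It assumes a reasonably recent pandas in which grouped `quantile` accepts a list of quantiles; the name `n_bins` and the output column labels are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})

n_bins = 2  # illustrative name; use 10 for deciles on a larger dataset

# Equal-count bins of Av_Temp (each bin holds roughly the same number of rows).
temp_bins = pd.qcut(df["Av_Temp"], n_bins)

# Quantiles of Tot_Precip within each temperature bin. The result is a Series
# indexed by (bin, quantile); unstack() moves the quantile level into columns.
quartiles = (
    df.groupby(temp_bins, observed=True)["Tot_Precip"]
      .quantile([0.25, 0.5, 0.75])
      .unstack()
)
quartiles.columns = ["Precip_25", "Precip_50", "Precip_75"]
print(quartiles)
```

Note that `pd.qcut` produces equal-count bins, whereas bin edges built with `numpy.linspace` are equal-width, so only the former corresponds to deciles in the usual sense.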
