create aggregate column based on variables with R [duplicate]


This question already has an answer here:

Calculating statistics on subsets of data (/questions/14812246/calculating-statistics-on-subsets-of-data), 3 answers

I apologize in advance if this is somewhat of a noob question, but I looked in the forum and couldn't work out how to search for what I am trying to do. I have a training set and I am trying to find a way to reduce the number of levels in my categorical variables (in the example below, the category is the state). I would like to map each state to the mean, or rate, of that level. My training set looks like the following once read into a data frame:

   state class mean
1     CA     1    0
2     AZ     1    0
3     NY     0    0
4     CA     0    0
5     NY     0    0
6     AZ     0    0
7     AZ     1    0
8     AZ     0    0
9     CA     0    0
10    VA     1    0

I would like the third column of my data frame to be the mean of the class variable within each state, so the mean for the CA rows would be 0.333, and so on. The mean column could then be used as a replacement for the state column. Is there a good way of doing this without writing an explicit loop in R?

How does one go about mapping new levels (for example, new states) if my training set didn't include them? Any links to approaches in R would be greatly appreciated.


This is exactly what the ave function was designed for. It can apply any function by category, but its default function is mean, hence the name (ave, as in average):

dfrm$mean <- with( dfrm, ave( class, state ) ) #FUN=mean is the default "setting"
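As a quick check against the sample data in the question, here is a minimal sketch. The data frame name dfrm follows the line above, and the values are copied from the question's table; the final unique() call is only there for inspection.

```r
# Rebuild the sample data from the question
dfrm <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# ave() returns a vector as long as its input; each row gets its group's mean
dfrm$mean <- with(dfrm, ave(class, state))

# One row per state, to inspect the per-state rates
unique(dfrm[, c("state", "mean")])
```

Because ave() keeps the original row order and length, the result can be assigned straight back as a column without any merge step.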


library(plyr)
join(data, ddply(data, .(state), summarise, mean = mean(class)), by = "state", type = "left")
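For the second part of the question (states that never appeared in the training set), one possible approach is sketched below in base R. It is an assumption, not something the answers above prescribe: build a per-state lookup table from the training data, then fall back to the overall training mean for unseen states. The names train, new_data, rates, and overall, and the unseen state "TX", are hypothetical.

```r
# Training data copied from the question
train <- data.frame(
  state = c("CA", "AZ", "NY", "CA", "NY", "AZ", "AZ", "AZ", "CA", "VA"),
  class = c(1, 1, 0, 0, 0, 0, 1, 0, 0, 1)
)

# Lookup table: one row per state with the mean of class
rates <- aggregate(class ~ state, data = train, FUN = mean)

# Fallback value for unseen states (an assumed choice: the overall training mean)
overall <- mean(train$class)

# Hypothetical new data; "TX" does not occur in the training set
new_data <- data.frame(state = c("CA", "TX", "AZ"))

# match() yields NA for states missing from the lookup table
new_data$mean <- rates$class[match(new_data$state, rates$state)]
new_data$mean[is.na(new_data$mean)] <- overall
```

Computing the lookup table once from the training set and reusing it for new data keeps the encoding consistent between training and prediction. Whether the overall mean is a sensible fallback depends on the model and the data.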

