Map numerics to categorical values in R, based on different ranges for the numerics [duplicate]

This question already has an answer here:

    Adding column which contains bin value of another column (2 answers)

    Hope my title makes sense. I have a data frame with a column of numeric values, and I would like to use this column to create a new column in which each numeric value is mapped to a bucket based on the range it falls in. Below is some test data, as well as the rough-around-the-edges nested ifelse() approach that I am currently using. I am hoping to code this in a better way that doesn't involve nested ifelse() calls, since that approach doesn't scale well to many buckets:

    mydf = data.frame(strings = letters[1:10], numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3), stringsAsFactors = FALSE)

    That is my test data frame, and here is my nested ifelse() approach to solving the problem:

    mydf$buckets = ifelse(mydf$numerics <= 2, 0, ifelse(mydf$numerics <= 4, 1, ifelse(mydf$numerics <= 5, 2, ifelse(mydf$numerics <= 7, 3, 4))))

    The code above maps values in the numerics column as follows:

      • values <= 2 go to 0
      • values > 2 and <= 4 go to 1
      • values > 4 and <= 5 go to 2
      • values > 5 and <= 7 go to 3
      • values > 7 go to 4

      This approach doesn't scale well beyond a small number of buckets. Any help with this is appreciated! Thanks.

      Answer1:

      I really like using case_when in this sort of situation, as already mentioned by @tictocchoc in the comments:

      suppressPackageStartupMessages(library(tidyverse))
      
      mydf = data.frame(strings = letters[1:10], 
                        numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
                        stringsAsFactors = FALSE)
      
      mydf %>%
        mutate(buckets = case_when(
          numerics < 2 ~ 0,
          numerics < 4 ~ 1,
          numerics < 5 ~ 2,
          numerics < 7 ~ 3,
          numerics >= 7 ~ 4
        ))
      #>    strings numerics buckets
      #> 1        a      0.2       0
      #> 2        b      0.4       0
      #> 3        c      1.3       0
      #> 4        d      5.2       3
      #> 5        e      3.3       1
      #> 6        f      2.1       1
      #> 7        g      7.3       4
      #> 8        h      1.1       0
      #> 9        i      4.3       2
      #> 10       j      8.3       4
      
          

      Answer2:

      Try using the findInterval function in base R:

      findInterval(mydf$numerics, c(2, 4, 5, 7))
      #> [1] 0 0 0 3 1 1 4 0 2 4
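As a minimal sketch of how this could be stored back in the data frame, the snippet below keeps the bucket boundaries in one vector so that adding a bucket only means adding a number. By default findInterval treats intervals as closed on the left, so a value of exactly 2 lands in bucket 1, whereas the nested ifelse() in the question uses `<=` and would put it in bucket 0. The `left.open = TRUE` argument, or `cut()` with `right = TRUE`, should reproduce the `<=` behaviour. The names `breaks`, `buckets_le`, `buckets_cut`, `edge_default` and `edge_le` are chosen here purely for illustration.

```r
# Test data from the question.
mydf <- data.frame(
  strings = letters[1:10],
  numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
  stringsAsFactors = FALSE
)

# Bucket boundaries (illustrative name); extend this vector to add buckets.
breaks <- c(2, 4, 5, 7)

# Default: intervals are closed on the left, i.e. [2, 4), [4, 5), ...
mydf$buckets <- findInterval(mydf$numerics, breaks)

# left.open = TRUE makes intervals closed on the right, i.e. (2, 4], (4, 5], ...
# which matches the `<=` comparisons in the nested ifelse().
mydf$buckets_le <- findInterval(mydf$numerics, breaks, left.open = TRUE)

# cut() variant: labels = FALSE returns integer codes starting at 1,
# so subtract 1 to get 0-based buckets.
mydf$buckets_cut <- cut(mydf$numerics, breaks = c(-Inf, breaks, Inf),
                        labels = FALSE, right = TRUE) - 1L

# The two conventions differ only for values exactly on a boundary.
edge_default <- findInterval(c(2, 4, 5, 7), breaks)                    # 1 2 3 4
edge_le      <- findInterval(c(2, 4, 5, 7), breaks, left.open = TRUE)  # 0 1 2 3
```

None of the test values sits exactly on a boundary, so all three columns agree for this data; the choice only matters if boundary values can occur.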
