
The as.numeric function changes the values in my dataframe [duplicate]

This question already has an answer here:

How to convert a factor to integer\numeric without loss of information? (5 answers)

    I have a column containing speed measurements which I need to change to numeric so that I can use both the mean and sum functions. However, when I do convert them the values change substantially.

    Why is this?

    This is what my data look like at first:

    <img src="https://i.stack.imgur.com/BBit4.png" alt="enter image description here">

    And here is the structure of the data frame:

    'data.frame': 1899571 obs. of 20 variables:
     $ pcd        : Factor w/ 1736958 levels "AB101AA","AB101AB",..: 1 2 3 4 5 6 6 7 7 8
     $ pcdstatus  : Factor w/ 5 levels "Insufficient Data",..: 4 4 4 4 4 2 3 2 3 3 ...
     $ mbps2      : Factor w/ 3 levels "N","N/A","Y": 2 2 2 2 2 2 2 2 2 2 ...
     $ averagesp  : Factor w/ 301 levels ">=30","0","0.2",..: 301 301 301 301 301 301 301
     $ mediansp   : Factor w/ 302 levels ">=30","0","0.1",..: 302 302 302 302 302 302 302
     $ maxsp      : Factor w/ 301 levels ">=30","0","0.2",..: 301 301 301 301 301 301 301
     $ nga        : Factor w/ 2 levels "N","Y": 1 2 1 1 1 1 1 2 2 2 ...
     $ connections: Factor w/ 119 levels "<3","0","1","10",..: 2 2 2 2 2 1 2 1 2 2 ...
     $ pcd2       : Factor w/ 1736958 levels "AB10 1AA","AB10 1AB",..: 1 2 3 4 5 6 6 7 7 8
     $ pcds       : Factor w/ 1736958 levels "AB10 1AA","AB10 1AB",..: 1 2 3 4 5 6 6 7 7 8
     $ oslaua     : Factor w/ 407 levels "","95A","95B",..: 374 374 374 374 374 374 374
     $ x          : int 394251 394232 394181 394251 394371 394181 394181 394331 394331
     $ y          : int 806376 806470 806429 806376 806359 806429 806429 806530 806530
     $ ctry       : Factor w/ 4 levels "E92000001","N92000002",..: 3 3 3 3 3 3 3 3 3 3 ...
     $ hro2       : Factor w/ 13 levels "","E12000001",..: 12 12 12 12 12 12 12 12 12 12
     $ soa1       : Factor w/ 34381 levels "","E01000001",..: 32485 32485 32485 32485
     $ dzone1     : Factor w/ 6507 levels "","E99999999",..: 128 128 128 128 112 128 128
     $ soa2       : Factor w/ 7197 levels "","E02000001",..: 6784 6784 6784 6784 6784 6784
     $ urindew    : int 9 9 9 9 9 9 9 9 9 9 ...
     $ soa1ni     : Factor w/ 892 levels "","95AA01S1",..: 892 892 892 892 892 892 892 892

    This is the code for converting my variables to numeric variables.

    # convert individual columns to numeric variables
    total$averagesp <- as.numeric(total$averagesp)
    total$mediansp <- as.numeric(total$mediansp)
    total$maxsp <- as.numeric(total$maxsp)
    total$mbps2 <- as.numeric(total$mbps2)
    total$nga <- as.numeric(total$nga)
    total$connections <- as.numeric(total$connections)

    But I have this strange output afterwards where all my data have been inflated:

    <img src="https://i.stack.imgur.com/5u4fC.png" alt="enter image description here">

    Any help would be much appreciated - thank you!

    Answer1:

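    See R FAQ 7.10. When you call as.numeric on a factor, you get the underlying integer codes (each value's position among the factor's levels), not the numbers that the labels spell out. That is why the values appear to change. The FAQ has the recipes for turning a factor into the numbers represented by its label strings.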
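
A minimal sketch of the behaviour and the two usual recipes, using a made-up factor rather than the asker's data. The variable `speed` and its values are assumptions chosen for illustration, and the levels are set explicitly so the result does not depend on the locale's sort order. Labels that are not plain numbers, such as ">=30", cannot be parsed and come back as NA.

```r
# Toy factor standing in for a column such as total$averagesp (values are invented).
speed <- factor(c("0.2", "7.5", "12", ">=30", "7.5"),
                levels = c(">=30", "0.2", "12", "7.5"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(speed)

# Recipe 1: convert to character first, then to numeric.
# ">=30" is not a number, so it becomes NA (the warning is suppressed here).
via_char <- suppressWarnings(as.numeric(as.character(speed)))

# Recipe 2: convert the levels once, then index them by the integer codes.
via_levels <- suppressWarnings(as.numeric(levels(speed))[as.integer(speed)])

print(codes)       # level positions
print(via_char)    # the numbers the labels represent, NA for ">=30"
print(mean(via_char, na.rm = TRUE))
```

How to treat entries like ">=30", "<3" or "N/A" is a separate decision (leave them as NA, or recode them before converting); the sketch only leaves them as NA.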
