
Split an uneven character string in R with space

Question:

I have read many posts on splitting strings in R. However, I am running into an error that I think is caused by the way the variables were read into R: in some rows there is a space after the date because the ID is shorter. I am trying to split the character variable "VESSELID" into two new variables, "vesselID" and "DATE". Below is a subset of my dataset.

> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L,
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L,
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L),
VESSELID = c("6830 2002/08/13 ", "6830 2002/08/12 ", "6830 2002/08/15 ",
"105372 2002/08/23", "105372 2002/08/23", "104234 2002/07/20",
"104234 2002/07/20", "104234 2002/07/20", "104234 2002/07/21",
"104234 2002/07/21", "104234 2002/07/21", "104234 2002/07/22",
"104234 2002/07/22", "5744 2002/08/14 ", "5744 2002/08/14 ",
"5744 2002/08/14 ", "5744 2002/08/14 ", "5744 2002/08/13 ",
"5744 2002/08/13 ", "5744 2002/08/13 ")),
.Names = c("SETID", "VESSELID"),
row.names = c(1L, 2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L),
class = "data.frame")

I did try the following:

library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split = " ", names = c("vesselID", "DATE")))

However, I get this error message:

Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) : unused argument(s) (split = " ")

The split argument doesn't seem to work, and I don't know how else to break up my character string.

Answer1:

The argument is named pattern, not split:

test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))

gives:

   SETID          VESSELID vesselID       DATE
1  24153  6830 2002/08/13      6830 2002/08/13
2  24187  6830 2002/08/12      6830 2002/08/12
3  24215  6830 2002/08/15      6830 2002/08/15
10 31990 105372 2002/08/23   105372 2002/08/23
11 31990 105372 2002/08/23   105372 2002/08/23
12 31995 104234 2002/07/20   104234 2002/07/20
13 31995 104234 2002/07/20   104234 2002/07/20
14 31995 104234 2002/07/20   104234 2002/07/20
15 31996 104234 2002/07/21   104234 2002/07/21
16 31996 104234 2002/07/21   104234 2002/07/21
17 31996 104234 2002/07/21   104234 2002/07/21
18 31997 104234 2002/07/22   104234 2002/07/22
19 31997 104234 2002/07/22   104234 2002/07/22
20 32002  5744 2002/08/14      5744 2002/08/14
21 32002  5744 2002/08/14      5744 2002/08/14
22 32002  5744 2002/08/14      5744 2002/08/14
23 32002  5744 2002/08/14      5744 2002/08/14
24 32003  5744 2002/08/13      5744 2002/08/13
25 32003  5744 2002/08/13      5744 2002/08/13
26 32003  5744 2002/08/13      5744 2002/08/13

Answer2:

I would actually just use read.table on that column as below. Assuming your dataset is called "mydata":

mydata.new <- cbind(mydata[-2],
                    read.table(text = as.character(mydata$VESSELID),
                               strip.white = TRUE, header = FALSE))
names(mydata.new)[2:3] <- c("VesselID", "Date")
mydata.new
#    SETID VesselID       Date
# 1  24153     6830 2002/08/13
# 2  24187     6830 2002/08/12
# 3  24215     6830 2002/08/15
# 10 31990   105372 2002/08/23
# 11 31990   105372 2002/08/23
# 12 31995   104234 2002/07/20
# 13 31995   104234 2002/07/20
# 14 31995   104234 2002/07/20
# 15 31996   104234 2002/07/21
# 16 31996   104234 2002/07/21
# 17 31996   104234 2002/07/21
# 18 31997   104234 2002/07/22
# 19 31997   104234 2002/07/22
# 20 32002     5744 2002/08/14
# 21 32002     5744 2002/08/14
# 22 32002     5744 2002/08/14
# 23 32002     5744 2002/08/14
# 24 32003     5744 2002/08/13
# 25 32003     5744 2002/08/13
# 26 32003     5744 2002/08/13
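If the date should end up as a real Date rather than a string, one possible follow-up is as.Date with an explicit format. This is only a sketch on a three-row sample copied from the question; vessel_raw and parts are names made up for the illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Three sample values from the question; two carry a trailing space.
vessel_raw <- c("6830 2002/08/13 ", "105372 2002/08/23", "5744 2002/08/14 ")

# read.table splits on whitespace by default, so the trailing space is ignored.
parts <- read.table(text = vessel_raw, strip.white = TRUE, header = FALSE,
                    col.names = c("VesselID", "Date"),
                    stringsAsFactors = FALSE)

# Convert the year/month/day strings into Date objects.
parts$Date <- as.Date(parts$Date, format = "%Y/%m/%d")

str(parts)
```

With the date stored as a Date, sorting and date arithmetic work directly on that column.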

Answer3:

Try:

do.call("rbind", strsplit(df$VESSELID, " "))

This should return something like:

      [,1]     [,2]         [,3]
 [1,] "6830"   "2002/08/13" ""
 [2,] "6830"   "2002/08/12" ""
 [3,] "6830"   "2002/08/15" ""
 [4,] "105372" "2002/08/23" "105372"
 [5,] "105372" "2002/08/23" "105372"
 [6,] "104234" "2002/07/20" "104234"
 [7,] "104234" "2002/07/20" "104234"
 [8,] "104234" "2002/07/20" "104234"
 [9,] "104234" "2002/07/21" "104234"
[10,] "104234" "2002/07/21" "104234"
[11,] "104234" "2002/07/21" "104234"
[12,] "104234" "2002/07/22" "104234"
[13,] "104234" "2002/07/22" "104234"
[14,] "5744"   "2002/08/14" ""
[15,] "5744"   "2002/08/14" ""
[16,] "5744"   "2002/08/14" ""
[17,] "5744"   "2002/08/14" ""
[18,] "5744"   "2002/08/13" ""
[19,] "5744"   "2002/08/13" ""
[20,] "5744"   "2002/08/13" ""

Take what you need from there, which here is the first two columns. The third column is either empty or, for rows that split into only two pieces, a copy of the first piece recycled by rbind.
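One way to avoid the stray third column is to trim each string and split on runs of whitespace before binding. The following is a minimal base-R sketch, not part of the original answer: vessel_raw, pieces, split_mat and result are illustrative names, and the sample values are copied from the question.

```r
# Three sample values from the question; two carry a trailing space.
vessel_raw <- c("6830 2002/08/13 ", "105372 2002/08/23", "5744 2002/08/14 ")

# Trim leading/trailing whitespace, then split on one or more whitespace characters.
pieces <- strsplit(trimws(vessel_raw), "\\s+")

# Guard: every row must yield exactly two pieces, otherwise rbind would recycle.
stopifnot(all(lengths(pieces) == 2L))

# Bind the pieces into a two-column character matrix.
split_mat <- do.call(rbind, pieces)

# Build the two new variables named in the question.
result <- data.frame(vesselID = as.integer(split_mat[, 1]),
                     DATE = split_mat[, 2],
                     stringsAsFactors = FALSE)
result
```

Splitting on "\\s+" rather than a single " " also copes with IDs padded by more than one space, which fixed-width input often produces.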
