
How do I read in a csv with two double quotes as text qualifiers using R?

Question:

How do I read in a CSV that uses two double quotes ("") as the text qualifier, where some qualified fields contain commas, e.g.

""V"",""W"",""X"",""Y"",""Z""
"" "",""A "",""*B "",""C "",""D-E""
"" "",""a "",""*b "",""c,c,c"",""d e ""
"" "",""E "",""*F "",""G "",""H-H""

using fread from data.table?
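A minimal sketch for reproducing the input, assuming the sample is four lines (a header plus three data rows) that were joined by spaces when the question was displayed. It writes them to the file name used in the answer, in the current working directory; single-quoted R strings keep the `""` pairs literal.

```r
# Sample lines from the question (assumed line breaks: header + 3 rows).
# Single quotes let the doubled double quotes appear verbatim.
sample_lines <- c(
  '""V"",""W"",""X"",""Y"",""Z""',
  '"" "",""A "",""*B "",""C "",""D-E""',
  '"" "",""a "",""*b "",""c,c,c"",""d e ""',
  '"" "",""E "",""*F "",""G "",""H-H""'
)

# Write them to the working directory under the answer's file name
writeLines(sample_lines, "OPs_new_input_example_file.csv")

# Read back to check the file contents
readLines("OPs_new_input_example_file.csv")
```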

Answer1:

Using data.table and fread as requested, you can do this.

The trick is to

<ol><li>fread each line of the file as a single column by setting sep='~' (or any other character that doesn't occur in the file) and quote='' (disable quote handling).</li> <li>Remove the two double quotes ("") at the start and end of each line.</li> <li>Use tstrsplit() to split that single column into multiple columns, using "","" as the pattern.</li> </ol>

data.table::tstrsplit() is a handy wrapper around strsplit() that transposes the result, so the i-th piece of every line ends up in column i.
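A small illustration of what tstrsplit() returns, using two lines as they look after step 2 (outer `""` already removed). The vector `x` is made up from the sample data for illustration; `fixed = TRUE` is passed through to strsplit() so the separator is matched as a literal string rather than a regular expression.

```r
library(data.table)

# Two sample lines after stripping the leading and trailing ""
x <- c(' "",""A "",""*B "",""C "",""D-E',
       ' "",""a "",""*b "",""c,c,c"",""d e ')

# Split on the literal "","" separator; the result is transposed,
# so parts[[i]] holds the i-th field of every line
parts <- tstrsplit(x, '"",""', fixed = TRUE)

length(parts)  # one list element per field
parts[[4]]     # fourth field of both lines
```

Because each list element is one column, `DT[, tstrsplit(V1, ...)]` turns that list directly into the columns of a new data.table.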

R code:

library(data.table)

# Read the file as a single column per line by picking
# a sep character that doesn't exist in the file.
# E.g. '~' doesn't exist in the OP's current sample data
DT <- fread("OPs_new_input_example_file.csv", sep = '~', quote = '', header = FALSE)

DT[, V1 := gsub('""(.*)""$', "\\1", V1)]  # remove "" at start and end
DT <- DT[, tstrsplit(V1, '"",""')]        # strsplit and transpose rows to columns
DT

Result:

   V1 V2 V3    V4  V5
1:  V  W  X     Y   Z
2:     A *B     C D-E
3:     a *b c,c,c d e
4:     E *F     G H-H
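For comparison, a base-R sketch of the same three steps without data.table, assuming every line begins and ends with `""` and has the same number of fields (otherwise the rows would not line up when bound into a matrix). The helper name read_double_quoted() is hypothetical.

```r
# Hypothetical helper: parse lines that use "" as the text qualifier
read_double_quoted <- function(lines) {
  # Step 2: strip the leading and trailing "" from each line
  inner <- sub('^""(.*)""$', "\\1", lines)
  # Step 3: split on the literal "","" separator (not a regex)
  fields <- strsplit(inner, '"",""', fixed = TRUE)
  # Bind the per-line vectors as rows; columns are named V1, V2, ...
  as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
}

lines <- c(
  '""V"",""W"",""X"",""Y"",""Z""',
  '"" "",""A "",""*B "",""C "",""D-E""',
  '"" "",""a "",""*b "",""c,c,c"",""d e ""',
  '"" "",""E "",""*F "",""G "",""H-H""'
)
df <- read_double_quoted(lines)
df
```

With the file on disk, the same helper can be applied to `readLines("OPs_new_input_example_file.csv")`. It returns a plain data.frame of character columns rather than a data.table, and, like the fread version, it keeps the trailing spaces inside fields.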

Please feel free to post suggestions for improvements or alternative solutions.
