How do I read in a csv with two double quotes as text qualifiers using R?


How do I read in a CSV in which every field is wrapped in two double quotes as text qualifiers, and some fields contain commas inside the qualifiers, e.g.

""V"",""W"",""X"",""Y"",""Z"" "" "",""A "",""*B "",""C "",""D-E"" "" "",""a "",""*b "",""c,c,c"",""d e "" "" "",""E "",""*F "",""G "",""H-H""

using fread from data.table?


Using data.table and fread as requested, you can do this.

The trick is to

<ol><li>fread each line of the file as a single column by setting sep='~' (or some other single character that doesn't occur in the file) and quote='' (no quote handling).</li> <li>Then remove the two double quotes at the start and at the end of each line.</li> <li>tstrsplit() that single column into multiple columns using "","" as the split pattern, so that commas inside a field are left alone.</li> </ol>

data.table::tstrsplit() is a handy wrapper for strsplit() that returns the rows transposed as columns.
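
To illustrate the difference between the two functions, here is a small sketch on two made-up strings (the vector `x` is only an illustration, not the OP's file). It uses the same `"",""` delimiter; `fixed = TRUE` is passed through to strsplit() so the pattern is taken literally.

```r
library(data.table)

# Two made-up lines whose outer "" have already been removed
x <- c('V"",""W"",""X', 'a"",""b"",""c,c,c')

# strsplit(): one list element per input string (row-wise)
by_row <- strsplit(x, '"",""', fixed = TRUE)

# tstrsplit(): one list element per field (column-wise),
# which is the shape data.table needs to build columns
by_col <- tstrsplit(x, '"",""', fixed = TRUE)

by_row
by_col
```

Because the split is on the full `"",""` sequence, the commas inside `c,c,c` do not create extra fields.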

R code:

library(data.table)

# Read each line of the file as a single column by picking
# a sep character that doesn't exist in the file.
# E.g. '~' doesn't exist in the OP's current sample data
DT <- fread("OPs_new_input_example_file.csv", sep = '~', quote = '', header = FALSE)

DT[, V1 := gsub('^""(.*)""$', "\\1", V1)]  # remove "" at start and end
DT <- DT[, tstrsplit(V1, '"",""')]         # strsplit and transpose rows to columns
DT


   V1 V2 V3    V4   V5
1:  V  W  X     Y    Z
2:     A *B     C  D-E
3:     a *b c,c,c  d e
4:     E *F     G
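
The result still carries the header line as its first data row, and several values keep their trailing spaces. A possible follow-up is sketched below, assuming the first line of the file really is a header. To keep the sketch self-contained, the single-column table is built in memory from the sample lines in the question as a stand-in for the fread() step; `cols` is just a helper name used here.

```r
library(data.table)

# Stand-in for the one-column table that fread(..., sep = '~', quote = '') returns
DT <- data.table(V1 = c(
  '""V"",""W"",""X"",""Y"",""Z""',
  '"" "",""A "",""*B "",""C "",""D-E""',
  '"" "",""a "",""*b "",""c,c,c"",""d e ""',
  '"" "",""E "",""*F "",""G "",""H-H""'
))

# Same two steps as in the answer
DT[, V1 := gsub('^""(.*)""$', "\\1", V1)]
DT <- DT[, tstrsplit(V1, '"",""', fixed = TRUE)]

# Assumption: row 1 holds the column names, so promote it and drop it
setnames(DT, unlist(DT[1], use.names = FALSE))
DT <- DT[-1]

# Trim leading and trailing whitespace in every column
cols <- names(DT)
DT[, (cols) := lapply(.SD, trimws), .SDcols = cols]
DT
```

All columns remain character after this; whether to convert types afterwards depends on the real data.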

Please feel free to post suggestions for improvements or alternative solutions.


