40500

How to read head(-er)less ASCII, identify specific rows using R?

Question:

Doing this in Excel using VBA is straight forward, however for extended data analysis, plotting capability, R has to be used. The instrument's software only outputs .csv files and without headers. It has to be read separately for analysis.

What I am trying to do (need to do):

<ul><li>Read multiple .csv without headers (4 columns). Each file can only be distinguished by name, name only.</li> <li>Pick few specific rows from each file. e.g. a row with first column value = 101, column value = 201 etc. Each file only has 500 rows. I sense there is a need for a looping somewhere here to read the specific row(?)</li> <li>Store above values into a data frame for all the files</li> <li>calculate some stats on it</li> <li>plot the stat results</li> <li>finally output data frame, stat analysis, plot into an Excel, .csv</li> </ul>

I already tried how to read .csv using R, store into a data frame, perform analysis on data frame items and plot. This process flow to be repeated daily for each instrument run. I run two instruments. I want to save a code snippet that can be run each time.

The only issue I am facing, I am not sure how to locate those specific rows. The next issue is how to go about output into Excel.

.csv input file format:

V1, V2, V3, V4 numeric (0 decimals), numeric (with 4 decimals), alpha num, alpha num 001, 12.8975, XY03, XY05 ... 485, 89.2134, XY02, XY7

Update 1: Due to some changes with the instrument sw, <a href="https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/" rel="nofollow">files are in ASCII format</a> instead of CSV. This brings much limitations. Facing issues installing raster package to support ASCII. Using the following instead. Installed sqldf package to support sql. However that gives an error.

> fl1 <- read.table("~/rcodes/dt07042017/fileone.ASC", sep=",") --read ok > sv1 <- sqldf('select FNAME "F1", V2, V3, V4 from fl1 where V1 in (101, 201, 301)') Warning message: Quoted identifiers should have class SQL, use DBI::SQL() if the caller performs the quoting.

Then installed DBI package. Not sure what it does. The sql worked. This is just one file, I need to read 1000s of files, picked specific row values and saved them into one table/df. Wrote the following broken lines. It fails as I can't find a way to inject file name.

> df <- NULL > fn <- NULL > n <- 1 > for (f in files) { + fn <- file_path_sans_ext(f) + df0 <-read.table(f, sep=",") + n <- n + 1 + df <- sqldf('select n, fn, V1, V2 from df0 where V1 IN (101, 201, 301, 401)') -- thought R could read fn just as df0 + df <- rbind(df) --further fn and n only reads two files + } Error in rsqlite_send_query(conn@ptr, statement) : no such column: n, df1

I am not <strong>quite happy</strong> using sql on R. Is there any other way in R to do this without using sql packages and get above done?

<strong>UPDATE 3</strong>

Instead of getting 2:20 values (18 values) in that column, <strong>I want to get the mean of row 2 to 20. How can I do that?</strong>

#set working directly to the folder where csv files are located files <- list.files(pattern='.csv') m = data.frame() for (k in 1:length(files)){ + csv = read.csv(files[k], header = FALSE) #picking up 2:20 consecutive values, value for row 50,120,150 so on + data = csv[c(2:20, 50, 120, 150, 175, 200), c(1,2)] #-pivot transform col/row- data <- as.data.frame(t(data)) #but that line screwed up the data #when those selected values are with NA/blanks data$file = files[k] + m = rbind(m, data) }

Answer1:

I don't quite get what you mean by "ASCII files". Hope that the following is significantly similar to your problem. Say that you have the following files:

<strong>File 1.csv</strong>

101 0.8117 AFWSK QSZDP 099 0.4594 MDKKK DHVAH 301 0.8730 HMRCR PQSMB 012 0.4705 OHHZL CIKVM 012 0.1335 KKEHQ YFNJU 201 0.2967 XFIPZ PMNYT 301 0.2470 WKLIV TBHZP 101 0.9182 SSEHT QYBUW 101 0.8076 VKRSO BUGJL 301 0.5861 PICFN UFTWC

<strong>File 2.csv</strong>

201 0.0767 FHWLL FZKQF 301 0.3176 HDBVJ EIUNJ 301 0.9606 LHODD XRYFE 301 0.8327 NKGVT LYTUJ 012 0.8073 KOBXD GBEFA 099 0.9414 ZCOCX ENFAV 101 0.3683 UTBUG RKIQU 012 0.1515 XXJZI MEWVI 012 0.0383 IUKQQ PAIFV 301 0.2304 GKIKJ GEOZT

stored in a directory called path/to/my/dir. Say that this directory contains also other files, but you know that the ones containing data are names in the [0-9]+.csv format (i.e. some number, followed by .csv). Of course you have a big number of them and their names can be different; since the question doesn't contain any hint regarding this, I'm just providing an example, hoping that you might adapt this step to your real problem.

Now:

#you get the name of all files. No manual typing. filenames<-list.files(path="path/to/my/dir",pattern="[0-9]+\\.csv",full.names=TRUE) #reading them all dataFiltered<-sapply(filenames,function(x) { y<-read.table(x) y[y[[1]] %in% c(101, 201, 301),]},simplify=FALSE) finalRes<-cbind(do.call(rbind,dataFiltered), fileOrigin=rep(names(dataFiltered),vapply(dataFiltered,nrow,1L)))

To get something like:

# V1 V2 V3 V4 fileOrigin #1 101 0.8117 AFWSK QSZDP 1.csv #2 301 0.8730 HMRCR PQSMB 1.csv #3 201 0.2967 XFIPZ PMNYT 1.csv #4 301 0.2470 WKLIV TBHZP 1.csv #5 101 0.9182 SSEHT QYBUW 1.csv #6 101 0.8076 VKRSO BUGJL 1.csv #7 301 0.5861 PICFN UFTWC 1.csv #8 201 0.0767 FHWLL FZKQF 2.csv #9 301 0.3176 HDBVJ EIUNJ 2.csv #10 301 0.9606 LHODD XRYFE 2.csv #11 301 0.8327 NKGVT LYTUJ 2.csv #12 101 0.3683 UTBUG RKIQU 2.csv #13 301 0.2304 GKIKJ GEOZT 2.csv

The above should work, but could be memory expensive. If you have a big number of files, I suggest you to read them in bunches and apply the above to each bunch. This means that you don't pass all the filenames object to sapply, but you can pass a subset of it each time till you finish.

At the end, you rbind all the finalRes together.

Recommend

  • Should I do this with tables or divs?
  • Format file date YYYYMMDD in batch
  • Tomcat log format disable the IPv6 format
  • Identifying the browser at server side in php
  • How to denormalize nested list in R?
  • psycopg2.ProgrammingError: syntax error at or near “stdin” error when trying to copy_from redshift
  • pandas parse csv with left and right quote chars
  • Plot a table with R
  • NoMethodError in Controller#action - undefined method `keynote' for #
  • How to sort a same column both in asc order and desc order
  • Warning: strpos() expects parameter 1 to be string, resource
  • SQL - Select lowest values with group by and order by?
  • Trying to string.Join an IList
  • HTML5 video only works in IE. The other browsers shows the black screen
  • MeeGo Handset Emulator not starting on Windows 7
  • C++ friend class std::vector
  • Knockout custom binding handler
  • Using $compile in a directive triggers AngularJS infinite digest error
  • How to make Twilio api Post request with the help of AFNetworking?
  • Unable to send e-mail through Java
  • didUpdatePushCredentials not get called
  • UIAlertController button function not working
  • SetWindowsHookEx does not react on media keys
  • Element.tagName for python not working
  • Graphics.CopyFromScreen [Web application] + The handle is invalid
  • How solve “Qt: Untested Windows version 10.0 detected!”
  • Install PHP intl extension on MacOS
  • Retrieve list of sent friend requests from friend_request FQL table
  • Error when parsing timestamp with pandas read_csv
  • NHibernate Validation Localization with S#arp Architecture
  • How can I send an e-mail from a vbs script
  • Accessing IRQ description array within a module and displaying action names
  • Unanticipated behavior
  • Comma separated Values
  • Trying to get generic when generic is not available
  • Getting Messege Twice Using IMvxMessenger
  • costura.fody for a dll that references another dll
  • Observable and ngFor in Angular 2
  • UserPrincipal.Current returns apppool on IIS
  • java string with new operator and a literal