R: gsub and capture

I am trying to extract the contents between square brackets from a string:

eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"

I can filter them out:

gsub("\\[.+?\\]", "", eq)  ## removes the square brackets and everything inside them
[1] "(5) h + nadh + q10 --> (4) h + nad + q10h2"

But how can I capture what's inside the brackets? I tried the following:

gsub("\\[(.+)?\\])", "\\1", eq)
grep("\\[(.+)?\\]", eq, value=TRUE)

but both return the whole string:

[1] "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"

Also, in my application I never know how many such terms in square brackets occur, so I wouldn't know what the replacement argument in gsub should look like (e.g. \\1 or \\1_\\2). Thanks in advance!

Answer1:

Try this:

eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"
pattern <- "\\[.+?\\]"
m <- gregexpr(pattern, eq)
regmatches(eq, m)
[[1]]
[1] "[m]" "[m]" "[m]" "[c]" "[m]" "[m]"
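If you want only the text inside the brackets, one option is to turn the brackets into a lookbehind and a lookahead, so they must surround the match but are not part of it. This is a sketch that assumes perl = TRUE, since lookarounds need the PCRE engine:

```r
eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"

# (?<=\\[) : the match must be preceded by "[" (not included in the match)
# .+?      : lazily take the characters inside the brackets
# (?=\\])  : the match must be followed by "]" (not included in the match)
inside <- regmatches(eq, gregexpr("(?<=\\[).+?(?=\\])", eq, perl = TRUE))[[1]]
inside
# [1] "m" "m" "m" "c" "m" "m"
```

Because gregexpr finds every match, this works however many bracketed terms the string contains, so there is no replacement argument to build.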

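Your first pattern didn't work because of an extra closing parenthesis after \\]. No ] in your string is followed by a ), so nothing matched and gsub returned the string unchanged: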

gsub("\\[(.+)?\\])", "\\1", eq)  # Yours
gsub("\\[(.+?)\\]", "\\1", eq)   # Corrected -- kind of
[1] "(5) hm + nadhm + q10m --> (4) hc + nadm + q10h2m"

What this does is replace every match with its first captured group, so it only strips the brackets off each term. It never extracts the captures, which isn't what you want.
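The corrected pattern also moves the ? inside the group, and that matters: in (.+)? the ? makes the whole group optional while .+ stays greedy, whereas in (.+?) the ? makes .+ lazy. A small sketch comparing the first match of each pattern (perl = TRUE is used here for predictable Perl-style backtracking):

```r
eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"

# Greedy: .+ runs on to the LAST "]" in the string; making the group optional does not change that
greedy <- regmatches(eq, regexpr("\\[(.+)?\\]", eq, perl = TRUE))

# Lazy: .+? stops at the FIRST "]" that lets the pattern finish
lazy <- regmatches(eq, regexpr("\\[(.+?)\\]", eq, perl = TRUE))

greedy
# [1] "[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"
lazy
# [1] "[m]"
```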

Your second attempt, using grep, searches each element of the character vector for the pattern and, with value=TRUE, returns every element that contains a match. Your vector has only one element, and it matches, so you get that whole string back.
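To see the difference between finding which strings match and extracting what matched, here is a sketch on a made-up vector of three strings (eqs is hypothetical example data, not from the question):

```r
# Hypothetical example data: two strings with bracketed terms, one without
eqs <- c("h[m] + nadh[m] --> nad[m]", "no brackets here", "q10[c] --> q10h2[c]")
pattern <- "\\[.+?\\]"

grep(pattern, eqs)                 # indices of the elements that match: 1 3
grep(pattern, eqs, value = TRUE)   # those elements themselves, returned whole
grepl(pattern, eqs)                # one logical per element: TRUE FALSE TRUE

# The matched substrings: a list with one character vector per element
# ("[m]" three times, nothing, "[c]" twice)
regmatches(eqs, gregexpr(pattern, eqs))
```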

Answer2:

Another option:

library(stringr)
pattern <- "\\[.+?\\]"
str_extract_all(eq, pattern)
[[1]]
[1] "[m]" "[m]" "[m]" "[c]" "[m]" "[m]"

Answer3:

gsub replaces portions of a string with replacement strings, but here we wish to extract those portions rather than replace them.

strapplyc

strapplyc in the gsubfn package can do that. Use your pattern, but put parentheses around the portion you wish to capture (or omit them if you wish to capture the entire match, including the square brackets):

> library(gsubfn)
> strapplyc(eq, "\\[(.*?)\\]")[[1]]
[1] "m" "m" "m" "c" "m" "m"
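If you would rather not add a package dependency, the same "capture group only" result can be sketched in base R. extract_group below is a hypothetical helper, not part of gsubfn: it finds every full match with gregexpr, then uses sub to keep group 1 of each match. It assumes the pattern has exactly one capture group and that each full match, taken on its own, is matched entirely by the same pattern.

```r
# Hypothetical helper: for each element of x, return the first capture group of every match
extract_group <- function(x, pattern) {
  # Step 1: all full matches, as a list with one character vector per element of x
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Step 2: within each full match, replace the whole match by its capture group
  lapply(hits, function(h) sub(pattern, "\\1", h, perl = TRUE))
}

eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"
extract_group(eq, "\\[(.*?)\\]")[[1]]
# [1] "m" "m" "m" "c" "m" "m"
```

Like strapplyc, it returns a list with one element per input string, so it also works on a vector of equations.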

The guts of strapplyc are written in Tcl, so it's quite fast, although for small strings such as the ones here the speed will not really matter.

strapply

There is also strapply, which takes a third argument: a function, list or proto object that is applied to each extracted capture. For example:

> # function
> strapply(eq, "\\[(.*?)\\]", toupper)[[1]]
[1] "M" "M" "M" "C" "M" "M"
> # list
> strapply(eq, "\\[(.*?)\\]", list(c = "crunchy", m = "munchy"))[[1]]
[1] "munchy" "munchy" "munchy" "crunchy" "munchy" "munchy"
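For comparison, a rough base R analogue of those two strapply calls: extract the captures first, then apply a function to each one, or translate each one through a named vector. This is only a sketch of the idea, assuming perl = TRUE for the lookarounds; it does not reproduce strapply's handling of proto objects.

```r
eq <- "(5) h[m] + nadh[m] + q10[m] --> (4) h[c] + nad[m] + q10h2[m]"

# Text between the brackets, brackets excluded (lookarounds need perl = TRUE)
caps <- regmatches(eq, gregexpr("(?<=\\[).*?(?=\\])", eq, perl = TRUE))[[1]]

# Function case: apply a function to each capture
upper <- vapply(caps, toupper, character(1), USE.NAMES = FALSE)
upper
# [1] "M" "M" "M" "C" "M" "M"

# Lookup case: use each capture as a name into a named vector
lookup <- c(c = "crunchy", m = "munchy")
words <- unname(lookup[caps])
words
# "munchy" "munchy" "munchy" "crunchy" "munchy" "munchy"
```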
