
Find the names contained in each sentence (not the other way around)

Question:

My question is an extension of this one: <a href="https://stackoverflow.com/questions/31535154/how-to-extract-sentences-containing-specific-person-names-using-r" rel="nofollow">How to extract sentences containing specific person names using R</a>

I'll write the relevant part here (slightly edited for the sake of this question):

    > sentences
    [1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
    [2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
    [3] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
    [4] " He was present at the disputation of Leipzig (1519) as a spectator, but participated by his comments."
    [5] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"

    toMatch <- c("Martin Luther", "Paul", "Melanchthon")

The answer provided gives the sentences that match each name:

    foo <- function(Match) { c(Match, sentences[grep(Match, sentences)]) }

    > lapply(toMatch, foo)
    [[1]]
    [1] "Martin Luther"
    [2] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
    [3] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"

    [[2]]
    [1] "Paul"
    [2] " He studied the Scripture, especially of Paul, and Evangelical doctrine"

    [[3]]
    [1] "Melanchthon"
    [2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther"
    [3] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"

lapply(toMatch, foo) applies the function foo to each element of toMatch. Inside foo, grep searches the sentences vector for the name and returns the positions of the sentences that match, so sentences[grep(Match, sentences)] gives the matching sentences themselves.

<strong>My question</strong> is: instead of returning every sentence that matches each element of the toMatch vector, how could we go through every sentence and find the names it contains? (That is, the other way around. I know it's a bit confusing; the output would be this.)

    [1] "Martin Luther"
    [2] "Melanchthon","Martin Luther"
    [3] "Paul"
    [4] NA   # or maybe this row doesn't exist; either is fine for me
    [5] "Melanchthon"

Could this be done by altering the result already provided, or would it be easier to use a different function with lapply(sentences, FUNCTION)?

Answer1:

One option would be str_extract_all from the stringr package:

    library(stringr)
    lst <- str_extract_all(sentences, paste(toMatch, collapse="|"))
    lst[lengths(lst)==0] <- NA
    lst
    #[[1]]
    #[1] "Martin Luther"

    #[[2]]
    #[1] "Melanchthon"   "Martin Luther"

    #[[3]]
    #[1] "Paul"

    #[[4]]
    #[1] NA

    #[[5]]
    #[1] "Melanchthon"

Or we can use regmatches/gregexpr from base R

    lst <- regmatches(sentences, gregexpr(paste(toMatch, collapse="|"), sentences))

and replace the list elements of length 0 with NA (as before).
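
Putting the base R steps together, a minimal self-contained sketch might look like the following. It reuses the sentences and toMatch from the question. Two choices here are assumptions rather than part of the original answer: the names are treated as a regular expression joined with "|", so this presumes none of them contain regex metacharacters, and NA_character_ is used instead of a plain NA so every list element stays a character vector.

```r
# Data from the question
sentences <- c(
  "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin",
  " Melanchthon became professor of the Greek language in Wittenberg at the age of 21 with the help of Martin Luther",
  " He studied the Scripture, especially of Paul, and Evangelical doctrine",
  " He was present at the disputation of Leipzig (1519) as a spectator, but participated by his comments.",
  " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
)
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# Build one alternation pattern: "Martin Luther|Paul|Melanchthon"
# (assumes the names contain no regex metacharacters)
pattern <- paste(toMatch, collapse = "|")

# gregexpr finds every match position per sentence;
# regmatches extracts the matched text, one list element per sentence
lst <- regmatches(sentences, gregexpr(pattern, sentences))

# Sentences with no match yield character(0); replace those with NA
lst[lengths(lst) == 0] <- NA_character_

lst
```

Names come back in the order they appear within each sentence, which is why the second element is "Melanchthon" followed by "Martin Luther".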
