Extract all words between two specific words in a character vector

Is there a more efficient method? How can I do this without stringr?

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that" 

library(stringr) 
w_start <- "this" 
w_end <- "that" 
pattern <- paste0(w_start, "(.*?)", w_end) 
wordsbetween <- unlist(str_extract_all(txt, pattern)) 
gsub("^\\s+|\\s+$", "", str_sub(wordsbetween, nchar(w_start)+1, -nchar(w_end)-1)) 
[1] "and"    "goes with"   "is a long way from"


2013-04-23 Ben

Here is an approach I use in the qdap package.

Using qdap:

library(qdap) 
genXtract(txt, "this", "that") 

## > genXtract(txt, "this", "that") 
##   this : that1   this : that2   this : that3 
##    " and "   " goes with " " is a long way from "

Without an add-on package:

regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE)) 

## > regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE)) 
## [[1]] 
## [1] " and "    " goes with "   " is a long way from "
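
Note that both results above keep the spaces around each match, unlike the stringr version in the question. A minimal base R sketch that also trims them is shown here; the function name `words_between` is hypothetical, and it assumes the two delimiter words contain no regex metacharacters:

```r
# Hypothetical helper, base R only (requires trimws(), available since R 3.2.0).
words_between <- function(x, w_start, w_end) {
  # Lookbehind/lookahead keep the delimiter words out of the match;
  # .*? is non-greedy, so each match stops at the nearest w_end.
  pattern <- paste0("(?<=", w_start, ").*?(?=", w_end, ")")
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Strip leading and trailing whitespace from every match.
  lapply(matches, trimws)
}

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
words_between(txt, "this", "that")[[1]]
## [1] "and"                "goes with"          "is a long way from"
```

The function returns a list with one element per input string, so it also works on a character vector with several elements.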


2013-04-23 05:32:42

Thanks, I thought you'd have something like that up your sleeve! – Ben

Can I ask, out of curiosity, what you're using that puts the '##' before each line of output? I see it a bit here on SO, but I have no idea what produces it. – Ben

I have a homemade function in my .Rprofile that puts '##' in front of the output and copies it to the clipboard. –

Here's another rough attempt using strsplit, though it could be refined further:

txtspl <- unlist(strsplit(gsub("[[:punct:]]","",txt),"this|that")) 
txtspl[txtspl!=" "][-1] 

#[1] " and "    " goes with "   " is a long way from "
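
One possible refinement, offered only as a sketch: if every "this" is followed by a matching "that" (an assumption about the input, true for the example), the split pieces alternate between unwanted and wanted text, so the even-numbered pieces can be kept without stripping punctuation first. The variable names below are illustrative:

```r
txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"

# Split on either delimiter word; pieces alternate: outside, between, outside, ...
pieces <- strsplit(txt, "this|that")[[1]]

# Keep the even-numbered pieces (those following a "this") and trim the spaces.
between <- trimws(pieces[seq_along(pieces) %% 2 == 0])
between
## [1] "and"                "goes with"          "is a long way from"
```

This relies on strict alternation of the two words; a stray "that" before the first "this" would shift the indices.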


2013-04-23 05:50:44 thelatemail
