Extracting multiple instances of a pattern from a string in R

I have a character vector t as follows.

t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
    "GID895 GID895 K350")

I would like to extract all the strings that start with GID and are followed by a sequence of digits.

This works, but it returns only one match per element (the last one) rather than every instance.

gsub(".*(GID\\d+).*", "\\1", t) 
[1] "GID456" "GID667" "GID2345" "GID895"

How can I extract all the matching strings in this case? The desired output is as follows:

out <- c("GID456", "GID456", "GID667", "GID45345", "GID2345", 
     "GID895", "GID895")


2015-05-12 Crops

Here is an approach using qdapRegex, a package I maintain (I prefer this or stringi/stringr to base R for consistency and ease of use). I also show a base R approach. In any case, I would treat this as an "extraction" problem rather than a substitution problem.

y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
    "GID895 GID895 K350") 

library(qdapRegex) 
unlist(ex_default(y, pattern = "GID\\d+")) 

## [1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"

In Base R:

unlist(regmatches(y, gregexpr("GID\\d+", y)))
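As a small sketch of what the base R call does before `unlist()` flattens it: `gregexpr()` locates every match in each element and `regmatches()` pulls them out as a list with one entry per input string. The names `per_element`, `flat` and `no_match` are only illustrative and are not part of the original answer.

```r
# Same sample data as in the answer above
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
    "GID895 GID895 K350")

# gregexpr() finds all matches per element; regmatches() extracts them
per_element <- regmatches(y, gregexpr("GID\\d+", y))

# One list entry per input string, so the grouping by element is preserved
lengths(per_element)

# Flatten only when the grouping is not needed
flat <- unlist(per_element)

# An element without any match yields an empty character vector
no_match <- regmatches("K350", gregexpr("GID\\d+", "K350"))[[1]]
```

Keeping the list form can be handy if you later need to know which input string each ID came from.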


2015-05-12 05:18:26

I used the str_split function from the stringr package.

library(stringr) 
word.list = str_split(t, '\\s+') 
new_list <- unlist(word.list) 
new_list[grep("GID", new_list)]
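One caveat with the split-and-filter idea: `grep("GID", ...)` keeps any token that merely contains GID, not only tokens made of GID followed by digits. A minimal base R sketch with an anchored pattern, assuming tokens are separated by whitespace, could look like this. The vector `x` and the tokens "GIDEON" and "XGID12" are made up for illustration.

```r
# Hypothetical input that includes tokens containing "GID" that are not IDs
x <- c("GID456 SPK711", "GIDEON XGID12 GID667")

# Split on whitespace (base R equivalent of the str_split() step)
tokens <- unlist(strsplit(x, "\\s+"))

# Unanchored filter: keeps every token that contains "GID" anywhere
loose <- tokens[grep("GID", tokens)]

# Anchored filter: keeps only tokens that are exactly GID plus digits
strict <- tokens[grepl("^GID\\d+$", tokens)]
```

With the question's data both filters give the same result; the difference only shows up on messier input.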

Hope this helps.


2015-05-12 06:15:00

Using gsub:

> t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
+  "GID895 GID895 K350") 
> unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", t), "\\s+")) 
[1] "GID456" "GID456" "GID667" "GID45345" "GID2345" 
[6] "GID895" "GID895"
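To make the trick above easier to follow, here is a step-by-step sketch. The alternation `(GID\\d+)|.` either captures a GID token or consumes one other character; in both cases the replacement is the (possibly empty) group plus a space, so only the GID tokens survive, separated by runs of spaces. The second input, which starts with a non-GID token, is an assumption added to show an edge case that does not occur in the question's data.

```r
x <- "GID456 GID667 VINK"

# GID tokens are kept (followed by a space); every other character
# matches "." and is replaced by a single space
intermediate <- gsub("(GID\\d+)|.", "\\1 ", x)

# Splitting on runs of whitespace leaves just the IDs
result <- unlist(strsplit(intermediate, "\\s+"))

# Edge case (hypothetical input): a string that starts with a non-GID token
# produces a leading empty string after the split
leading <- unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", "SPK711 GID456"), "\\s+"))

# Dropping empty strings handles that case
cleaned <- leading[nzchar(leading)]
```

If the input may start with something other than a GID token, filtering with `nzchar()` (or using the `regmatches()` approach) avoids the stray empty string.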


2015-05-12 06:17:11
