How can I extract substrings from a string in Perl?

Consider the following strings:

1) Scheme ID: abc-456-hu5t10 (High priority) *

2) Scheme ID: frt-78f-hj542w (Balanced)

3) Scheme ID: 23f-f974-nm54w (super formula run) *

and so on in the above format - the parts in bold are the parts that change from string to string.

==> Imagine you have many strings in the format shown above. I want to pick 3 substrings (as shown in BOLD below) from each of the above strings.

1st substring containing the alphanumeric value (e.g. in the above it is "abc-456-hu5t10")
2nd substring containing the word (e.g. in the above it is "High priority")
3rd substring containing * (IF * is present at the end of the string, ELSE leave it)

How do I pick these 3 substrings from each string shown above? I know it can be done using regular expressions in Perl... Can you help me with this?


2009-09-18 stack_pointer is EXTINCT

Can the string in parentheses itself contain nested parentheses?

You could do something like this:

use strict;
use warnings;

my $data = <<END;
1) Scheme ID: abc-456-hu5t10 (High priority) *
2) Scheme ID: frt-78f-hj542w (Balanced)
3) Scheme ID: 23f-f974-nm54w (super formula run) *
END

foreach (split(/\n/, $data)) {
    # skip lines that do not match the expected format
    /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/ || next;
    # $3 is undef when there is no trailing star (// needs Perl 5.10+)
    my ($id, $word, $star) = ($1, $2, $3 // '');
    print "$id $word $star\n";
}

The key thing is the regular expression:

Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?

which breaks down as follows.

The fixed string "Scheme ID: ":

Scheme ID:

followed by one or more of the characters a-z, 0-9 or -. We use the parentheses to capture it as $1:

([a-z0-9-]+)

followed by one or more whitespace characters:

\s+

followed by an opening parenthesis (which we escape), followed by one or more characters that aren't a closing parenthesis, and then a closing parenthesis (escaped). We use unescaped parentheses to capture the words as $2:

\(([^)]+)\)

followed by optional whitespace and maybe a *, captured as $3:

\s*(\*)?
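
Putting the pieces together, here is a minimal sketch of a helper that returns the three fields for one line. The name parse_scheme is hypothetical and not part of the answer; the regular expression is the one broken down above, and the empty string for a missing star is an assumption about what the caller wants.

```perl
use strict;
use warnings;

# parse_scheme is a hypothetical helper name, not part of the original answer.
# Returns (id, words, star) for a matching line, or an empty list otherwise.
sub parse_scheme {
    my ($line) = @_;
    my ($id, $words, $star) =
        $line =~ /Scheme ID: ([a-z0-9-]+)\s+\(([^)]+)\)\s*(\*)?/
        or return;
    # The third group is undef when the line has no trailing star.
    return ($id, $words, defined $star ? $star : '');
}

my @fields = parse_scheme('2) Scheme ID: frt-78f-hj542w (Balanced)');
print join('|', @fields), "\n" if @fields;
```

Returning an empty list on failure lets the caller test the result directly, as in the "if @fields" guard.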


2009-09-18 11:47:30

(\S*)\s*\((.*?)\)\s*(\*?) 


(\S*) picks up anything which is NOT whitespace 
\s*  0 or more whitespace characters 
\(  a literal open parenthesis 
(.*?) anything, non-greedy so stops on first occurrence of... 
\)  a literal close parenthesis 
\s*  0 or more whitespace characters 
(\*?) 0 or 1 occurrences of a literal *


2009-09-18 11:43:46 Xetius

\(([^)]*)\) would be better than \((.*?)\), because it is guaranteed to stop at the first ). Non-greedy quantifiers can cause heavy backtracking that kills performance. (Unlikely in this case, admittedly, but avoiding them when they are not needed is still a good habit to cultivate.) The negated character class is also a clearer statement of your intent: you are looking for "any number of non-) characters", not "the smallest number of any characters, followed by a ), that makes the expression as a whole match".
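
To make the difference concrete, here is a small sketch. For the sample lines in the question both forms capture the same text; they diverge only when something after the closing parenthesis is required. The input string and the mandatory trailing star below are assumptions chosen to show that case.

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesised groups; only the second is
# followed by a star.
my $text = 'x (one) (two) *';

# Non-greedy: .*? may run past a ")" if that is what it takes to match.
my ($lazy)    = $text =~ /\((.*?)\)\s*\*/;

# Negated class: the capture can never contain a ")".
my ($negated) = $text =~ /\(([^)]*)\)\s*\*/;

print "lazy:    $lazy\n";     # one) (two
print "negated: $negated\n";  # two
```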

You could use a regular expression like the following:

/([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/

So, for example:

$s = "abc-456-hu5t10 (High priority) *"; 
$s =~ /([-a-z0-9]+)\s*\((.*?)\)\s*(\*)?/; 
print "$1\n$2\n$3\n";

prints

abc-456-hu5t10 
High priority 
*


2009-09-18 11:44:23

Long time no Perl

while(<STDIN>) { 
    next unless /:\s*(\S+)\s+\(([^\)]+)\)\s*(\*?)/; 
    print "|$1|$2|$3|\n"; 
}


2009-09-18 11:44:33

String 1:

$input =~ /^\S+/;   # leading run of non-whitespace (assumes the line starts with the ID)
$s1 = $&;

String 2:

$input =~ /\(.*\)/; 
$s2 = $&;

String 3:

$input =~ /\*?$/; 
$s3 = $&;
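
A sketch of the same three-step idea using capture groups instead of $&, which also leaves the parentheses out of the second piece. It assumes $input holds only the part of the line after "Scheme ID: " (the sample value is taken from the question). According to perlvar, in Perl versions before 5.20 mentioning $& anywhere in a program could slow down other regex matches, which is one reason to prefer captures.

```perl
use strict;
use warnings;

# Assumes the "Scheme ID: " prefix has already been removed.
my $input = 'abc-456-hu5t10 (High priority) *';

my ($s1) = $input =~ /^(\S+)/;        # leading non-whitespace run
my ($s2) = $input =~ /\(([^)]*)\)/;   # text between the parentheses
my ($s3) = $input =~ /(\*?)\s*$/;     # trailing star, or '' if absent

print "$s1\n$s2\n$s3\n";
```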


2009-09-18 11:46:32 Rap

Well, here is a one-liner:

perl -lne 'm|Scheme ID:\s+(.*?)\s+\((.*?)\)\s?(\*)?|g&&print "$1:$2:$3"' file.txt

Expanded into a simple script to explain things a bit better:

#!/usr/bin/perl -ln    

#-w : warnings     
#-l : print newline after every print        
#-n : apply script body to stdin or files listed at commandline, dont print $_   

use strict; #always do this.  

my $regex = qr{ # precompile regex         
    Scheme\ ID:  # to match beginning of line.      
    \s+    # 1 or more whitespace        
    (.*?)   # Non greedy match of all characters up to   
    \s+    # 1 or more whitespace        
    \(    # parenthesis literal        
    (.*?)   # non-greedy match to the next      
    \)    # closing literal parenthesis      
    \s*    # 0 or more whitespace (trailing * is optional)  
    (\*)?   # 0 or 1 literal *s         
}x; #x switch allows whitespace in regex to allow documentation. 

#values trapped in $1 $2 $3, so do whatever you need to:    
#Perl lets you use any characters as delimiters, i like pipes because      
#they reduce the amount of escaping when using file paths   
m|$regex| && print "$1 : $2 : $3"; 

#alternatively if(m|$regex|) {doOne($1); doTwo($2) ... }

Though if it were anything more than formatting, I would implement a main loop to handle the files and flesh out the body of the script, rather than rely on the command-line switches for the looping.
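
A minimal sketch of what such an explicit loop might look like. The sub name process_handle and the output format are assumptions for illustration, and the pattern uses \S+ and [^)]* where the script above uses non-greedy matches.

```perl
use strict;
use warnings;

my $regex = qr{Scheme\ ID:\s+(\S+)\s+\(([^)]*)\)\s*(\*)?}x;

# process_handle is a hypothetical name; it reads lines from any filehandle
# and returns one "id : words : star" string per matching line.
sub process_handle {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ $regex;
        push @out, join(' : ', $1, $2, defined $3 ? $3 : '');
    }
    return @out;
}

# Explicit main loop over the files named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print "$_\n" for process_handle($fh);
    close $fh;
}
```

Keeping the matching in a sub that takes a filehandle means the same code can be fed a file, STDIN, or an in-memory string opened with open my $fh, '<', \$string.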


2009-09-18 18:29:41 liam

This just requires a small change to my last answer:


my ($guid, $scheme, $star) = $line =~ m{ 
    The [ ] Scheme [ ] GUID: [ ] 
    ([a-zA-Z0-9-]+)   #capture the guid 
    [ ] 
    \( (.+) \)    #capture the scheme 
    (?: 
     [ ] 
     ([*])    #capture the star 
    )?      #if it exists 
}x;
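
The pattern above targets lines that begin with "The Scheme GUID: ". A sketch of how it might be adapted to the "Scheme ID: " lines in this question follows; the changed prefix and the sample line are assumptions, and [^)]+ replaces .+ so that the capture cannot run past the closing parenthesis.

```perl
use strict;
use warnings;

my $line = '3) Scheme ID: 23f-f974-nm54w (super formula run) *';

my ($id, $scheme, $star) = $line =~ m{
    Scheme [ ] ID: [ ]
    ([a-zA-Z0-9-]+)     # capture the id
    [ ]
    \( ([^)]+) \)       # capture the text in parentheses
    (?:
        [ ]
        ([*])           # capture the star
    )?                  # if it exists
}x;

print "$id, $scheme, ", (defined $star ? $star : 'no star'), "\n";
```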


2009-09-19 00:11:13
