2010-04-16 6 views

risposta

11

Secondo lo user contributed notes nel manuale, html_entity_decode() sembra infatti convertire solo alcune 100 entità anziché le oltre 200 esistenti nelle specifiche.

C'è uno script denominato html_entity_decode_full che sembra fornire la decodifica per tutte le entità, ma il sito dell'autore originale è inattivo. Here è un collegamento a un repository Drupal che contiene la funzione. Questo convertirà tutte le entità:

/* Test call */ 
echo decode_entities_full("– THIS IS A DASH", ENT_COMPAT, "utf-8"); 

/** 
* Helper function for drupal_html_to_text(). 
* 
* Calls helper function for HTML 4 entity decoding. 
* Per: http://www.lazycat.org/software/html_entity_decode_full.phps 
*/ 
function decode_entities_full($string, $quotes = ENT_COMPAT, $charset = 'ISO-8859-1') { 
    return html_entity_decode(preg_replace_callback('/&([a-zA-Z][a-zA-Z0-9]+);/', 'convert_entity', $string), $quotes, $charset); 
} 

/** 
* Helper function for decode_entities_full(). 
* 
* This contains the full HTML 4 Recommendation listing of entities, so the default to discard 
* entities not in the table is generally good. Pass false to the second argument to return 
* the faulty entity unmodified, if you're ill or something. 
* Per: http://www.lazycat.org/software/html_entity_decode_full.phps 
*/ 
function convert_entity($matches, $destroy = true) { 
    static $table = array('quot' => '"','amp' => '&','lt' => '<','gt' => '>','OElig' => 'Œ','oelig' => 'œ','Scaron' => 'Š','scaron' => 'š','Yuml' => 'Ÿ','circ' => 'ˆ','tilde' => '˜','ensp' => ' ','emsp' => ' ','thinsp' => ' ','zwnj' => '‌','zwj' => '‍','lrm' => '‎','rlm' => '‏','ndash' => '–','mdash' => '—','lsquo' => '‘','rsquo' => '’','sbquo' => '‚','ldquo' => '“','rdquo' => '”','bdquo' => '„','dagger' => '†','Dagger' => '‡','permil' => '‰','lsaquo' => '‹','rsaquo' => '›','euro' => '€','fnof' => 'ƒ','Alpha' => 'Α','Beta' => 'Β','Gamma' => 'Γ','Delta' => 'Δ','Epsilon' => 'Ε','Zeta' => 'Ζ','Eta' => 'Η','Theta' => 'Θ','Iota' => 'Ι','Kappa' => 'Κ','Lambda' => 'Λ','Mu' => 'Μ','Nu' => 'Ν','Xi' => 'Ξ','Omicron' => 'Ο','Pi' => 'Π','Rho' => 'Ρ','Sigma' => 'Σ','Tau' => 'Τ','Upsilon' => 'Υ','Phi' => 'Φ','Chi' => 'Χ','Psi' => 'Ψ','Omega' => 'Ω','alpha' => 'α','beta' => 'β','gamma' => 'γ','delta' => 'δ','epsilon' => 'ε','zeta' => 'ζ','eta' => 'η','theta' => 'θ','iota' => 'ι','kappa' => 'κ','lambda' => 'λ','mu' => 'μ','nu' => 'ν','xi' => 'ξ','omicron' => 'ο','pi' => 'π','rho' => 'ρ','sigmaf' => 'ς','sigma' => 'σ','tau' => 'τ','upsilon' => 'υ','phi' => 'φ','chi' => 'χ','psi' => 'ψ','omega' => 'ω','thetasym' => 'ϑ','upsih' => 'ϒ','piv' => 'ϖ','bull' => '•','hellip' => '…','prime' => '′','Prime' => '″','oline' => '‾','frasl' => '⁄','weierp' => '℘','image' => 'ℑ','real' => 'ℜ','trade' => '™','alefsym' => 'ℵ','larr' => '←','uarr' => '↑','rarr' => '→','darr' => '↓','harr' => '↔','crarr' => '↵','lArr' => '⇐','uArr' => '⇑','rArr' => '⇒','dArr' => '⇓','hArr' => '⇔','forall' => '∀','part' => '∂','exist' => '∃','empty' => '∅','nabla' => '∇','isin' => '∈','notin' => '∉','ni' => '∋','prod' => '∏','sum' => '∑','minus' => '−','lowast' => '∗','radic' => '√','prop' => '∝','infin' => '∞','ang' => '∠','and' => '∧','or' => '∨','cap' => '∩','cup' => '∪','int' => '∫','there4' => '∴','sim' => '∼','cong' => '≅','asymp' => '≈','ne' => '≠','equiv' => '≡','le' => '≤','ge' => '≥','sub' => '⊂','sup' => '⊃','nsub' => '⊄','sube' => '⊆','supe' => '⊇','oplus' => '⊕','otimes' => '⊗','perp' => '⊥','sdot' => '⋅','lceil' => '⌈','rceil' => '⌉','lfloor' => '⌊','rfloor' => '⌋','lang' => '〈','rang' => '〉','loz' => '◊','spades' => '♠','clubs' => '♣','hearts' => '♥','diams' => '♦','nbsp' => ' ','iexcl' => '¡','cent' => '¢','pound' => '£','curren' => '¤','yen' => '¥','brvbar' => '¦','sect' => '§','uml' => '¨','copy' => '©','ordf' => 'ª','laquo' => '«','not' => '¬','shy' => '­','reg' => '®','macr' => '¯','deg' => '°','plusmn' => '±','sup2' => '²','sup3' => '³','acute' => '´','micro' => 'µ','para' => '¶','middot' => '·','cedil' => '¸','sup1' => '¹','ordm' => 'º','raquo' => '»','frac14' => '¼','frac12' => '½','frac34' => '¾','iquest' => '¿','Agrave' => 'À','Aacute' => 'Á','Acirc' => 'Â','Atilde' => 'Ã','Auml' => 'Ä','Aring' => 'Å','AElig' => 'Æ','Ccedil' => 'Ç','Egrave' => 'È','Eacute' => 'É','Ecirc' => 'Ê','Euml' => 'Ë','Igrave' => 'Ì','Iacute' => 'Í','Icirc' => 'Î','Iuml' => 'Ï','ETH' => 'Ð','Ntilde' => 'Ñ','Ograve' => 'Ò','Oacute' => 'Ó','Ocirc' => 'Ô','Otilde' => 'Õ','Ouml' => 'Ö','times' => '×','Oslash' => 'Ø','Ugrave' => 'Ù','Uacute' => 'Ú','Ucirc' => 'Û','Uuml' => 'Ü','Yacute' => 'Ý','THORN' => 'Þ','szlig' => 'ß','agrave' => 'à','aacute' => 'á','acirc' => 'â','atilde' => 'ã','auml' => 'ä','aring' => 'å','aelig' => 'æ','ccedil' => 'ç','egrave' => 'è','eacute' => 'é','ecirc' => 'ê','euml' => 'ë','igrave' => 'ì','iacute' => 'í','icirc' => 'î','iuml' => 'ï','eth' => 'ð','ntilde' => 'ñ','ograve' => 'ò','oacute' => 'ó','ocirc' => 'ô','otilde' => 'õ','ouml' => 'ö','divide' => '÷','oslash' => 'ø','ugrave' => 'ù','uacute' => 'ú','ucirc' => 'û','uuml' => 'ü','yacute' => 'ý','thorn' => 'þ','yuml' => 'ÿ' 
         ); 
    if (isset($table[$matches[1]])) return $table[$matches[1]]; 
    // else 
    return $destroy ? '' : $matches[0]; 
} 
+0

è fantastico, grazie mille! – martin

9

O semplicemente è possibile utilizzare:

echo html_entity_decode('&', ENT_NOQUOTES, 'UTF-8'); 
4

Originariamente ho scritto the code that's pasted into Pekka's answer - è quasi completamente inutile se non in casi specifici (per lo più per la rimozione di entità inesistenti, che html_entity_decode() non tocca, o per la conversione in entità numeriche se stai facendo XML che non è codificato in UTF-8).

TinyMCE ha un'opzione per restituire l'output raw (non codificato in HTML), quindi dovrebbe essere il primo punto di chiamata. Guarda questo entry in the TinyMCE manual per maggiori informazioni su questo.

+0

sì, 'entity_encoding' è quello che stavo cercando grazie :). – martin

Problemi correlati