Using Objective-C/Cocoa to unescape Unicode characters, i.e. \u1234

23

There is no built-in function to do C-style unescaping.

You can cheat a little with NSPropertyListSerialization, since an "old text style" plist supports C escaping via \Uxxxx:

NSString* input = @"ab\"cA\"BC\\u2345\\u0123"; 

// will cause trouble if you have "abc\\\\uvw" 
NSString* esc1 = [input stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"]; 
NSString* esc2 = [esc1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""]; 
NSString* quoted = [[@"\"" stringByAppendingString:esc2] stringByAppendingString:@"\""]; 
NSData* data = [quoted dataUsingEncoding:NSUTF8StringEncoding]; 
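// propertyListFromData:mutabilityOption:format:errorDescription: is deprecated; 
// propertyListWithData:options:format:error: is its replacement 
NSString* unesc = [NSPropertyListSerialization propertyListWithData:data 
        options:NSPropertyListImmutable format:NULL 
        error:NULL]; 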
assert([unesc isKindOfClass:[NSString class]]); 
NSLog(@"Output = %@", unesc);

But mind that this is not very efficient. It is much better to write your own parser. (BTW, are you decoding JSON strings? If so, you could use the existing JSON parsers.)
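To make the "write your own parser" suggestion concrete, here is a minimal sketch in plain C, which compiles unchanged inside an Objective-C file. The names `unescape_unicode`, `parse_hex4`, `put_utf8` and `hex_value` are made up for this example and are not part of any Apple API. It assumes the input is a NUL-terminated UTF-8 string, copies malformed escapes through untouched, and copies `\\` pairs through as-is so that an escaped backslash followed by `u` is not mistaken for an escape.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit (including '\0'). */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses exactly four hex digits at p. Returns the value, or -1 if one is
   missing. It stops at the first bad character, so it never reads past the
   terminating '\0'. */
static long parse_hex4(const char *p) {
    long value = 0;
    for (int i = 0; i < 4; i++) {
        int digit = hex_value(p[i]);
        if (digit < 0) return -1;
        value = value * 16 + digit;
    }
    return value;
}

/* Writes code point cp to out as UTF-8 and returns the number of bytes. */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decodes \uXXXX escapes from `in` into `out`.
   `out` needs room for strlen(in) + 1 bytes, because a 6-character escape
   yields at most 3 bytes and a 12-character surrogate pair yields 4. */
void unescape_unicode(const char *in, char *out) {
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\\') {
            /* escaped backslash: copy both characters untouched */
            *out++ = *in++;
            *out++ = *in++;
            continue;
        }
        long unit = (in[0] == '\\' && in[1] == 'u') ? parse_hex4(in + 2) : -1;
        if (unit < 0) {
            /* ordinary character or malformed escape: copy one byte */
            *out++ = *in++;
            continue;
        }
        unsigned long cp = (unsigned long)unit;
        in += 6;
        /* a high surrogate followed by a low surrogate forms one code point */
        if (cp >= 0xD800 && cp <= 0xDBFF && in[0] == '\\' && in[1] == 'u') {
            long low = parse_hex4(in + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + ((unsigned long)low - 0xDC00);
                in += 6;
            }
        }
        /* an unpaired surrogate is not valid UTF-8: use U+FFFD instead */
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;
        out += put_utf8(cp, out);
    }
    *out = '\0';
}
```

From Objective-C, one way to use it would be to pass `[input UTF8String]` as `in`, allocate `strlen(in) + 1` bytes for `out`, and wrap the result with `+[NSString stringWithUTF8String:]`. One caveat of this sketch: a `\u0000` escape writes a NUL byte and so cuts the resulting C string short.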

2010-01-20 06:40:32 kennytm

+0

"There is no built-in function to do it" is what I was trying to find out. I ended up rolling my own; I just wanted to check that I wasn't reinventing the wheel. The existing JSON parsers are nowhere near forgiving enough of the malformed JSON that dodgy websites sometimes send. – corydoras

+0

+1 this is clever –

11

Here is what I ended up writing. Hopefully this will help some people.

+ (NSString*) unescapeUnicodeString:(NSString*)string 
{ 
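// unescape quotes and backslashes 
// (note: because this runs before the \u pass, an escaped 
// backslash followed by "u" is later treated as an escape) 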
NSString* unescapedString = [string stringByReplacingOccurrencesOfString:@"\\\"" withString:@"\""]; 
unescapedString = [unescapedString stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"]; 

// tokenize based on unicode escape char 
NSMutableString* tokenizedString = [NSMutableString string]; 
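NSScanner* scanner = [NSScanner scannerWithString:unescapedString]; 
// NSScanner skips whitespace by default, which would drop 
// the spaces at the start of a token 
scanner.charactersToBeSkipped = nil; 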
while ([scanner isAtEnd] == NO) 
{ 
    // read up to the first unicode marker 
    // if a string has been scanned, it's a token 
    // and should be appended to the tokenized string 
    NSString* token = @""; 
    [scanner scanUpToString:@"\\u" intoString:&token]; 
    if (token != nil && token.length > 0) 
    { 
     [tokenizedString appendString:token]; 
     continue; 
    } 

    // check whether the marker and its 4 hex characters 
    // run past the end of the string (could be malformed) 
    // and if they do, copy the remainder verbatim 
    // and move the scanner to the end 
    NSUInteger location = [scanner scanLocation]; 
    NSUInteger length = scanner.string.length; 
    if (length - location < 2 + 4) 
    { 
     [tokenizedString appendString:[scanner.string substringFromIndex:location]]; 
     [scanner setScanLocation:length]; 
     continue; 
    } 

    // move the location past the unicode marker 
    // then read in the next 4 characters 
    location += 2; 
    NSRange range = {location, 4}; 
    token = [scanner.string substringWithRange:range]; 
    unichar codeValue = (unichar) strtol([token UTF8String], NULL, 16); 
    [tokenizedString appendString:[NSString stringWithFormat:@"%C", codeValue]]; 

    // move the scanner past the 4 characters 
    // then keep scanning 
    location += 4; 
    [scanner setScanLocation:location]; 
} 

// done 
return tokenizedString; 
} 

+ (NSString*) escapeUnicodeString:(NSString*)string 
{ 
// first escape backslashes and quotes 
// note that the backslash has to be escaped before the quote 
// otherwise it will end up with an extra backslash 
NSString* escapedString = [string stringByReplacingOccurrencesOfString:@"\\" withString:@"\\\\"]; 
escapedString = [escapedString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""]; 

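// convert to encoded unicode 
// do this by getting the data for the string 
// in UTF-16 little endian (note: this is not network byte order, 
// which is big endian; it matches the host byte order of 
// little-endian CPUs, which the uint16_t reads below rely on) 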
NSData* data = [escapedString dataUsingEncoding:NSUTF16LittleEndianStringEncoding allowLossyConversion:YES]; 
size_t bytesRead = 0; 
const char* bytes = data.bytes; 
NSMutableString* encodedString = [NSMutableString string]; 

// loop through the byte array 
// read two bytes at a time, if the bytes 
// are above a certain value they are unicode 
// otherwise the bytes are ASCII characters 
// the %C format will write the character value of bytes 
while (bytesRead < data.length) 
{ 
    uint16_t code = *((uint16_t*) &bytes[bytesRead]); 
    if (code > 0x007E) 
    { 
     [encodedString appendFormat:@"\\u%04X", code]; 
    } 
    else 
    { 
     [encodedString appendFormat:@"%C", code]; 
    } 
    bytesRead += sizeof(uint16_t); 
} 

// done 
return encodedString; 
}

2011-10-28 22:44:03 Christoph

+0

It should be legal to kill the server-side developer, just for forcing me to use this solution. @Christoph nice working code, by the way. Cheers! –

87

~~It's correct that Cocoa does not offer a solution~~, yet Core Foundation does: CFStringTransform.

CFStringTransform lives in a dusty, remote corner of Mac OS (and iOS), which makes it a little-known gem. It is the front end to Apple's ICU-compatible string transformation engine. It can perform real magic, such as transliteration between Greek and Latin (or just about any known scripts), but it can also be used for mundane tasks like unescaping strings from a crappy server:

NSString *input = @"\\u5404\\u500b\\u90fd"; 
NSMutableString *convertedString = [input mutableCopy]; 

CFStringRef transform = CFSTR("Any-Hex/Java"); 
CFStringTransform((__bridge CFMutableStringRef)convertedString, NULL, transform, YES); 

NSLog(@"convertedString: %@", convertedString); 

// prints: 各個都, tada!

As I said, CFStringTransform is really powerful. It supports a number of predefined transforms, such as case mappings, normalizations, or Unicode character name conversion. You can even design your own transforms.

~~I have no idea why Apple does not make it available from Cocoa.~~

Edit 2015:

OS X 10.11 and iOS 9 add the following method to Foundation:

- (nullable NSString *)stringByApplyingTransform:(NSString *)transform reverse:(BOOL)reverse;

So the example from above becomes...

NSString *input = @"\\u5404\\u500b\\u90fd"; 
NSString *convertedString = [input stringByApplyingTransform:@"Any-Hex/Java" 
                reverse:YES]; 

NSLog(@"convertedString: %@", convertedString);

Thanks to @nschmidt for the heads-up.

2012-07-23 14:55:56

+0

This is a brilliant piece of functionality from Apple, and it goes well beyond this kind of transformation. – Jessedc

+0

Let's say I receive a string like convertedString from a source I can't change. Can you tell me how to reverse the process so that I get the original string back? –

+1

How do I choose the CFSTR? – shiami

2

Simple code:

const char *cString = [unicodeStr cStringUsingEncoding:NSUTF8StringEncoding]; 
NSString *resultStr = [NSString stringWithCString:cString encoding:NSNonLossyASCIIStringEncoding];

from: https://stackoverflow.com/a/7861345

2014-11-21 00:55:21 likid1412

+0

Hi all, I'm facing a strange problem and I don't know why the suggestions above don't work; could someone please parse this string for me? @"ElbowWristHand_DeQuervian\U00e2\U0080\U0099s Tenosynovitis"; It is actually "ElbowWristHand_DeQuervian's", and I have tried all the methods suggested above, but it still doesn't work. Please advise. Thanks – york
