Haskell Parsec - error messages are less helpful while using custom tokens

I'm working on separating the lexing and parsing stages of a parser. After some tests, I realized that the error messages are less helpful when I use tokens other than Parsec's Char tokens.

Here are some examples of Parsec's error messages while using Char tokens:

ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong" 
parse error at (line 1, column 7): 
unexpected "w" 
expecting space or "ok" 


ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong" 
parse error at (line 1, column 1): 
unexpected "w" 
expecting "ok" or "nop"

So the string parser shows which string is expected when it finds an unexpected one, and the choice parser shows what the alternatives are.

But when I use the same combinators with my own tokens:

ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf " 
parse error at "test" (line 1, column 1): 
unexpected end of input

In this case, it doesn't print what was expected.

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf " 
parse error at (line 1, column 1): 
unexpected (Ide "asdf","test" (line 1, column 1))

And when I use choice, it doesn't print the alternatives.

I expected this behavior to depend on the combinator functions and not on the tokens, but it seems I was wrong. How can I fix this?

Here's the full lexer + parser code:

Lexer:

module Lexer 
    (Token(..) 
    , TokenPos(..) 
    , tokenize 
    ) where 

import Text.ParserCombinators.Parsec hiding (token, tokens) 
import Control.Applicative ((<*), (*>), (<$>), (<*>)) 

data Token = Ide String 
      | Number String 
      | Bool String 
      | LBrack 
      | RBrack 
      | LBrace 
      | RBrace 
      | Keyword String 
    deriving (Show, Eq) 

type TokenPos = (Token, SourcePos) 

ide :: Parser TokenPos 
ide = do 
    pos <- getPosition 
    fc <- oneOf firstChar 
    r <- optionMaybe (many $ oneOf rest) 
    spaces 
    return $ flip (,) pos $ case r of 
       Nothing -> Ide [fc] 
       Just s -> Ide $ [fc] ++ s 
    where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
          rest      = firstChar ++ ['0'..'9']

parsePos p = (,) <$> p <*> getPosition 

lbrack = parsePos $ char '[' >> return LBrack 
rbrack = parsePos $ char ']' >> return RBrack 
lbrace = parsePos $ char '{' >> return LBrace 
rbrace = parsePos $ char '}' >> return RBrace 


token = choice 
    [ ide 
    , lbrack 
    , rbrack 
    , lbrace 
    , rbrace 
    ] 

tokens = spaces *> many (token <* spaces) 

tokenize :: SourceName -> String -> Either ParseError [TokenPos] 
tokenize = runParser tokens ()

Parser:

module Parser where 

import Text.Parsec as P 
import Control.Monad.Identity 
import Lexer 

parseTest :: Show a => Parsec [TokenPos]() a -> String -> IO() 
parseTest p s = 
    case tokenize "test" s of 
     Left e -> putStrLn $ show e 
     Right ts' -> P.parseTest p ts' 

tok :: Token -> ParsecT [TokenPos]() Identity Token 
tok t = token show snd test 
    where test (t', _) = case t == t' of
              False -> Nothing
              True -> Just t

SOLUTION:

OK, after fp4me's answer and after reading Parsec's Char source more carefully, I ended up with this:

{-# LANGUAGE FlexibleContexts #-} 
module Parser where 

import Text.Parsec as P 
import Control.Monad.Identity 
import Lexer 

parseTest :: Show a => Parsec [TokenPos]() a -> String -> IO() 
parseTest p s = 
    case tokenize "test" s of 
     Left e -> putStrLn $ show e 
     Right ts' -> P.parseTest p ts' 


type Parser a = Parsec [TokenPos]() a 

advance :: SourcePos -> t -> [TokenPos] -> SourcePos 
advance _ _ ((_, pos) : _) = pos 
advance pos _ [] = pos 

satisfy :: (TokenPos -> Bool) -> Parser Token 
satisfy f = tokenPrim show 
         advance 
         (\c -> if f c then Just (fst c) else Nothing) 

tok :: Token -> ParsecT [TokenPos]() Identity Token 
tok t = (Parser.satisfy $ (== t) . fst) <?> show t

Now I'm getting the same kind of error messages:

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"
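
To see the effect of the `<?>` label in isolation, here is a minimal self-contained sketch that does not depend on the Lexer module. The cut-down `Tok` type and the `withPos` and `run` helpers are hypothetical names introduced only for this illustration; `tokenPrim`, `newPos`, `parse` and `<?>` are ordinary Parsec functions.

```haskell
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Hypothetical, cut-down token type (the real Token has more constructors).
data Tok = Ide String | LBrack
  deriving (Show, Eq)

type TokPos = (Tok, SourcePos)
type TokParser a = Parsec [TokPos] () a

-- Match one exact token. tokenPrim on its own records no "expecting"
-- message; the <?> label is what supplies one.
tok :: Tok -> TokParser Tok
tok t = tokenPrim (show . fst) nextPos test <?> show t
  where
    -- The new position is that of the next token, if there is one.
    nextPos _ _ ((_, p) : _) = p
    nextPos p _ []           = p
    test (t', _) = if t == t' then Just t' else Nothing

-- Hypothetical helper: put the tokens at columns 1, 2, 3, ... of line 1.
withPos :: [Tok] -> [TokPos]
withPos ts = zip ts [newPos "test" 1 c | c <- [1 ..]]

-- Run a token parser, returning the rendered error message on failure.
run :: TokParser a -> [Tok] -> Either String a
run p = either (Left . show) Right . parse p "test" . withPos
```

With these definitions, `run (choice [tok (Ide "ok"), tok (Ide "nop")]) [Ide "asdf"]` should fail with a message whose last line lists both alternatives, while removing `<?> show t` from `tok` should leave only the `unexpected` line.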


2012-08-28 sinan

Why do you want to separate lexing from parsing? Surely the main reason for doing so is tradition - it was simpler to write a tricky parser free of the implementation details of the lexer (which was more routine, perhaps just regular expressions), and in imperative languages it makes it easier to think in separate stages. In lovely Haskell Parsec land, writing lexers and parsers is nice and easy: lex a few strings, combine them to parse - you can almost write the definition of your language in combinators. Also, you're working hard to pass the positions through; just let Parsec do it. – AndrewC

@AndrewC, you may be right. I just wanted to see the good and bad parts of separating the lexing and parsing stages in Parsec. Now, after looking at my final code, I think I'll go with a parser only. (Also, I once used alex + happy to parse an indentation-based grammar, and lexing helped me generate indent + dedent tokens and let the parser work on a simplified grammar. A separate lexing stage in Parsec could also help in that kind of situation.) – sinan

@AndrewC, also, I love Parsec, and I think being able to work on different kinds of streams (other than character streams) can be really helpful. Writing a lexer helped me understand how to do that. Now I know how I can work on bytestrings, for example. – sinan

A start of a solution can be to define your own choice function in the parser, use a specific unexpected function to override the unexpected error, and finally use the <?> operator to override the expecting message:

mychoice [] = mzero 
mychoice (x:[]) = (tok x <|> myUnexpected) <?> show x 
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected) <?> show (x:xs) 

myUnexpected = do 
      input <- getInput 
      unexpected $ (id $ first input) 
      where 
      first [] = "eof" 
      first (x:xs) = show $ fst x

and call the parser like this:

ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf " 
parse error at (line 1, column 1): 
unexpected Ide "asdf" 
expecting [Ide "ok",Ide "nop"]
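
The position of the `<?>` label decides what the "expecting" line looks like: a label around the whole choice replaces the alternatives with a single description, while a label on each alternative lets Parsec join them with "or". The following is a small sketch of that difference on plain Char input; the parser names and the `errorLines` helper are hypothetical, chosen only for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One label per alternative: Parsec merges them into "... or ...".
perAlternative :: Parser Char
perAlternative = (char 'a' <?> "letter a") <|> (char 'b' <?> "letter b")

-- One label around the whole choice: it replaces the individual
-- expectations with a single description.
wholeChoice :: Parser Char
wholeChoice = (char 'a' <|> char 'b') <?> "one of the two letters"

-- Hypothetical helper: the lines of the rendered error, or [] on success.
errorLines :: Parser a -> String -> [String]
errorLines p = either (lines . show) (const []) . parse p ""
```

Here `last (errorLines perAlternative "x")` should be `expecting letter a or letter b`, whereas `last (errorLines wholeChoice "x")` should be `expecting one of the two letters`. This is why the mychoice version above prints the whole list in one piece.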


2012-08-28 23:47:50 fp4me

Thanks. I've added my final code to the question. – sinan
