ANTLR解决非LL(*)的问题和句法谓词

问题描述:

考虑在解析器以下规则:ANTLR解决非LL(*)的问题和句法谓词

expression 
    : IDENTIFIER 
    | (...) 
    | procedure_call // e.g. (foo 1 2 3) 
    | macro_use // e.g. (xyz (some datum)) 
    ; 

procedure_call 
    : '(' expression expression* ')' 
    ; 

macro_use 
    : '(' IDENTIFIER datum* ')' 
    ; 

// Note that any string that parses as an <expression> will also parse as a <datum>. 
datum 
    : simple_datum 
    | compound_datum 
    ; 

simple_datum 
    : BOOLEAN 
    | NUMBER 
    | CHARACTER 
    | STRING 
    | IDENTIFIER 
    ; 

compound_datum 
    : list 
    | vector 
    ; 

list 
    : '(' (datum+ ('.' datum)?)? ')' 
    | ABBREV_PREFIX datum 
    ; 

fragment ABBREV_PREFIX 
    : ('\'' | '`' | ',' | ',@') 
    ; 

vector 
    : '#(' datum* ')' 
    ; 

在表达式规则的procedure_call和macro_rule替代产生一个非LL(*)结构错误。我可以看到问题,因为(IDENTIFIER)将解析为两者。但即使我使用+而不是*来定义它们,它也会产生错误,即使上面的例子不应该再解析。
我想出了句法谓词的用法,但我无法弄清楚如何使用它们来完成这里的诀窍。

expression 
    : IDENTIFIER 
    | (...) 
    | (procedure_call)=>procedure_call // e.g. (foo 1 2 3) 
    | macro_use // e.g. (xyz (some datum)) 
    ; 


expression 
    : IDENTIFIER 
    | (...) 
    | ('(' IDENTIFIER expression)=>procedure_call // e.g. (foo 1 2 3) 
    | macro_use // e.g. (xyz (some datum)) 
    ; 

不工作,因为相关没有,但第一条规则将匹配任何东西。有没有一种合适的方法来解决这个问题?

+0

如果我有时间,我会仔细看看这个。到目前为止,有一件事情出来了:你在分析器规则中使用'fragment'规则是非法的。只有词法分析规则可以使用“片段”规则。具体来说:'ABBREV_PREFIX'不能在list中使用。 – 2011-06-14 14:28:53

+0

expression和datam都可以匹配IDENTIFIER,因此如果在这些规则中将*更改为+,procedure_call和macro_use都可以匹配'(IDENTIFIER IDENTIFIER)'。第一个句法谓词修饰应该解决这个问题。什么是坏的? – 2011-06-14 16:11:42

+0

使用句法谓词'(...)=>',这是去这里的路。你可能没有充分使用它们(如果没有看到更多的语法,就不能确定)。无论如何,看看我的答案。 – 2011-06-14 19:35:34

我发现了一个JavaCC grammar of R5RS这是我以前写的ANTLR当量(快!):

/* 
* Copyright (C) 2011 by Bart Kiers, based on the work done by Håkan L. Younes' 
* JavaCC R5RS grammar, available at: http://mindprod.com/javacc/R5RS.jj 
* 
* Permission is hereby granted, free of charge, to any person obtaining a copy 
* of this software and associated documentation files (the "Software"), to deal 
* in the Software without restriction, including without limitation the rights 
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
* copies of the Software, and to permit persons to whom the Software is 
* furnished to do so, subject to the following conditions: 
* 
* The above copyright notice and this permission notice shall be included in 
* all copies or substantial portions of the Software. 
* 
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
* THE SOFTWARE. 
*/ 
grammar R5RS; 

parse 
    : commandOrDefinition* EOF 
    ; 

commandOrDefinition 
    : (syntaxDefinition)=>    syntaxDefinition 
    | (definition)=>     definition 
    | ('(' BEGIN commandOrDefinition)=> '(' BEGIN commandOrDefinition+ ')' 
    |         command 
    ; 

syntaxDefinition 
    : '(' DEFINE_SYNTAX keyword transformerSpec ')' 
    ; 

definition 
    : '(' DEFINE (variable expression ')' 
       | '(' variable defFormals ')' body ')' 
       ) 
    | '(' BEGIN definition* ')' 
    ; 

defFormals 
    : variable* ('.' variable)? 
    ; 

keyword 
    : identifier 
    ; 

transformerSpec 
    : '(' SYNTAX_RULES '(' identifier* ')' syntaxRule* ')' 
    ; 

syntaxRule 
    : '(' pattern template ')' 
    ; 

pattern 
    : patternIdentifier 
    | '(' (pattern+ ('.' pattern | ELLIPSIS)?)? ')' 
    | '#(' (pattern+ ELLIPSIS?)? ')' 
    | patternDatum 
    ; 

patternIdentifier 
    : syntacticKeyword 
    | VARIABLE 
    ; 

patternDatum 
    : STRING 
    | CHARACTER 
    | bool 
    | number 
    ; 

template 
    : patternIdentifier 
    | '(' (templateElement+ ('.' templateElement)?)? ')' 
    | '#(' templateElement* ')' 
    | templateDatum 
    ; 

templateElement 
    : template ELLIPSIS? 
    ; 

templateDatum 
    : patternDatum 
    ; 

command 
    : expression 
    ; 

identifier 
    : syntacticKeyword 
    | variable 
    ; 

syntacticKeyword 
    : expressionKeyword 
    | ELSE 
    | ARROW 
    | DEFINE 
    | UNQUOTE 
    | UNQUOTE_SPLICING 
    ; 

expressionKeyword 
    : QUOTE 
    | LAMBDA 
    | IF 
    | SET 
    | BEGIN 
    | COND 
    | AND 
    | OR 
    | CASE 
    | LET 
    | LETSTAR 
    | LETREC 
    | DO 
    | DELAY 
    | QUASIQUOTE 
    ; 

expression 
    : (variable)=>   variable 
    | (literal)=>   literal 
    | (lambdaExpression)=> lambdaExpression 
    | (conditional)=>  conditional 
    | (assignment)=>  assignment 
    | (derivedExpression)=> derivedExpression 
    | (procedureCall)=>  procedureCall 
    | (macroUse)=>   macroUse 
    |      macroBlock 
    ; 

variable 
    : VARIABLE 
    | ELLIPSIS 
    ; 

literal 
    : quotation 
    | selfEvaluating 
    ; 

quotation 
    : '\'' datum 
    | '(' QUOTE datum ')' 
    ; 

selfEvaluating 
    : bool 
    | number 
    | CHARACTER 
    | STRING 
    ; 

lambdaExpression 
    : '(' LAMBDA formals body ')' 
    ; 

formals 
    : '(' (variable+ ('.' variable)?)? ')' 
    | variable 
    ; 

conditional 
    : '(' IF test consequent alternate? ')' 
    ; 

test 
    : expression 
    ; 

consequent 
    : expression 
    ; 

alternate 
    : expression 
    ; 

assignment 
    : '(' SET variable expression ')' 
    ; 

derivedExpression 
    : quasiquotation 
    | '(' (COND ('(' ELSE sequence ')' 
       | condClause+ ('(' ELSE sequence ')')? 
       ) 
     | CASE expression ('(' ELSE sequence ')' 
          | caseClause+ ('(' ELSE sequence ')')? 
          ) 
     | AND test* 
     | OR test* 
     | LET variable? '(' bindingSpec* ')' body 
     | LETSTAR '(' bindingSpec* ')' body 
     | LETREC '(' bindingSpec* ')' body 
     | BEGIN sequence 
     | DO '(' iterationSpec* ')' '(' test doResult? ')' command* 
     | DELAY expression 
     ) 
    ')' 
    ; 

condClause 
    : '(' test (sequence | ARROW recipient)? ')' 
    ; 

recipient 
    : expression 
    ; 

caseClause 
    : '(' '(' datum* ')' sequence ')' 
    ; 

bindingSpec 
    : '(' variable expression ')' 
    ; 

iterationSpec 
    : '(' variable init step? ')' 
    ; 

init 
    : expression 
    ; 

step 
    : expression 
    ; 

doResult 
    : sequence 
    ; 

procedureCall 
    : '(' operator operand* ')' 
    ; 

operator 
    : expression 
    ; 

operand 
    : expression 
    ; 

macroUse 
    : '(' keyword datum* ')' 
    ; 

macroBlock 
    : '(' (LET_SYNTAX | LETREC_SYNTAX) '(' syntaxSpec* ')' body ')' 
    ; 

syntaxSpec 
    : '(' keyword transformerSpec ')' 
    ; 

body 
    : ((definition)=> definition)* sequence 
    ; 

//sequence 
// : ((command)=> command)* expression 
// ; 

sequence 
    : expression+ 
    ; 

datum 
    : simpleDatum 
    | compoundDatum 
    ; 

simpleDatum 
    : bool 
    | number 
    | CHARACTER 
    | STRING 
    | identifier 
    ; 

compoundDatum 
    : list 
    | vector 
    ; 

list 
    : '(' (datum+ ('.' datum)?)? ')' 
    | abbreviation 
    ; 

abbreviation 
    : abbrevPrefix datum 
    ; 

abbrevPrefix 
    : '\'' | '`' | ',@' | ',' 
    ; 

vector 
    : '#(' datum* ')' 
    ; 

number 
    : NUM_2 
    | NUM_8 
    | NUM_10 
    | NUM_16 
    ; 

bool 
    : TRUE 
    | FALSE 
    ; 

quasiquotation 
    : quasiquotationD[1] 
    ; 

quasiquotationD[int d] 
    : '`' qqTemplate[d] 
    | '(' QUASIQUOTE qqTemplate[d] ')' 
    ; 

qqTemplate[int d] 
    : (expression)=> expression 
    | ('(' UNQUOTE)=> unquotation[d] 
    |     simpleDatum 
    |     vectorQQTemplate[d] 
    |     listQQTemplate[d] 
    ; 

vectorQQTemplate[int d] 
    : '#(' qqTemplateOrSplice[d]* ')' 
    ; 

listQQTemplate[int d] 
    :      '\'' qqTemplate[d] 
    | ('(' QUASIQUOTE)=> quasiquotationD[d+1] 
    |      '(' (qqTemplateOrSplice[d]+ ('.' qqTemplate[d])?)? ')' 
    ; 

unquotation[int d] 
    : ',' qqTemplate[d-1] 
    | '(' UNQUOTE qqTemplate[d-1] ')' 
    ; 

qqTemplateOrSplice[int d] 
    : ('(' UNQUOTE_SPLICING)=> splicingUnquotation[d] 
    |       qqTemplate[d] 
    ; 

splicingUnquotation[int d] 
    : ',@' qqTemplate[d-1] 
    | '(' UNQUOTE_SPLICING qqTemplate[d-1] ')' 
    ; 

// macro keywords 
LET_SYNTAX  : 'let-syntax'; 
LETREC_SYNTAX : 'letrec-syntax'; 
SYNTAX_RULES  : 'syntax-rules'; 
DEFINE_SYNTAX : 'define-syntax'; 

// syntactic keywords 
ELSE    : 'else'; 
ARROW   : '=>'; 
DEFINE   : 'define'; 
UNQUOTE_SPLICING : 'unquote-splicing'; 
UNQUOTE   : 'unquote'; 

// expression keywords 
QUOTE   : 'quote'; 
LAMBDA   : 'lambda'; 
IF    : 'if'; 
SET    : 'set!'; 
BEGIN   : 'begin'; 
COND    : 'cond'; 
AND    : 'and'; 
OR    : 'or'; 
CASE    : 'case'; 
LET    : 'let'; 
LETSTAR   : 'let*'; 
LETREC   : 'letrec'; 
DO    : 'do'; 
DELAY   : 'delay'; 
QUASIQUOTE  : 'quasiquote'; 

NUM_2 : PREFIX_2 COMPLEX_2; 
NUM_8 : PREFIX_8 COMPLEX_8; 
NUM_10 : PREFIX_10? COMPLEX_10; 
NUM_16 : PREFIX_16 COMPLEX_16; 

ELLIPSIS : '...'; 

VARIABLE 
    : INITIAL SUBSEQUENT* 
    | PECULIAR_IDENTIFIER 
    ; 

STRING : '"' STRING_ELEMENT* '"'; 

CHARACTER : '#\\' (~(' ' | '\n') | CHARACTER_NAME); 

TRUE : '#' ('t' | 'T'); 
FALSE : '#' ('f' | 'F'); 

// to ignore 
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}; 
COMMENT : ';' ~('\r' | '\n')* {$channel=HIDDEN;}; 

// fragments 
fragment INITIAL : LETTER | SPECIAL_INITIAL; 
fragment LETTER : 'a'..'z' | 'A'..'Z'; 
fragment SPECIAL_INITIAL : '!' | '$' | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~'; 
fragment SUBSEQUENT : INITIAL | DIGIT | SPECIAL_SUBSEQUENT; 
fragment DIGIT : '0'..'9'; 
fragment SPECIAL_SUBSEQUENT : '.' | '+' | '-' | '@'; 
fragment PECULIAR_IDENTIFIER : '+' | '-'; 
fragment STRING_ELEMENT : ~('"' | '\\') | '\\' ('"' | '\\'); 
fragment CHARACTER_NAME : 'space' | 'newline'; 

fragment COMPLEX_2 
    : REAL_2 ('@' REAL_2)? 
    | REAL_2? SIGN UREAL_2? ('i' | 'I') 
    ; 

fragment COMPLEX_8 
    : REAL_8 ('@' REAL_8)? 
    | REAL_8? SIGN UREAL_8? ('i' | 'I') 
    ; 

fragment COMPLEX_10 
    : REAL_10 ('@' REAL_10)? 
    | REAL_10? SIGN UREAL_10? ('i' | 'I') 
    ; 

fragment COMPLEX_16 
    : REAL_16 ('@' REAL_16)? 
    | REAL_16? SIGN UREAL_16? ('i' | 'I') 
    ; 

fragment REAL_2 : SIGN? UREAL_2; 
fragment REAL_8 : SIGN? UREAL_8; 
fragment REAL_10 : SIGN? UREAL_10; 
fragment REAL_16 : SIGN? UREAL_16; 
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)?; 
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)?; 
fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10; 
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)?; 

fragment DECIMAL_10 
    : UINTEGER_10 SUFFIX 
    | '.' DIGIT+ '#'* SUFFIX? 
    | DIGIT+ '.' DIGIT* '#'* SUFFIX? 
    | DIGIT+ '#'+ '.' '#'* SUFFIX? 
    ; 

fragment UINTEGER_2 : DIGIT_2+ '#'*; 
fragment UINTEGER_8 : DIGIT_8+ '#'*; 
fragment UINTEGER_10 : DIGIT+ '#'*; 
fragment UINTEGER_16 : DIGIT_16+ '#'*; 
fragment PREFIX_2 : RADIX_2 EXACTNESS? | EXACTNESS RADIX_2; 
fragment PREFIX_8 : RADIX_8 EXACTNESS? | EXACTNESS RADIX_8; 
fragment PREFIX_10 : RADIX_10 EXACTNESS? | EXACTNESS RADIX_10; 
fragment PREFIX_16 : RADIX_16 EXACTNESS? | EXACTNESS RADIX_16; 
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT+; 
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' | 'E' | 'S' | 'F' | 'D' | 'L'; 
fragment SIGN : '+' | '-'; 
fragment EXACTNESS : '#' ('i' | 'e' | 'I' | 'E'); 
fragment RADIX_2 : '#' ('b' | 'B'); 
fragment RADIX_8 : '#' ('o' | 'O'); 
fragment RADIX_10 : '#' ('d' | 'D'); 
fragment RADIX_16 : '#' ('x' | 'X'); 
fragment DIGIT_2 : '0' | '1'; 
fragment DIGIT_8 : '0'..'7'; 
fragment DIGIT_16 : DIGIT | 'a'..'f' | 'A'..'F'; 

可与下面的类测试:

import org.antlr.runtime.*; 

public class Main { 
    public static void main(String[] args) throws Exception { 
    String source = 
     "(define sum-iter      \n" + 
     " (lambda(n acc i)      \n" + 
     " (if (> i n)       \n" + 
     "  acc        \n" + 
     "  (sum-iter n (+ acc i) (+ i 1))))) "; 
    R5RSLexer lexer = new R5RSLexer(new ANTLRStringStream(source)); 
    R5RSParser parser = new R5RSParser(new CommonTokenStream(lexer)); 
    parser.parse(); 
    } 
} 

,并生成一个词法分析器,编译所有Java源文件并运行主类,执行:

[email protected]:~/Programming/ANTLR/Demos/R5RS$ java -cp antlr-3.3.jar org.antlr.Tool R5RS.g 
[email protected]:~/Programming/ANTLR/Demos/R5RS$ javac -cp antlr-3.3.jar *.java 
[email protected]:~/Programming/ANTLR/Demos/R5RS$ java -cp .:antlr-3.3.jar Main 
[email protected]:~/Programming/ANTLR/Demos/R5RS$

正在打印控制台上没有任何的事实意味着解析器(和词法)没有发现与所提供的源的任何错误。

请注意,我没有单元测试,只测试了Main类中的单个Scheme源代码。如果你在ANTLR语法中发现错误,我会很高兴听到它们,所以我可以修复语法。在适当的时候,我可能会将语法提交给官方ANTLR Wiki

+0

非常感谢!这帮助我很多! – Sebastian 2011-06-15 09:49:09

+0

@Sebastian,请注意,我在'sequence'规则中遇到了一个小错误,我修正了这个错误(参见编辑答案)。当然,欢迎您。 – 2011-06-15 12:26:26

+0

@巴特,我现在根据你的语法修正了我的语法,而且我在运行测试类时没有问题。然而,当我使用ANTLRWorks或ANTLRIDE的内置解释器时,我得到一个FailedPredicateException。这有什么特别的理由吗? – Sebastian 2011-06-15 14:29:50