Writing Sebesta's lexical analyzer in Python: it fails on the last lexeme in the input file

Problem description:

I have to translate the code from Sebesta's Concepts of Programming Languages (Chapter 4, Section 2) into Python. This is what I have so far:

# Character classes # 
LETTER = 0 
DIGIT = 1 
UNKNOWN = 99 

# Token Codes # 
INT_LIT = 10 
IDENT = 11 
ASSIGN_OP = 20 
ADD_OP= 21 
SUB_OP = 22 
MULT_OP = 23 
DIV_OP = 24 
LEFT_PAREN = 25 
RIGHT_PAREN = 26 

charClass = '' 
lexeme = '' 
lexLen = 0 
token = '' 
nextToken = '' 

### lookup - function to lookup operators and parentheses ### 
###   and return the token       ### 
def lookup(ch): 
    def left_paren(): 
     addChar() 
     globals()['nextToken'] = LEFT_PAREN 

    def right_paren(): 
     addChar() 
     globals()['nextToken'] = RIGHT_PAREN 

    def add(): 
     addChar() 
     globals()['nextToken'] = ADD_OP 

    def subtract(): 
     addChar() 
     globals()['nextToken'] = SUB_OP 

    def multiply(): 
     addChar() 
     globals()['nextToken'] = MULT_OP 

    def divide(): 
     addChar() 
     globals()['nextToken'] = DIV_OP 
    options = {')': right_paren, '(': left_paren, '+': add, 
       '-': subtract, '*': multiply , '/': divide} 

    if ch in options.keys(): 
     options[ch]() 
    else: 
     addChar() 

### addchar- a function to add next char to lexeme ### 
def addChar(): 
    #lexeme = globals()['lexeme'] 
    if(len(globals()['lexeme']) <=98): 
     globals()['lexeme'] += nextChar 
    else: 
     print("Error. Lexeme is too long") 

### getChar- a function to get the next Character of input and determine its character class ### 
def getChar(): 
    globals()['nextChar'] = globals()['contents'][0] 
    if nextChar.isalpha(): 
     globals()['charClass'] = LETTER 
    elif nextChar.isdigit(): 
     globals()['charClass'] = DIGIT 
    else: 
     globals()['charClass'] = UNKNOWN 
    globals()['contents'] = globals()['contents'][1:] 


## getNonBlank() - function to call getChar() until it returns a non whitespace character ## 
def getNonBlank(): 
    while nextChar.isspace(): 
     getChar() 

## lex- simple lexical analyzer for arithmetic functions ## 
def lex(): 
    globals()['lexLen'] = 0 
    getNonBlank() 
    def letterfunc(): 
     globals()['lexeme'] = '' 
     addChar() 
     getChar() 
     while(globals()['charClass'] == LETTER or globals()['charClass'] == DIGIT): 
      addChar() 
      getChar() 
     globals()['nextToken'] = IDENT 

    def digitfunc(): 
     globals()['lexeme'] = '' 
     addChar() 
     getChar() 
     while(globals()['charClass'] == DIGIT): 
      addChar() 
      getChar() 
     globals()['nextToken'] = INT_LIT 

    def unknownfunc(): 
     globals()['lexeme'] = '' 
     lookup(nextChar) 
     getChar() 

    lexDict = {LETTER: letterfunc, DIGIT: digitfunc, UNKNOWN: unknownfunc} 
    if charClass in lexDict.keys(): 
     lexDict[charClass]() 
    print('The next token is: '+ str(globals()['nextToken']) + ' The next lexeme is: ' + globals()['lexeme']) 

with open('input.txt') as input: 
    contents = input.read() 
    getChar() 
    lex() 
    while contents: 
     lex() 

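I am using the string sum + 1/33 as my sample input. As far as I understand, the first call to getChar() sets charClass to LETTER and contents to um + 1/33.

The program then calls lex(), which in turn accumulates the word in lexeme. When the while loop inside letterfunc encounters the first whitespace character, it breaks and lex() returns.

Since contents is not empty, the program goes through the top-level while loop again. This time, the getNonBlank() call inside lex() throws away the spaces in contents, and the same process repeats as before.

Where I hit an error is at the last lexeme. I am told that globals()['contents'][0] is out of range when it is evaluated in getChar(). I do not expect this to be a hard bug to find, but I have tried tracing it by hand and cannot seem to spot the problem. Any help would be greatly appreciated.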

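It is simply because, after successfully reading the last 3 of the input string, digitfunc calls getChar() one more time. At that moment contents is already exhausted and empty, so contents[0] reads past the end of the buffer, hence the error.

As a workaround, if you add a newline or a space after the last character of the expression, the current code does not run into the problem.

The reason is that when the last character is UNKNOWN, you immediately return from lex() and exit the loop because contents is empty, but when you are processing a number or an identifier you loop calling getChar() without testing for end of input. By the way, if your input string ends with a right paren, your lexer eats it and forgets to show that it found it.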
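The failure can be reproduced in isolation. The sketch below is not the code from the question: scan_digits is a hypothetical stand-in for digitfunc that keeps the same read-one-character-ahead structure, with a plain index in place of the contents slicing.

```python
def scan_digits(text):
    """Hypothetical stand-in for digitfunc: collect digits, always reading one char ahead."""
    pos = 0
    lexeme = ''
    ch = text[pos]            # plays the role of the first getChar()
    while ch.isdigit():
        lexeme += ch          # plays the role of addChar()
        pos += 1
        ch = text[pos]        # lookahead: raises IndexError once the text is used up
    return lexeme


print(scan_digits("33 "))     # trailing space: the lookahead has something to read

try:
    scan_digits("33")         # no trailing character: the lookahead runs off the end
except IndexError as exc:
    print("IndexError:", exc)
```

The trailing space in the first call is exactly the workaround described above: it gives the lookahead one extra character to land on.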

So you should at least:

  • test for end of input in the getChar function:

    def getChar():
        if len(contents) == 0:
            # End of input: set a sentinel instead of indexing an empty string
            # print("END OF INPUT DETECTED")
            globals()['charClass'] = UNKNOWN
            globals()['nextChar'] = ''
            return
        ... 
    
  • call lex() once more after the loop, so that a last token is displayed if one is left:

    ... 
    while contents: 
        lex() 
    lex() 
    
  • check that a lexeme actually exists before printing (strange things can happen at end of input):

    ... 
    if charClass in lexDict.keys(): 
        lexDict[charClass]() 
    if lexeme != '': 
        print('The next token is: '+ str(globals()['nextToken']) + 
          ' The next lexeme is: >' + globals()['lexeme'] + '<') 
    

But your use of globals is poor. The common idiom for using a global from inside a function is to declare it before use:

a = 5 

def setA(val): 
    global a 
    a = val # sets the global variable a 

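But in Python, globals are a code smell. The best thing you can do is to encapsulate your parser properly in a class. Objects are better than global variables.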
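As a rough illustration of that suggestion, here is a minimal sketch of the same lexer with its state held in an object. The class name Lexer, its method names, and the OPERATORS table are assumptions made for this example, not part of the question or of Sebesta's code. It also departs from the question in three ways: it returns (token, lexeme) pairs instead of printing them, it drops the 98-character lexeme limit, and it reports an unrecognized character with a token of None.

```python
# Character classes (same values as in the question)
LETTER, DIGIT, UNKNOWN = 0, 1, 99

# Token codes (same values as in the question)
INT_LIT, IDENT = 10, 11
OPERATORS = {'+': 21, '-': 22, '*': 23, '/': 24, '(': 25, ')': 26}


class Lexer:
    """Hypothetical class holding the state that the question keeps in globals."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.next_char = ''
        self.char_class = UNKNOWN
        self.get_char()                      # prime the one-character lookahead

    def get_char(self):
        """Read one character; '' serves as the end-of-input sentinel."""
        if self.pos < len(self.text):
            self.next_char = self.text[self.pos]
            self.pos += 1
        else:
            self.next_char = ''
        if self.next_char.isalpha():
            self.char_class = LETTER
        elif self.next_char.isdigit():
            self.char_class = DIGIT
        else:
            self.char_class = UNKNOWN

    def lex(self):
        """Return the next (token, lexeme) pair, or None at end of input."""
        while self.next_char.isspace():
            self.get_char()
        if self.next_char == '':
            return None
        lexeme = self.next_char
        first_class = self.char_class
        self.get_char()
        if first_class == LETTER:
            while self.char_class in (LETTER, DIGIT):
                lexeme += self.next_char
                self.get_char()
            return IDENT, lexeme
        if first_class == DIGIT:
            while self.char_class == DIGIT:
                lexeme += self.next_char
                self.get_char()
            return INT_LIT, lexeme
        # Single-character token; the code is None for an unrecognized character
        return OPERATORS.get(lexeme), lexeme

    def tokens(self):
        """Collect every (token, lexeme) pair until the input is exhausted."""
        result = []
        pair = self.lex()
        while pair is not None:
            result.append(pair)
            pair = self.lex()
        return result
```

With this layout, Lexer("sum + 1/33").tokens() returns the pairs for sum, +, 1, / and 33 without needing a trailing newline, and an input that ends in a right paren still reports it, because end of input is checked in one place (get_char) rather than by the caller's loop.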