I'm writing a programming language in Python, but I have a problem with the lexer function.

Here is the complete code. It runs, but the lexer does not behave as I expect:

import sys

inputError = """

(!) Error.
(!) Cannot get input.
(!) Please try again.

"""
lexerError = """

(!) Error.
(!) Lexer error. Cannot read input and separate tokens.
(!) Please try again.

"""

def __get_input():
    err = False
    ipt = None
    
    try:
        ipt = input("} ")
        err = False
        
    except:
        ipt = None
        err = True
        
    return err, ipt

def __lexer(ipt):
    err = False
    
    digits  = "1234567890"
    
    NUM     = "DIGIT"
    PLUS    = "PLUS"
    MINUS   = "MINUS"
    MULTP   = "MULTP"
    DIV     = "DIV"
    L_BRK   = "L_BRK"
    R_BRK   = "R_BRK"
    
    token = ""
    tokens = []
    ttypes = []
    
    try:
        for i in range(len(ipt)):
            char = ipt[i]
            
            if char==' ':
                tokens.append(token)
                token = ""
            
            else:
                token = token + ipt[i]
        
        tokens.append(token)
        
        for i in range(len(tokens)):
            token = tokens[i]

            for j in range(len(token)):
                if token[j] in digits:
                    ttypes[i] = DIGITS
                
                else:
                    ttypes[i] = ''
        
        err = False
        
    except:
        tokens = []
        ttypes = []
        
        err = True
        
    
    return err, tokens, ttypes

def __init():
    err, ipt = __get_input()
    
    if err==True:
        print(inputError)
        sys.exit()
    
    err, tokens, ttypes = __lexer(ipt)
    
    if err==True:
        print(lexerError)
        sys.exit()
    
    print(err, tokens, ttypes)
    
__init()

My problem is at lines 66-67. I expected the lexer function (the rest works perfectly) to read the user's input, split it into tokens (this part works), and then recognize the type of each token. I've defined all the token types, but I wanted to start with integers. The for loop at line 62 checks whether each token contains only valid digits and assigns the corresponding type to the matching element of the ttypes list. At line 67, according to the GDB online debugger, the program jumps straight to the except block, and I don't know why.

It became even harder to understand when I replaced the lines ttypes[i] = DIGITS and ttypes[i] = '' with print() calls, because then it worked as expected: if the token consisted of digits it printed one message, otherwise it printed another.

...
        for i in range(len(tokens)):
            token = tokens[i]

            for j in range(len(token)):
                if token[j] in digits:
                    print("It's an integer")
                
                else:
                    print("It isn't an integer.")
...

That's what I tried.

I hope my problem is clear. Thank you very much!

2 comments
  • "the program jumps directly to the except", so perhaps you could collect and print the exception instead of ignoring it. That way you'd know why this is happening. Please update your question with your findings. (Commented Sep 13, 2024 at 11:52)
  • And here you see another good reason why long try blocks, where anything can happen anywhere, and bare excepts are not recommended. (Commented Sep 13, 2024 at 11:59)
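
Following the two comments above, here is a minimal sketch of how the hidden error could be surfaced: bind the exception to a name and print its type and message instead of swallowing it. The helper names `classify_unsafe` and `describe_failure` are hypothetical; the first only reproduces the failing assignment pattern from the question.

```python
def classify_unsafe(tokens):
    """Reproduce the question's pattern: assign by index into an empty list."""
    ttypes = []
    for i in range(len(tokens)):
        ttypes[i] = ""  # ttypes is empty, so every index is out of range
    return ttypes


def describe_failure(tokens):
    """Run the failing code and report which exception it raised, if any."""
    try:
        classify_unsafe(tokens)
    except Exception as exc:  # still broad, but the error is no longer hidden
        return f"{type(exc).__name__}: {exc}"
    return "no error"


print(describe_failure(["12", "+", "3"]))
```

Keeping the try block this small also makes it obvious which statement can fail, which is the point of the second comment.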

1 Answer

The issue is in how the ttypes list is handled. You assign values to ttypes[i], but ttypes is initialized as an empty list, and a Python list does not allow assignment to an index that does not exist yet.

For example:

ttypes = []
ttypes[0] = "DIGIT"  # raises IndexError: list assignment index out of range
print(ttypes)        # never reached

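So when you assign ttypes[i] = NUM (I assume you meant NUM rather than DIGITS, which is never defined), Python raises an IndexError, which is why the code jumps to the except block. Note that with the undefined name DIGITS the right-hand side is evaluated first and fails with a NameError, while the ttypes[i] = '' branch raises the IndexError; the bare except hides both. Use the append() method instead.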
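
As a rough illustration of the point above, there are two common ways to fill a list that starts out empty: grow it with `append()`, or preallocate one slot per token so that every index already exists. The token values and the use of `str.isdigit()` here are assumptions made only for this sketch.

```python
tokens = ["12", "+", "3"]

# Option 1: grow the list one element at a time.
appended = []
for token in tokens:
    appended.append("DIGIT" if token.isdigit() else "")

# Option 2: preallocate, so that assignment by index is valid.
preallocated = [""] * len(tokens)
for i, token in enumerate(tokens):
    preallocated[i] = "DIGIT" if token.isdigit() else ""

print(appended == preallocated)
```

Both loops produce the same list; `append()` is usually the simpler choice when the result is built in order.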

Here's my suggestion.

for token in tokens:
    if all(char in digits for char in token):
        ttypes.append(NUM)
    else:
        ttypes.append("UNKNOWN") # You can add more logic
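
To try the suggestion on its own, here is a self-contained sketch. The function name `classify` and the constant `UNKNOWN` are hypothetical, `NUM` keeps the value "DIGIT" from the question, and `split(" ")` stands in for the question's character loop, on the assumption that the two behave the same for space-separated input. One detail worth guarding against: `all()` returns True for an empty token, which the question's loop produces when the input contains two spaces in a row.

```python
DIGITS = "0123456789"
NUM = "DIGIT"        # same value as NUM in the question
UNKNOWN = "UNKNOWN"  # placeholder label, as in the suggestion above


def classify(tokens):
    """Return one type label per token, in the same order as tokens."""
    ttypes = []
    for token in tokens:
        # all() is True for an empty string, so also require a non-empty token.
        if token and all(char in DIGITS for char in token):
            ttypes.append(NUM)
        else:
            ttypes.append(UNKNOWN)
    return ttypes


# Two consecutive spaces yield an empty token between "12" and "+".
tokens = "12  + 3".split(" ")
print(tokens)
print(classify(tokens))
```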

I hope this will help a little.
