Special

Clearance Sale!

We've been publishing for over five years now and it's time to clear out our inventory of back issues, so we're slashing prices!

RBD Magazines

Check out this amazing clearance sale of all our past issues. Missing some issues? This is a great time to complete your RBD collection. Save up to 40% off the regular price of our printed back issue packages. These prices are only good until the end of the year May 2008 and supplies are limited, so place your order today.

Article Preview


Buy Now

Print:
PDF:

Object-Oriented Thinking

An RPN Interpreter

"Hello World" For Compilers

Issue: 5.6 (September/October 2007)
Author: Charles Yeomans
Article Description: No description available.
Article Length (in bytes): 15,516
Starting Page Number: 42
RBD Number: 5616
Resource File(s): None
Related Link(s): None
Known Limitations: None

Excerpt of article text...

Writing an RPN interpreter is a standard five-finger exercise for computer science students. We will see why in this and several subsequent columns. But, first, what is RPN? RPN stands for "reverse Polish notation", and is a syntax for writing mathematical expressions involving operators. The usual form of a mathematical expression is in the form operand - operator - operand; 2 + 3, for example. This is called "infix" notation. In postfix notation, an expression is written in the form operator - operator - operand. In postfix, 2 + 3 is written as 2 3 +. An example of posfix with which you may be familiar is the factorial operator -- N!. There is also prefix notation, in which the operator comes first: + 2 3. This is also called "Polish notation", in honor of the mathematician Jan ?ukasiewicz. Prefix notation came before postfix, so postfix is known as reverse Polish notation, or RPN. It turns out that RPN has a number of advantages over infix notation. Consider the expression 2 + 3*5 . If you have mastered the elementary school mathematics curriculum, then you know that multiplication comes before addition, and so the expression evaluates to 17. Such rules are a problem for compilers and interpreters. Now, you can make the order of evaluation explicit using parentheses: 2 + (3*5). And indeed this is the recommended practice in some programming languages. As Steve Oualine writes in his book "Practical C Programming", "...there are fifteen precedence rules in C (&& comes before || comes before ?:). The practical programmer reduces these to two: "Multiplication and division come before addition and subtraction. Put parentheses around everything else." By contrast, postfix notation has no precedence rules at all. Consider the two infix expressions 2 + 3*5 and (2 + 3)*5. In postfix, they are 2 3 5 * + and 2 3 + 5 * . The expressions are evaluated from left to right, and you can do a subevaluation as soon as you have read two operands and an operator. Let us work through the evaluation of 2 3 5 * +. Step 1: Read 2 and store it somewhere. Step 2: Read 3 and store it somewhere. Step 3: Read 5 and store it somewhere. Step 4: Read *. Grab 5 and 3 from storage; compute 3*5; store 15. Step 5: Read +. Grab 15 and 2 from storage; compute 2 + 17; store 17. Step 6: We are at the end of the expression; the result is 17. And now let us work through the evaluation of 2 3 + 5 *. Step 1: Read 2 and store it somewhere. Step 2: Read 3 and store it somewhere. Step 3: Read +; grab 3 and 2 from storage; compute 2 + 3; store 5. Step 4: Read 5 and store it somewhere. Step 5: Read *. Grab 5 and 5 from storage; compute 5*5; store 25. Step 6: We are at the end of the expression; the result is 25. Operands (here, numbers) are stored in the order in which they are encountered. When we encounter an operator, we grab the two most recent numbers stored. For storage, then, we need a "last-in first-out" data structure, more commonly known as a stack. Specified abstractly, a stack is a data structure having two operations, push and pop. Push adds an element to the stack, and pop removes the last element added and returns it to the caller. For a concrete example, reach into your pocket and fish out some coins. Start stacking them in front of you, then take the top coin off the stack and send it to me. In REALbasic, the easiest way to implement a stack is to use an array. Array has a Pop method, added for just this purpose. Append already does the same thing as Push, and you simply ignore the rest of the Array interface. Now, let us implement an RPN interpreter bare-hands. We have already worked out the algorithm for interpretation above; translating it into code is a bit of work. Let us begin by considering the input string as a sequence of characters. Legal characters are either digits, operators, or whitespace. Any other characters are illegal. If we encounter an illegal character, we abort, which brings us to our first class, RPNInterpreterException. This is a subclass of RuntimeException that we use to represent errors in the course of interpreting an RPN expression. In the course of reading the input string, one of four things is happening. We are reading a number; we are reading an operator; we are reading whitespace; we are reading garbage. In other words, the read process is in one of four states. While in one of the first three states, we simply store the characters in a buffer. When the character type changes, we take some action and change to the new state. If we enter the error state, then we will raise an exception. Here is the core of our code. Function Interpret(rpnInput as String) As Integer dim character() as String = Split(input, "") const whitespaceSentinel = " " character.Append whitespaceSentinel dim operandStack() as Integer dim buffer() as String const stateWhitespace = 0 const stateOperand = 1 const stateOperator = 2 dim interpreterState as Integer = stateWhitespace for i as Integer = 0 to UBound(character) dim theCharacter as String = character(i) select case interpreterState case stateWhitespace case stateOperand case stateOperator else raise new RPNInterpreterException("Interpreter is in invalid state.") end select next End Function Because whitespace is ignored, we can pretend that the inputString is preceded by some amount of whitespace and thus start processing in stateWhitespace. And it turns out that appending a whitespace character to the input string simplifies processing; without it, we would need to write more code following the code to finish processing. So I have slipped this in. Now we fill in the Case blocks. First, we handle the case stateWhitespace. If theCharacter is whitespace, we simply continue to the next character. If theCharacter is a digit, we clear the buffer, append theCharacter, and switch to stateOperand. If theCharacter is an operator character, we clear the buffer, append theCharacter, and switch to stateOperator. If theCharacter is none of these, we raise an RPNInterpreterException. As we convert this to code, it is useful to write auxiliary methods for testing the type of a character. Private Function IsWhitespaceCharacter(s as String) As Boolean return (s = " ") End Function Private Function IsOperandCharacter(input as String) As Boolean return InStr("0123456789", Left(input, 1)) > 0 End Function Private Function IsOperatorCharacter(s as String) As Boolean return InStr("+-*/^mod", Left(input, 1)) > 0 End Function Now, here is the case block for stateWhitespace. case stateWhitespace if me.IsWhitespaceCharacter(theCharacter) then continue elseIf me.IsOperandCharacter(theCharacter) then redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperand elseIf me.IsOperatorCharacter(theCharacter) then redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperator else raise new RPNInterpreterException("Invalid character at offset " + Str(i) + ".") end if Next is the case stateOperand. In this state, if theCharacter is a digit, then we append it to the buffer, and remain in this state. If theCharacter is an operator character, we take the buffer contents, convert the string data to an Integer, and push it onto the operand stack. We then clear the buffer, append theCharacter, and change the state to stateOperator. If theCharacter is whitespace, we convert the buffer contents to an Integer, push it onto the operand stack, clear the buffer, and change the state to stateWhitespace. If theCharacter is none of these, we raise an RPNInterpreterException. case stateOperand if me.IsOperandCharacter(theCharacter) then buffer.Append theCharacter elseIf me.IsWhitespaceCharacter(theCharacter) then operandStack.Append Val(Join(buffer, "")) redim buffer(-1) interpreterState = stateWhitespace elseIf me.IsOperatorCharacter(theCharacter) then operandStack.Append Val(Join(buffer, "")) redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperator else raise new RPNInterpreterException("Invalid character at offset " + Str(i) + ".") end if And last is the case stateOperator. In this state, if theCharacter is an operator character, we append it to the buffer and continue. The remaining states are more interesting, because we actually evaluate an operator. Let us first write a method that does the evaluation. Private Function Evaluate(op1 as Integer, op2 as Integer, theOperator as String) As Integer select case theOperator case "+" return op1 + op2 case "-" return op1 - op2 case "*" return op1 * op2 case "/" return op1/op2 case "^" return op1^op2 case "mod" return op1 mod op2 else raise new RPNInterpreterException("Unrecognized operator " + theOperator + ".") end select End Function Now, if theCharacter is either whitespace or an operand character, we do the following. Pop the last two values off the operand stack. Pass these values and the operator, taken from the buffer, to Evaluate. Then push the return value onto the operand stack. At this point, you might want to review the evaluation of an RPN expression. Also, in the code that follows, observe that I take some care to preserve the order of operands; it matters for MOD. case stateOperator If me.IsOperatorCharacter(theCharacter) then buffer.Append theCharacter elseif me.IsOperandCharacter(theCharacter) then try dim op2 as Integer = operandStack.Pop dim op1 as Integer = operandStack.Pop dim theOperator as String = Join(buffer, "") operandStack.Append Evaluate(op1, op2, theOperator) redim buffer(-1) buffer.Append character(i) interpreterState = stateOperand catch oob as OutOfBoundsException raise new RPNInterpreterException("Missing operand at offset " + Str(i) + ".") end try elseIf me.IsWhitespaceCharacter(theCharacter) then try dim op2 as Integer = operandStack.Pop dim op1 as Integer = operandStack.Pop dim theOperator as String = Join(buffer, "") operandStack.Append Evaluate(op1, op2, theOperator) interpreterState = stateWhitespace catch oob as OutOfBoundsException raise new RPNInterpreterException("Missing operand at offset " + Str(i) + ".") end try Finally, we return the result of the interpretation of the RPN expression. If the expression was valid, then we should be left with a single value on the stack; that is the result of the evaluation. Let's consider what could go wrong. Had we encountered an illegal character, an RPNInterpreterException would have been raised. So if we finish the loop, then we know that the input string consists of only valid characters. If the input string consists of whitespace only, then the stack will be empty. If the input string contains any numbers at all, then they will be pushed onto the stack. If the input string contains an operator, then either it will push a value onto the stack or raise an exception because there are not enough operands on the stack. The following code, then, expresses the possible outcome of the evaluation of the RPN expression. if UBound(operandStack) = 0 then return operandStack(0) elseIf UBound(operandStack) > 0 then raise new RPNInterpreterException("Missing operator at offset " + Str(UBound(character) + 1) + ".") else return 0 end if Finally, here is the entire method. Function Interpret(rpnInput as String) As Integer dim character() as String = Split(rpnInput, "") const whitespaceSentinel = " " character.Append whitespaceSentinel dim operandStack() as Integer dim buffer() as String const stateWhitespace = 0 const stateOperand = 1 const stateOperator = 2 dim interpreterState as Integer = stateWhitespace for i as Integer = 0 to UBound(character) dim theCharacter as String = character(i) select case interpreterState case stateWhitespace if me.IsWhitespaceCharacter(theCharacter) then continue elseIf me.IsOperandCharacter(theCharacter) then redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperand elseIf me.IsOperatorCharacter(theCharacter) then redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperator else raise new RPNInterpreterException(i, "Illegal character") end if case stateOperand if me.IsOperandCharacter(theCharacter) then buffer.Append theCharacter elseIf me.IsWhitespaceCharacter(theCharacter) then operandStack.Append Val(Join(buffer, "")) redim buffer(-1) interpreterState = stateWhitespace elseIf me.IsOperatorCharacter(theCharacter) then operandStack.Append Val(Join(buffer, "")) redim buffer(-1) buffer.Append theCharacter interpreterState = stateOperator else raise new RPNInterpreterException(i, "Illegal character") end if case stateOperator if me.IsOperatorCharacter(theCharacter) then buffer.Append theCharacter elseif me.IsOperandCharacter(theCharacter) then try dim op2 as Integer = operandStack.Pop dim op1 as Integer = operandStack.Pop dim theOperator as String = Join(buffer, "") operandStack.Append Evaluate(op1, op2, theOperator) redim buffer(-1) buffer.Append character(i) interpreterState = stateOperand catch rpnError as RPNInterpreterException rpnError.Offset = i raise rpnError catch oob as OutOfBoundsException raise new RPNInterpreterException(i, "Missing operand") end try elseIf me.IsWhitespaceCharacter(theCharacter) then try dim op2 as Integer = operandStack.Pop dim op1 as Integer = operandStack.Pop dim theOperator as String = Join(buffer, "") operandStack.Append Evaluate(op1, op2, theOperator) interpreterState = stateWhitespace catch rpnError as RPNInterpreterException rpnError.Offset = i raise rpnError catch oob as OutOfBoundsException raise new RPNInterpreterException(i, "Missing operand") end try else raise new RPNInterpreterException(i, "Illegal character") end if else raise new RPNInterpreterException(0, "Interpreter is in invalid state.") end select next if UBound(operandStack) = 0 then return operandStack(0) elseIf UBound(operandStack) > 0 then raise new RPNInterpreterException(UBound(character) + 1, "Missing operator") else return 0 end if End Function In columns to come, we will rewrite this code and build on it to do some interesting things.

...End of Excerpt. Please purchase the magazine to read the full article.

Article copyrighted by REALbasic Developer magazine. All rights reserved.


 


|

 


Weblog Commenting and Trackback by HaloScan.com