Expression parsing

It’s a classic problem and there are many ways to do it. The problem is, it is quite difficult for us laymen who doesn’t think like state machines. The problem lies in our education itself. We wasn’t taught the most effective notation system, and nobody wanted to fix it. Our current notation system was originally intended for simple calculation: two operand with a operator in between, that’s it! Merchants can purchase, early banks can operate, heads can be counted…

It’s not until mathematics have developed into a science beyond (most human’s) comprehension that someone realized there’s a more efficient way to write 3 * 4 + 2 * (4 + 1) ^ (3 / 2) without remembering operator precedence and remove the need of cumbersome parentheses.

Well, we’ve got to live with it and luckily, we have versatile computing devices that can make us coffee, the only problem left is to make them understand the mess we created…

Before we dig in…

We need something capable of splitting the expression into chewable parts, namely operators, operands and sign. It will iteratively navigate through the expression, returning one element at a time. Separating this from the rest of the parsing will make you a lot more healthy!

ExpressionTokenizer.java

public class ExpressionTokenizer
{

   public ExpressionTokenizer(String anInput)
   {
      input = anInput.toLowerCase();
      start = 0;
      end = 0;
      nextToken();
   }

   public boolean hasMoreToken() {
       return !(start >= input.length());
   }

   public String peekToken()
   {
      if (start >= input.length()) return null;
      else return input.substring(start, end).trim();
   }

   public String nextToken()
   {
      String r = peekToken();
      start = end;
      if (start >= input.length()) return r;
      if (Character.isDigit(input.charAt(start)) || input.charAt(start) == '.')
      {
         end = start + 1;
         while (end < input.length()
               && (Character.isDigit(input.charAt(end)) || input.charAt(start) == '.'))
            end++;
      }
      else if(input.substring(start).startsWith("cos")) {
          end = start + 3;
      } else if(input.substring(start).startsWith("sin")) {
          end = start + 3;
      } else if(input.substring(start).startsWith("sqrt")) {
          end = start + 4;
      } else
         end = start + 1;
      return r.trim();
   }

   private String input;
   private int start;
   private int end;
}
  • Constructor: give it the expression
  • peekToken: previews the next token
  • nextToken: returns the next token and move the cursor to the next one
  • hasNextToken: true if there are still more unprocessed tokens in the string

In short, it works like an iterator.

Method 1: Recursive parsing

This method, as the name implies, uses the power of recursion. The parsers is divided into several function to process different levels of operator precedence, for example:

Call stack

Each function does the following:

  • Call the higher level function to calculate the expression, this will ensure that operators with higher precedence get processed first, with the exception of the final function, which parses number
  • Check if the operator is at the function’s level, if it is, calculate the operator by getting the next value (by calling the higher function) and add it to the value returned form the first call, loop until the next operator does not belong to the function; otherwise ignore the operator and return calculation result (returned from the first call)

In my implementation, the least important function is named getExpressionValue() (for + and -), and consecutive ones are getTermValue() (for * and -), getHigherTermValue() (for ^), getFactorValue() (for cos, sin, parentheses and numbers)

Evaluator.java

public class Evaluator {

    public Evaluator(String anExpression) {
        tokenizer = new ExpressionTokenizer(anExpression);
    }

    public double getExpressionValue() {
        double value = getTermValue();
        boolean done = false;
        while (!done) {
            String next = tokenizer.peekToken();
            if ("+".equals(next) || "-".equals(next)) {
                tokenizer.nextToken();
                double value2 = getTermValue();
                if ("+".equals(next)) {
                    value = value + value2;
                } else {
                    value = value - value2;
                }
            } else {
                done = true;
            }
        }
        return value;
    }

    public double getTermValue() {
        double value = getHigherTermValue();
        boolean done = false;
        while (!done) {
            String next = tokenizer.peekToken();
            if ("*".equals(next) || "/".equals(next)) {
                tokenizer.nextToken();
                double value2 = getHigherTermValue();
                if ("*".equals(next)) {
                    value = value * value2;
                } else {
                    value = value / value2;
                }
            } else {
                done = true;
            }
        }
        return value;
    }

    public double getHigherTermValue() {
        double value = getFactorValue();
        boolean done = false;
        while (!done) {
            String next = tokenizer.peekToken();
            if ("^".equals(next)) {
                tokenizer.nextToken();
                double value2 = getFactorValue();
                if ("^".equals(next)) {
                    value = Math.pow(value, value2);
                }
            } else {
                done = true;
            }
        }
        return value;
    }

    public double getFactorValue() {
        double value;
        String next = tokenizer.peekToken();
        if ("cos".equals(next)) {
            tokenizer.nextToken();
            tokenizer.nextToken();
            value = Math.cos(getExpressionValue());
            tokenizer.nextToken();
        } else if ("sin".equals(next)) {
            tokenizer.nextToken();
            tokenizer.nextToken();
            value = Math.sin(getExpressionValue());
            tokenizer.nextToken();
        } else if ("sqrt".equals(next)) {
            tokenizer.nextToken();
            tokenizer.nextToken();
            value = Math.sqrt(getExpressionValue());
            tokenizer.nextToken();
        } else if ("(".equals(next)) {
            tokenizer.nextToken();
            value = getExpressionValue();
            tokenizer.nextToken();
        } else {
            value = Double.parseDouble(tokenizer.nextToken());
        }
        return value;
    }
    private ExpressionTokenizer tokenizer;
}

Finally, the test:

EvaluatorTester.java

import java.util.Scanner;

public class EvaluatorTester {

    public static void main(String[] args) {
        Evaluator e;
        double value;
        String input;

        input = "3+sqrt(9)*2^3";
        e = new Evaluator(input);
        value = e.getExpressionValue();
        System.out.println(input + "=" + value);

        input = "cos(3)-sqrt(9)*2^sin(4)+100";
        e = new Evaluator(input);
        value = e.getExpressionValue();
        System.out.println(input + "=" + value);

        Scanner in = new Scanner(System.in);
        System.out.print("Enter an expression: ");
        input = in.nextLine();
        if (input.length() > 0) {
            e = new Evaluator(input);
            value = e.getExpressionValue();
            System.out.println(input + "=" + value);
        }
    }
}

Result

Method 1 result

Method 2: Building a stack

You might have noticed that all these functions looked a lot alike, is there any way to simplify it? Well, when we used recursion, the runtime environment created us a stack, this stack keep track of which function should return value to which function. Recursion incurs a lot of overhead because of all the initialization, so if we change the purpose of the stack to store only operators, instead of calling a higher function, we store the operator in the stack, waiting until we meet a lower precedence operator.

We don’t have to use recursion, and it is faster, though you won’t notice the difference if the expression is short enough.

Evaluator.java

import java.util.Stack;

public class Evaluator {

    /*
     * Initializes the parser with an expression
     */
    public Evaluator(String anExpression) {
        tokenizer = new ExpressionTokenizer(anExpression);
    }

    /**
     * Calculates the operator's precedence for stack pushing, taking into account the current
     * parenthesis level of the operator
     * @param token
     * @param parenthesisLevel
     * @return an integer, more important operators are guaranteed to return a higher value
     */
    int getTokenPrecedence(String token, int parenthesisLevel) {
        final String operators[][] = {{"+", "-"}, {"*", "/"}, {"^"}, {"cos", "sin", "sqrt"}, {"(", ")"}};
        if (token == null) {
            return -1;
        }
        for (int i = 0; i < operators.length; i++) {
            for (int j = 0; j < operators[i].length; j++) {
                if (token.equals(operators[i][j])) {
                    return i + parenthesisLevel * operators.length;
                }
            }
        }
        return -1;
    }

    /**
     * Test if a token is number
     * @param test number to test
     * @return true if it is a valid number, false otherwise
     */
    public boolean isNumber(String test) {
        try {
            Double.parseDouble(test);
        } catch (Exception e) {
            return false;
        }
        return true;
    }

    /**
     * Evaluates the expression
     * @return expression value
     */
    public double getExpressionValue() {
        Stack numbers = new Stack();
        Stack operators = new Stack();
        // Store operator's precedence, used to handle parentheses
        Stack operatorsPrecedence = new Stack();
        int level = 0;
        while(true) {
            String next = tokenizer.nextToken();
            if (next != null) {
                if (isNumber(next)) {
                    numbers.push(Double.parseDouble(next));
                    continue;
                }
				// Parentheses are a special case, it makes all operators inside it more important than
				// anything outside, we won't store them in the stack because it will become very complicated,
				// so we only store the operator's precedence (affected by parenthesis level)
                if (next.equals("(")) {
                    level++;
                    continue;
                } else if (next.equals(")")) {
                    level--;
                    continue;
                }
            }
			// While we still have operators in the stack and the one in the stack is more important than the next one...
            while (!operators.empty() && getTokenPrecedence(next, level) <= operatorsPrecedence.peek()) {
				// We get the operator and its priority
                String operator = operators.pop();
                int p = operatorsPrecedence.pop();
				// One parameter operations. Why 5? because there are 5 level of operator
				// precedence in operators[][] in getTokenPrecedence. The fourth operations
				// have only one operand
                if (p % 5 == 3) {
                    Double value = numbers.pop();

                    if (operator.equals("cos")) {
                        numbers.push(Math.cos(value));
                    } else if (operator.equals("sin")) {
                        numbers.push(Math.sin(value));
                    } else if (operator.equals("sqrt")) {
                        numbers.push(Math.sqrt(value));
                    }
                } else {

					// We have only one number left
                    if (numbers.size() < 2) {
                        return numbers.pop();
                    }

					// Because stack is LIFO, the first number is the second operand
                    Double value2 = numbers.pop();
                    Double value1 = numbers.pop();

                    if (operator.equals("+")) {
                        numbers.push(value1 + value2);
                    } else if (operator.equals("-")) {
                        numbers.push(value1 - value2);
                    } else if (operator.equals("*")) {
                        numbers.push(value1 * value2);
                    } else if (operator.equals("/")) {
                        numbers.push(value1 / value2);
                    } else if (operator.equals("^")) {
                        numbers.push(Math.pow(value1, value2));
                    }
                }
            }
			// The operator in the stack is now less important, put the next one into the stack
            operators.push(next);
            operatorsPrecedence.push(getTokenPrecedence(next, level));
        }
    }

    private ExpressionTokenizer tokenizer;
}

Result

Result of method 2

Method 3: Build an expression tree

This is useful if you want the user to be able to edit the expression or process a certain part of it, the previous expression will look like this

Expression tree

You still iterate though the expression. When you encounter a number, you create a number node and attach it to the current operator, if you encounter a same level operator, push it to the right of the current operator, if the operator is more important, push it down to the right too, if it is less important, put it as the parent of current operator. Some operator like sin and cos will have only one child, and it will be on the right side.

We need some sort of data structure to store the node, I call it arithmetic node. It stores a token and whether that token is a number or an operator and its precedence if it’s an operator.

ArithmeticNode.java

/**
 * Nodes of a binary tree, with backtrack capabilities
 * Arithmetic node contains either operator (+-/*) or operands (numbers)
 */
public class ArithmeticNode {

    public ArithmeticNode(String _data, int _type, int _level) {
        data = _data;
        type = _type;
        level = _level;
    }

    /**
     * Inserts a new node as a descendant of this node.
     * @param onLeft false to insert to the right, else to the left
     * @param newNode node to insert
     */
    public void addChildNode(boolean onLeft, ArithmeticNode newNode) {
        if (onLeft) {
            left = newNode;
            left.parent = this;
        } else {
            right = newNode;
            right.parent = this;
        }
    }

    /**
    Prints this node and all of its descendants
    in sorted order.
     */
    public String printNodes() {
        String result = "";
        if (left != null) {
            result += left.printNodes();
        }
        result += (data);
        if (right != null) {
            result += right.printNodes();
        }
        return result;
    }

    /**
     * Calculates the result of this node and all child nodes
     * @return
     * @throws Exception if there are any error like wrong number format or
     * somehow the tree is wrongly structured
     */
    public Double calculate() throws Exception {
        if (this.type == OPERAND) {
            return Double.parseDouble(this.data);
        } else {
            if (this.left != null && this.right != null) {
                String operator = this.data;
                double value1 = 0, value2 = 0;
                if (this.left.type == OPERATOR)
                    value1 = this.left.calculate();
                else
                    value1 = Double.parseDouble(this.left.data.toString());
                if (this.right.type == OPERATOR)
                    value2 = this.right.calculate();
                else
                    value2 = Double.parseDouble(this.right.data.toString());
                if (operator.equals("+")) {
                    return (value1 + value2);
                } else if (operator.equals("-")) {
                    return (value1 - value2);
                } else if (operator.equals("*")) {
                    return (value1 * value2);
                } else if (operator.equals("/")) {
                    return (value1 / value2);
                } else if (operator.equals("^")) {
                    return (Math.pow(value1, value2));
                }

            }
            throw new Exception("Expression tree is malformed!");
        }
    }

    /**
     * Calculates the operator's precedence, taking into account the current
     * parenthesis level of the operator
     * @param token
     * @param parenthesisLevel
     * @return an integer, more important operators are guaranteed to return a higher value
     */
    public int getNodePrecedence() {
        final String operators[][] = {{"+", "-"}, {"*", "/"}, {"^"}};
        if (this.type != OPERATOR) {
            return -1;
        }
        String token = data.toString();
        for (int i = 0; i < operators.length; i++) {
            for (int j = 0; j < operators[i].length; j++) {
                if (token.equals(operators[i][j])) {
                    return i + level * operators.length;
                }
            }
        }
        return -1;
    }

    /**
     * Main data of this node, can be a string for operator or a double for
     * operand
     */
    public String data;
    /**
     * either OPERATOR or OPERAND, describe the node
     */
    public int type;
    /**
     * Parentheses level of this node, used for precedence evaluation
     */
    public int level;
    public ArithmeticNode left;
    public ArithmeticNode right;
    public ArithmeticNode parent;
    public static final int OPERATOR = 0;
    public static final int OPERAND = 1;
}

When the tree is built, to get the expression value, you simply call the calculate() function on the root node. The calculate() function uses in-order traversal to loop trough all nodes and return their value if it’s a number, combine and calculate if it’s an operator node.

Evaluator.java

public class Evaluator {

    /*
     * Initializes the parser
     */
    public Evaluator(String anExpression) {
        tokenizer = new ExpressionTokenizer(anExpression);
    }

    /**
     * Test if a token is number
     * @param test
     * @return
     */
    public boolean isNumber(String test) {
        try {
            Double.parseDouble(test);
        } catch (Exception e) {
            return false;
        }
        return true;
    }

    /**
     * Evaluates the expression, this method builds a binary tree with nodes
     * representing tokens in the expression
     * @return result of the expression
     */
    public double getExpressionValue() {
        ArithmeticNode lastOperand = null, lastOperator = null, root = null;
        int level = 0;
        boolean nextIsLeft = true;
        while (tokenizer.hasMoreToken()) {
            String token = tokenizer.nextToken();
            if (token.equals("(")) {
                level++;
                continue;
            } else if (token.equals(")")) {
                level--;
                continue;
            }
            ArithmeticNode aNode;
            if (isNumber(token)) {
                aNode = new ArithmeticNode(token, ArithmeticNode.OPERAND, level);

                if (lastOperator != null && !nextIsLeft) {
                    lastOperator.addChildNode(nextIsLeft, aNode);
                }
                lastOperand = aNode;
                nextIsLeft = !nextIsLeft;
            } else {
                aNode = new ArithmeticNode(token, ArithmeticNode.OPERATOR, level);
                if (root == null) {
                    aNode.addChildNode(true, lastOperand);
                    root = aNode;
                } else if (aNode.getNodePrecedence() < lastOperator.getNodePrecedence()) {
                    ArithmeticNode nextParent = lastOperator;
                    // Backtrack until there is a compatible node (a node that
                    // have precedence smaller than the current, or root
                    while (nextParent.parent != null && aNode.getNodePrecedence()
                            < nextParent.getNodePrecedence())
                        nextParent = nextParent.parent;
                    aNode.addChildNode(true, nextParent);
                    root = aNode;
                } else {
                    // Swap the new node with the previous operand
                    aNode.addChildNode(true, lastOperator.right);
                    lastOperator.addChildNode(false, aNode);
                }
                lastOperator = aNode;
                // We always need a number or something after a operator
                nextIsLeft = false;
            }

        }
        double result = 0;
        try {
            result = root.calculate();
        } catch (Exception e) {
            System.out.println("Some error happened: " + e.getMessage());
        }
        return result;
    }    

    private ExpressionTokenizer tokenizer;
}

With this tree, you can re-export the expression in Polish notation easily

Leave a Reply

Your email address will not be published. Required fields are marked *