Weekend project: Lexical analysis & expression trees
Last Friday I developed an itch. After watching a range of YouTube videos about programming languages I was reminded of the fact that I’ve never written a lexer. I’ve known the basic theory of lexers for years but I’ve never actually written one, despite coming across challenges in my work that could benefit from even the simplest lexical analysis. I decided to write a library that I could use in Cuterm to evaluate simple math expressions as well as in Novelty for conditional expressions.
So what is a lexer? Given an arbitrary string of characters a lexer produces a string of tokens:
This is done for practical reasons. In code it’s much easier to reason about a string of tokens than a strings of characters. Characters are all the same while tokens carry some inherit meaning. A simple example would be to discern which parts of the plain text are numbers and which are not.
Once parsed you can scan the tokens and make adjustments. I added support for implicit multiplication notation by looking for numbers directly followed by an open parenthesis:
But a working lexer is only half of the work. To make sense of the token stream you would normally use it to build an expression tree with operators and operands. My implementation is limited to mathematical expressions, but typically you would use it to parse code too.
Since I wanted a nice generic lexer that I can use in multiple projects I went the extra mile and added support for common programming operators as well as proper string tokenization for both single and double quotation marked strings and escape characters. Although not all tokens can be used in the tree, the support is there for future needs and it made the function feel more robust.
I added a callback for when the evaluation algorithm encountered an unknown identifier like a variable x, so I could hook it up to Novelty’s game variables. Then it occurred to me that I could derive an unknown variable by generating two trees on each side of an equality operator and then shuffle nodes to the other side until only x remains on one.
So given an expression 2(x-3) = 6 I could derive x = (6/2)+3, x = 6.
But once I had reached this point I decided it would be better to get something to eat rather than continue my little geek-out.