Creating a scanner using parser combinators in Scala (PLDS part I)

This post is part of a series on Programming Language Design in Scala (PLDS). Click here to see the rest of the posts in the series.

In September 2014 I stumbled upon the Scala Parser Combinators library and ended up playing around with implementing a small programming language in Scala. Although the language itself was more or less useless, I found the process of designing and implementing it (and later extending it with more features) a pretty fun activity. This gave me the idea to start this blog as a place for topics in programming and computer science that I find interesting. My first blog post was supposed to be a short tutorial on Scala Parser Combinators, based around the implementation of a small programming language. Because of a lack of motivation, ideas and time, my first post instead ended up being about an entirely different project, and my post about parser combinators remained an unfinished draft for more than six months.

Now I've finally found the energy to complete this project, or at least the first part of it. My idea is to write a series of blog posts about the design and implementation of programming languages using practical examples in Scala (and perhaps other languages in the future). The first two posts will be about the syntactic analysis part of a language implementation, i.e. parsing the source code. This very first post will briefly introduce language design, the formal definition of syntax, and how to implement a scanner in Scala.

A scanner (also known as a tokenizer or a lexer) is a simple program that reads a sequence of characters (the source code) and outputs a sequence of tokens, i.e. the basic syntactical components of the language, such as keywords, identifiers, numbers and operators. This process is known as lexical analysis (sometimes also tokenization), and it usually precedes the actual parsing step, where a parse tree or an abstract syntax tree is constructed from the sequence of tokens. The parsing step is described in the next post, so for now we'll just be looking at the scanner.
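To make the idea of "characters in, tokens out" a bit more concrete, here is a minimal hand-written sketch of a scanner for a tiny arithmetic language. Note that this is not the parser-combinator approach the rest of the post is about; it uses only the Scala standard library, and the token types (`NumberToken`, `IdentifierToken`, `SymbolToken`) and the `SimpleScanner` object are hypothetical names made up for this illustration.

```scala
// Hypothetical token types for a tiny arithmetic language (illustration only).
sealed trait Token
case class NumberToken(value: Int) extends Token
case class IdentifierToken(name: String) extends Token
case class SymbolToken(text: String) extends Token

object SimpleScanner {
  // Single-character symbols this sketch recognises (an assumption, not from the post).
  private val symbols = "+-*/()="

  // Turns a sequence of characters into a sequence of tokens.
  def scan(source: String): List[Token] = {
    @annotation.tailrec
    def loop(rest: List[Char], acc: List[Token]): List[Token] = rest match {
      case Nil =>
        acc.reverse // tokens were prepended, so restore source order
      case c :: tail if c.isWhitespace =>
        loop(tail, acc) // whitespace separates tokens but produces none
      case c :: _ if c.isDigit =>
        // Consume the longest run of digits as one number
        // (no overflow handling in this sketch).
        val (digits, remaining) = rest.span(_.isDigit)
        loop(remaining, NumberToken(digits.mkString.toInt) :: acc)
      case c :: _ if c.isLetter =>
        // An identifier starts with a letter, followed by letters or digits.
        val (chars, remaining) = rest.span(_.isLetterOrDigit)
        loop(remaining, IdentifierToken(chars.mkString) :: acc)
      case c :: tail if symbols.indexOf(c) >= 0 =>
        loop(tail, SymbolToken(c.toString) :: acc)
      case c :: _ =>
        throw new IllegalArgumentException(s"Unexpected character: '$c'")
    }
    loop(source.toList, Nil)
  }
}
```

With these definitions, `SimpleScanner.scan("x = 42 + y")` should produce an identifier token, a symbol token, a number token, another symbol token and a final identifier token, in that order. A parser would then consume this list of tokens rather than the raw characters.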

This post should serve as an introduction to language design and Scala, and it shouldn't require prior knowledge of either, although a basic understanding of programming and programming languages is assumed. If you're interested in learning more about Scala, I also recommend reading some tutorials.
