Notes on Crafting Interpreters: Welcome

Table of Contents

Introduction

What exactly are dynamic arrays, hash tables and bytecodes?

Challenges

  1. 6 DSLs are used for the production of this book. What are they?

    There’s a Makefile. Markdown is definitely used, as are XML, Yaml, SCSS and CSS. I think hard-coded HTML makes an appearance too.

  2. TODO: Write hello world in Java.

  3. C pointers. Define doubly linked list of heap-allocated strings. Write functions to insert, find and delete items from it.

    Above my current abilities. Geeks For Geeks appear to have a decent explanation which I maybe can return to once I get to part 3 of the book.

A map of the territory

scanning = lexing = lexical analysis

Scanner = lexer takes characters and chunks them into tokens.

Parser takes tokens and builds grammar using tree structures called (abstract) syntax trees, ASTs or just trees. Informs of syntax errors.

Binding = resolution. Identifier. Scope. For what variable refers to. Leads to type error, if statically typed.

All this semantic insight has to go somehwere, either as attributes in the syntax tree, stored away in a symbol table or by transforming the tree completely.

So far all front end. Next: middle end.

Code stored in intermediate representation, where it can then be optimised by constant folding for example before then generating code = code gen.

Now the back end. What should the end code be? p-code = bytecode? What does runtime look like? Do you write loads of native compilers or a virtual machine?

Different implementations include single-pass compilers, tree-walk interpreters, source-to-source compilers = transcompilers = transpiler, just-in-time (JIT) compilation.

Challenges

  1. Look around an open source implementation of a language for the scanner and parser.

    The ones for CPython are easy enough to find.

  2. What reasons are there to not JIT?

    Start-up delay, memory overload, security vulnerabilities.

  3. Why do most LISP implementations also contain an interpreter?

    Culture. And maybe because it is simple to do?

The lox language

Dynamically typed. 2 methods for memory management: reference counting and (tracing) garbage collection (GC). Lox uses GC.

Data types: Booleans, doubles, strings, nil.

Expression statements, which end in ;, can be wrapped up in a block with {}.

Nystrom is particular on arguments vs parameters:

  • argument: the value passed to the function,

  • parameter: the variable holding the value of the argument.

Classes vs prototypes. In class-based languages: instances and classes. In prototype-based languages these are merged, there are only objects.

Inheritance done with <, which creates a derived class = subclass of the base class = superclass.

Challenges

  1. TODO: Write some sample lox programs and run them. Try to come up with edge cases.

  2. List open questions on language’s syntax and semantics.

    Escape characters for strings? Numbers other than doubles such as integers?

  3. What missing features are most annoying?

    Lack of proper data structures, e.g. lists, dicts and sets. Lack of comments.

Question for self

Does Robert Nystrom write about how we wrote the book?

Yes he does. Main take away is to write every day. Main two articles describing the process of writing are here: https://journal.stuffwithstuff.com/2014/04/22/zero-to-95688-how-i-wrote-game-programming-patterns/ and https://journal.stuffwithstuff.com/2020/04/05/crafting-crafting-interpreters/