Little stacker

After no blood, a little sweat, and plenty of tears, I was able to write a tiny synthesizer that does its job (make sound), but I'm always looking for ways to improve things. The program currently generates noise with one carrier:modulator pair per voice, which is more versatile that you might expect, but certainly limited at the end of the day. I thought it'd be nice if users could express synthesis formulas as arbitrary s-expressions, like

(+ (* osc1 0.5) (* osc2 0.5))

where osc1 and osc2 are table lookups for oscillators set to specific frequencies and waveforms. There'd be variables for envelopes as well, but there wouldn't need to be too many actual features, since the user-defined arithmetic would be doing most of the work.

Of course, interpreting a Lisp expression 48,000 times a second wouldn't be very performant—at least if handled naively. I would have to compile it first, winnowing recursive lists and branching logic down into a flat array of bytes.

What follows are field notes on compiling s-expressions into stacks and performing computations with them. These paragraphs are still a far cry from an actual compiled-to-assembly implementation that can handle audio, as everything is written for a theoretical Scheme architecture, but they document a good first step in the right direction.

I wrote this in May 2019 on a netbook that still runs Chicken Scheme 4. This is one of the few articles I managed to recover after accidentally nuking my site. I'm re-posting it mainly for my own benefit; it is by no means an authoritative tutorial on compilers.

Stacks

An s-expression (+ 1 2) can be evaluated by building up a list of values representing local state, and applying functions to it one by one. This can be as simple as reading the expression backwards.

2
2 1
2 1 +
3

This expanding and shrinking list is called a stack. Longer expressions like (+ 1 2 3) mean that the variadic function + needs to be added to the stack twice.

3
3 2 
3 2 1
3 2 1 +
3 3
3 3 +
6

Generalized, a variadic function f in expression exp needs to be applied to its stack (- (length exp) 2) times.

Even recursive expressions are effortlessly flattened into an array using this backwards model.

; s-expression
(* 1 2 (+ 3 3))

; stack-based expression
3 3 + 2 1 * *

; evaluation
3
3 3
3 3 +
6
6 2
6 2 1
6 2 1 *
6 3
6 3 *
18

A mini language

It's possible to build a robust language from this model, especially for a domain as constrained as wave functions. It will be able to handle arithmetic, boolean logic, variable references, and branching. There is no need for more advanced features such as looping, but the primitive constructs provided would certainly allow for it.

Representation

The stack, variables, and expression can all be represented as Lisp lists, and the usual tools like (cons) (car) and (foldr) are used to manipulate them. The values within these lists are either numeric literals or symbols representing functions.

Numbers

The final version of this language will be written in C rather than Scheme, and won't benefit from the same level of expressiveness. The C stack evaluator won't see a linked list of numbers and symbols; everything will simply be a byte. It will be impossible to determine a function from a number without any kind of tagging system. In order to get acquainted with this constraint, the example language appends the n command before any numeric value, which signals the presence of a number to the interpreter. The stack

3 3 + 2 1 * *

is actually represented as

n 3 n 3 + n 2 n 1 * *

I thought it was important to reinforce the linear nature of evaluation by making n its own array value.

Variables

In a function with two parameters, such as (f x y), x and y can be thought of as a two item list. Any reference to either of these symbols in the body of the function can be replaced with the v command, which references either index 0 or 1 from the parameters list. v simply uses the first number on the stack as its argument, which means the correct index needs to be pushed ahead of time with n. The stack

x y +

should instead look like

n 0 v n 1 v +

If the above function were called with 1 and 2 as its parameters, then a list '(111 222) would be passed to the evaluator function

; environment
(111 222)

; stack-based expression
n 0 v n 1 v +

; evaluation
n
n 0
0
0 v
111
111 n
111 n 1
111 1
111 1 v
111 222
111 222 +
333

Logic

Something that is true is expressed as 1, and something that is false is expressed as 0. This keeps everything numerical. The following expressions should be self-explanatory.

; (= 1 1)
n 1 n 1 =
1

; (> 1 2)
n 2 n 1 >
0

; (not 1)
n 1 !
0

; (or 0 1)
n 1 n 0 ^
1

; (and 0 1)
n 1 n 0 &
0

Branching

Sometimes only certain parts of an expression should be evaluated, based upon user input. The sole branching mechanism in the language is the ternary operator ?, which executes one of exactly two expressions based upon the boolean value of the first number on the stack. Consider the conditional expression

(? (= x 3) (* x x x) (+ x x x))

If x is 3, then the function should cube it. If it's anything else, then x is added to itself 3 times. In order to facilitate this conditional flow, ternary operators compile slightly out of order. The first part of the expression appears on the stack as

n 3 n 0 v = ?

It is important that ? appears after the boolean expression, so that 1 or 0 can be its parameter in deciding which branch to evaluate.

And how are branches segregated in a linear array? They are delimited by goto commands in the form of the g symbol. The goto command takes the first number off the stack, and then skips over that many elements of the expression. Control flow is thus represented as

(boolean expression) ? (length branch1) g (branch1) (length branch2) g (branch 2)

If the argument to ? is true, then the evaluator will skip over the first g in the rest of the expression and evaluate the code that follows, until it reaches the second g, at which point it will jump over the remaining branch.

If the condition is false, the evaluator will immediately read the first g and jump straight to the second branch in the expression.

The full expanded form of the function is:

n 3 n 0 v = ? n 15 g n 0 v n 0 v n 0 v * * n 12 g n 0 v n 0 v n 0 v + +

Try to work out the stack shape for different values of x. You'll see how it makes sense.

Writing an interpreter

These primitive concepts form the entirety of the math language. A loop that simulates these expressions, along with a stack and variable environment, can be implemented in a few lines of Scheme.

Quasiquoting

The following code makes extensive use of quasiquoting. This is a useful shorthand for representing symbolic manipulation, so make sure you're familiar with it.

; value of x
5

; literal lists 
'(1 2 3 4 x x)
(1 2 3 4 x x)

; selectively-evaluated lists
`(1 2 3 4 x ,x)
 (1 2 3 4 x 5)

; spliced lists
`(1 2 3 4 ,@(x x))
 (1 2 3 4 5 5)

These are much more convenient than nested calls to (cons) and (cadddr) or whatever.

Function storage and evaluation

All functions are of the form

(f exp env stack)

where exp is the remaining expression being evaluated, env is the numeric values of variables referenced in the expression body, and stack is the current stack of the expression evaluation. All three of these parameters are lists. Every f returns a two-dimensional list

`(,exp ,env, ,stack)

which is used in the next stage of the evaluator loop.

Functions are stored in the global associative list FS, where a symbol found in the expression, such as n or =, is matched against the actual function that will be performed. FS will be specified after all the functions themselves are defined. A symbol x is looked up in the alist env with the (f-ref) procedure

(define (f-ref env x)
  (let ((f (assoc x env)))
    (if (not f)
      err 
      (cadr f))))

The env argument to (f-ref) is always FS. If a function is not found, the err procedure is returned instead:

(define (err exp env stack)
  (display "Invalid syntax: ")
  (display (car exp))
  (newline)
  '(()()(error)))

err stops the expression evaluation loop by returning a null exp list.

The full loop looks like

(define (evaluate exp env stack)
  (if (null? exp)
    (car stack)
    (let ((f (f-ref FS (car exp))))
      (apply evaluate (f exp env stack)))))

The expression

(+ 1 2 x y)

where x is 3 and y is 4 would be evaluated as

(evaluate '(n 1 n 2 n 0 v n 1 v + + +) '(3 4) '())
10

Primitive functions

The n command removes the next number of the expression and pushes it on to the stack. Observe how (cddr exp) is returned in order to reflect two expression symbols being read at once.

(define (push exp env stack)
  `(,(cddr exp) ,env ,(cons (cadr exp) stack)))

The v command consumes the first number from the stack and uses it to reference the environment. The value returned from this lookup in pushed to the stack.

(define (var exp env stack)
  `(,(cdr exp) ,env ,(cons (list-ref env (car stack)) (cdr stack))))

The g command jumps ahead by consuming the first number from the stack and then dropping that many elements from the remaining expression.

(define (jump exp env stack)
  `(,(drop exp (car stack)) ,env ,(cdr stack)))

Arithmetic functions

All math operators consume two numbers from the stack and push one back onto it. They don't perform any unusual operations upon the environment or expression. It's possible to generalize such a dyadic function (f exp env stack) with

(define (dyadic f exp env stack)
  `(,(cdr exp) ,env ,(cons (f (car stack) (cadr stack)) (cddr stack))))

All arithmetic is implemented on top of this.

(define (add exp env stack) (dyadic + exp env stack))

(define (subtract exp env stack) (dyadic - exp env stack))

(define (multiply exp env stack) (dyadic * exp env stack))

(define (divide exp env stack) (dyadic / exp env stack))

Boolean functions

Scheme needs to be able to translate boolean values to numbers and vice-versa.

(define (bool->number t)
  (if t 1 0))

(define (number->bool n)
  (if (= n 0) #f #t))

All comparisons are also dyadic in nature. They are generalized as

(define (compare f exp env stack)
  (dyadic (lambda (x y) (bool->number (f x y))) exp env stack))

Comparison operators are implemented as follows.

(define (equals exp env stack) (compare = exp env stack))

(define (greater-than exp env stack) (compare > exp env stack))

(define (greater-than-equal exp env stack) (compare >= exp env stack))

(define (less-than exp env stack) (compare < exp env stack))

(define (less-than-equal exp env stack) (compare <= exp env stack))

The ! command simply flips the boolean value of the first stack item.

(define (is-not exp env stack)
  `(,(cdr exp) ,env ,(cons (if (number->bool (car stack)) 0 1) (cdr stack))))

& and ^ are longer dyadic calls.

(define (both exp env stack)
  (dyadic (lambda (x y)
            (bool->number (and (number->bool x) (number->bool y))))
          exp env stack))

(define (either exp env stack)
  (dyadic (lambda (x y)
            (bool->number (or (number->bool x) (number->bool y))))
          exp env stack))

The ? operator is implemented by consuming the first number on the stack. If it is true, it will drop 4 items from the expression, skipping the first g statement. If it is false, it will walk the expression right into a jump.

(define (ternary exp env stack)
  `(,(if (number->bool (car stack)) (drop exp 4) (cdr exp)) ,env ,(cdr stack)))

The master list

The constant FS must be specified after the above function definitions. Compare its structure to (f-ref) and (evaluate) to see how functions are actually applied in the loop.

(define FS
  `((n ,push)
    (v ,var)
    (g ,jump)
    (+ ,add)
    (- ,subtract)
    (* ,multiply)
    (/ ,divide)
    (= ,equals)
    (> ,greater-than)
    (>= ,greater-than-equal)
    (< ,less-than)
    (<= ,less-than-equal)
    (! ,is-not)
    (& ,both)
    (^ ,either)
    (? ,ternary)))

Writing a compiler

The interpreter loop will evaluate any stack-based expression, but these aren't exactly human readable. A set of functions that compile s-expressions to stack form are required. The compiler works in two passes. The first phase tags numbers, translates variables into environment references, and delimits branches with goto statements—all while maintaining recursive s-expression structure. The second phase calculates the size of branches and reads the expression into a backwards list.

Variables

Variable references are arbitrary symbols that are stored in an alist

`(,var-name v ,env-index n)

where var-name serves as the lookup key during compilation. The list of variables and their environment indices is created with

(define (make-vars xs)
  (map (lambda (x n) `(,x v ,n n)) xs (iota (length xs))))

If a function is meant to have a list of three parameters (x y z), then any x, y, or z found during compilation should be replaced with a proper v command. The entries created by (make-vars) are those commands.

(make-vars '(x y z))
((x v 0 n)
 (y v 1 n)
 (z v 2 n))

Notice how (x v 0 n) is the actual variable reference form (n 0 v) backwards. This is because expressions are reversed during the second compiler pass.

The full environment that the compiler uses is FS appended to the user-specified variables.

(define (make-env xs)
  (append FS (make-vars xs)))

The compiler calls (env-ref) against this environment when it encounters a symbol. If the symbol is a valid function in FS, then the function symbol itself is returned. It will be interpreted into its actual procedure during runtime. If the symbol is a variable, then its (n 0 v) etc. equivalent will be returned. If symbol is not found, it will be marked with a ! to ensure it fails in the interpreter.

(define (env-ref xs x)
  (let ((v (assoc x xs)))
    (cond ((not v) `(,x !))
          ((eq? (cadr v) 'v) (cdr v))
          (else `(,(car v))))))

Predicates, list wrapping, and symbol evaluation

Notice how something like

(env-ref FS '+)

returns (+) rather than +. The compiler will sometimes return individual symbols as lists in order to ease along expression concatenation and branch length calculation. This practice makes sense once you consider how variadic operators are handled.

(define (variadic? x)
  (or (eq? x '+) (eq? x '-) (eq? x '/) (eq? x '*)))

(define (branch? xs)
  (and (list? xs) (eq? (car xs) '?)))

(define (make-symbol env x n)
  (cond ((number? x) `(,x n))
        ((variadic? x) (make-list n x))
        (else (env-ref env x))))

If a symbol represents a variadic function, then it needs to return a list of symbols depending upon the value of n, where n is (- (length exp) 2). In order to keep the higher-level procedures that parse with (make-symbol) consistent, all calls to (env-ref) return a list, even if it's only one item long.

A related function (atom-check) also ensures that symbols are wrapped in lists.

(define (atom-check x)
  (if (not (list? x)) (list x) x))

The first pass

Make sure you understand how (foldr) and (foldl) work before continuing. They're the lynchpin of the compiler.

The compiler relies on mutually recursive functions (make-exp) and (make-branch). The first function runs on normal sub-expressions, while the second one is reserved for branches. Branches themselves contain sub-expressions of course, hence the recursion.

(define (make-exp e xs)
  (if (branch? xs)
    (make-branch e xs)
    (let ((n (- (length xs) 2)))
      (foldr (lambda (x acc)
               (cond ((branch? x) `(,(make-branch e x) ,@acc))
                     ((list? x) `(,(make-exp e x) ,@acc))
                     (else `(,@(make-symbol e x n) ,@acc))))
             '()
             xs))))

(define (make-branch e xs)
  (let ((pred (atom-check (cadr xs)))
        (branch1 (atom-check (caddr xs)))
        (branch2 (atom-check (cadddr xs))))
    `(? ,(make-exp e pred)
        n _ g ,(make-exp e branch1)
        n _ g ,(make-exp e branch2))))

There seems like a lot to digest here, but it's all centered around a right fold that reads every symbol in the expression and calls the appropriate function to evaluate it. Read the quasi-quoting syntax carefully to see how the accumulator value acc takes on its form at every stage of the form. Also notice how (make-branch) uses _ as placeholder values for the jump lengths that will be calculated in the second pass. Since jumps themselves count towards the jump length of a higher-level branch, everything must be laid down in this pass so that it may be counted properly.

Check your understanding of the first pass by considering the partial compilation of the earlier branching example.

(make-exp (make-env '(x)) '(? (= x 3) (* x x x) (+ x x x)))
(? (= v 0 n 3 n) n _ g (* * v 0 n v 0 n v 0 n) n _ g (+ + v 0 n v 0 n v 0 n))

The symbols are changed, but the order and recursive structure haven't.

The second pass

The final compiler pass will use the appropriately named (make-final-exp) and (make-final-branch) functions in a similar manner to the first pass. It will also use the (jump-length) function to recursively calculate how many cells a g command should skip.

(define (jump-length xs)
  (apply +
    (map (lambda (x) (if (list? x) (jump-length x) 1)) xs)))

(define (make-final-exp xs)
  (if (branch? xs)
    (make-final-branch xs)
    (foldl (lambda (acc x)
             (cond ((branch? x) `(,@(make-final-branch x) ,@acc))
                   ((list? x) `(,@(make-final-exp x) ,@acc))
                   (else `(,x ,@acc))))
           '()
           xs)))

(define (make-final-branch xs)
  (let ((pred (cadr xs))
        (branch1 (list-ref xs 5))
        (branch2 (list-ref xs 9)))
    `(,@(make-final-exp pred) ?
       n ,(+ (jump-length branch1) 4) g ,@(make-final-exp branch1)
       n ,(+ (jump-length branch2) 1) g ,@(make-final-exp branch2))))

This process shouldn't be too hard to understand now that you've wrapped your head around the first pass. The second pass is a left fold, which reverses all expressions into proper stack form. It also makes handy use of ,@ to ensure that everything remains flat. Note how the (jump-length) calls in (make-final-branch) have 4 and 1 added to them. This ensures that the length includes the presence of subsequent calls to g in the ternary operator.

Compiling

A small command that takes environment e and expression xs handles the rest.

(define (compile e xs)
  (make-final-exp (make-exp (make-env e) xs)))

Check if it all works.

(evaluate (compile '(x) '(? (= x 3) (* x x x) (+ x x x))) '(3) '())
27

Just for the hell of it

Why not re-implement the (lambda) form so that it compiles an expression to a stack and then passes its surrounding environment to it as the parameters? You'll need a macro for that.

(define-syntax λ
  (syntax-rules ()
    ((_ (args ...) exp)
     (lambda (args ...) (evaluate (compile '(args ...) 'exp) `(,args ...) '())))))

It works how one would expect.

((λ (x) (? (= x 3) (* x x x) (+ x x x))) 3)
27

Conclusion

This exercise was very much a novelty, but based upon what I've read, it's not too far off from how evaluation is represented at the machine code level. I'm looking forward to implementing an actual interpreter in C and assembly, which I suspect won't be nearly as elegant.