Fixnum oscillator

Wavetable synthesis is a deceptively simple concept.

  1. Represent one cycle of a periodic waveform as an array W of length L containing sample values.
  2. Every sampling cycle, accumulate a phase value P with a note's pre-calculated frequency increment F.
  3. The value at W[P%L] yields the sample value S, which is written to the audio buffer.

In practice P and F are usually floats, which means that the phase won't directly track the indices of W. Floating-point linear interpolation reads the value "between" W[floor(P)%L] and W[floor(P+1)%L], weighted by the fractional part of P. The phase is also modulo'd against π each cycle to prevent pitch drift from floating point error. S itself tends to be a float too; this allows easy arithmetic in modulation operations and prevents integer overflow.

This arrangement is more than just theoretically simple. Floating point DSP procedures on modern architectures may actually be faster than their integer equivalents, since the fixnum registers are often clogged with other logic most of the time. It's doubtful that any meaningful performance boost can be gleaned from handling P, F, and S as integers, but as you might have guessed, I'm going to do it anyway. Why?

  1. The synthesizer might run on a different architecture one day, such as a hardware microcontroller. Not all of these platforms support floats.
  2. If S is not represented as a float, there's no reason to worry about dithering.
  3. FM synthesis is performed in just a few operations, meaning that any math speed up that could be gained from floats might be negligible. The conversions needed to read from W might make it faster to keep P as a pure int.
  4. Integer overflow can be exploited to simulate the cyclic nature of waves, avoiding modulo operations.
  5. It's fun.

The following code will be in plain old C. As much as I love FP langs, a topic like this calls for precise bit mangling. Please install sox if you'd like to play along.

Truncating oscillator

Paste the following code into your Vim. I'll dissect it later.

#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define SIZE_TABLE 4096
#define WAVE_INC ((M_PI * 2.0) / SIZE_TABLE)
#define RATE 48000
#define FREQ_INC ((uint32_t)(UINT_MAX / (uint32_t)RATE))
#define SHIFT 20

int main() {
  uint16_t n = 0;
  double wavePhase = 0.0;
  int16_t w[SIZE_TABLE] = {0};
  uint32_t phase = 0;
  uint32_t f = 440 * FREQ_INC;
  int16_t s1 = 0;

  for (; n < SIZE_TABLE; n++) {
    w[n] = (int16_t)(SHRT_MAX * sin(wavePhase));
    wavePhase += WAVE_INC;
  }

  while (1) {
    phase += f;
    s1 = w[phase >> SHIFT];
    putchar(s1 & 255);
    putchar(s1 >> 8);
  } 

  return 0;
}

run

cc filename.c -lm && ./a.out | play -b 16 -e signed -c 1 -r 48000 -t raw -

and you should hear a sine wave ringing out a pleasant A.

Generating the wavetable

16 bit signed integers are enough to encode CD quality audio, which is why the wavetable W is represented by the following C code.

#define SIZE_TABLE 4096
#define WAVE_INC ((M_PI * 2.0) / SIZE_TABLE)

...

uint16_t n = 0;
double wavePhase = 0.0;
int16_t w[SIZE_TABLE] = {0};

...

for (; n < SIZE_TABLE ; n++) {
  w[n] = (int16_t)(SHRT_MAX * sin(wavePhase));
  wavePhase += WAVE_INC;
}

This code increments a floating point phase value wavePhase between 0 and 2π over the course of SIZE_TABLE samples. For each sample, the float return value of sin [-1.0, 1.0] is multiplied against SHRT_MAX and truncated to 16 bits. The array simulates a single cycle of a sine, but now scaled between [-32767, 32767] instead.

Tracking phase

The phase P of a signal is incremented by frequency F and modulo'd against TABLE_SIZE to read from the wavetable. This logic is represented in C as follows.

#define RATE 48000
#define FREQ_INC ((uint32_t)(UINT_MAX / (uint32_t)RATE))

...

uint32_t phase = 0;
uint32_t f = 440 * FREQ_INC;

...

phase += f;

RATE is the industry standard sampling rate of 48,000 Hz. When UINT_MAX is divided against it, it provides the basic unit of increment that will take place during each sampling period, assuming the signal's frequency is 1Hz. Since this is inaudible, the actual pitch increment f is 440 * FREQ_INC.

The bounds of the 32 bit unsigned integer phase are now easily mapped to the bounds of the wavetable w, which in turn is mapped to the cycles of a waveform. phase += f overflows by design, and every time it does, it signals a new cycle of the sine. This avoids the need for any explicit modulo.

The size of phase and f is significant. I chose 32 bit ints over 16 bit ones for reasons that will become apparent later.

Reading/writing samples

Retrieving the sample value at w means truncating phase down to a number between [0, 4095], so that it can address a specific index of the wavetable. Since TABLE_SIZE is a power of 2, this is not a problem.

#define SHIFT 20

...

int16_t s1 = 0;

...

s1 = w[phase >> SHIFT];
putchar(s1 & 255);
putchar(s1 >> 8);

Right shifting the 32 bit value phase by 20 places turns it into a 12 bit one—less than 4096: the exact index of the desired sample s1. The toy oscillator writes audio to sox by breaking s1 into two separate bytes and writing them to stdout, where the program is listening.

This specific sine sounds fine via this truncation method, but it's not advised for wavetable synthesis in general. That's where linear interpolation comes in.

Interpolated oscillator

The accuracy of a signal is improved considerably with linear interpolation. This involves sampling the wavetable at two adjacent points, and using them to derive an entirely new sample. The oscillator code is almost identical.

#include <limits.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define SIZE_TABLE 4096
#define WAVE_INC ((M_PI * 2.0) / SIZE_TABLE)
#define RATE 48000
#define FREQ_INC ((uint32_t)(UINT_MAX / (uint32_t)RATE))
#define SHIFT 20
#define TABLE_INC (1 << SHIFT)
#define FRAC_MASK (TABLE_INC - 1)

int main() {
  uint16_t n = 0;
  double wavePhase = 0.0;
  int16_t w[SIZE_TABLE] = {0};
  uint32_t phase = 0;
  uint32_t f = 440 * FREQ_INC;
  int16_t s1 = 0;
  int16_t s2 = 0;
  uint64_t sf = 0;

  for (; n < SIZE_TABLE; n++) {
    w[n] = (int16_t)(SHRT_MAX * sin(wavePhase));
    wavePhase += WAVE_INC;
  }

  while (1) {
    phase += f;
    s1 = w[phase >> SHIFT];
    s2 = w[(phase + TABLE_INC) >> SHIFT];
    sf = (uint64_t) phase & FRAC_MASK;
    sf *= s2 - s1;
    s1 += sf >> SHIFT;
    putchar(s1 & 255);
    putchar(s1 >> 8);
  } 

  return 0;
}

Only a few additional values and operations are required.

Double sampling

#define TABLE_INC (1 << SHIFT)

...

int16_t s1 = 0;
int16_t s2 = 0;

...

s1 = w[phase >> SHIFT];
s2 = w[(phase + TABLE_INC) >> SHIFT];

The first sample s1 is retrieved through simple truncation. The second sample s2 is the next index of the wavetable. If phase >> SHIFT reduces the phase to an exact index of w called N, then index N+1 can be derived by adding 1 << SHIFT to phase, then right shifting this sum by SHIFT. This avoids another modulo operation.

Fractional phase

phase and f are represented as 32 bit ints, despite only needing 12 bits to represent the indices [0, 4095], so that the 20 remaining bits of phase may represent the fractional "distance" of the phase between wavetable indices N and N+1. These bits are used to derive the more accurate sample reading from s1 and s2. They must be isolated from phase.

#define FRAC_MASK (TABLE_INC - 1)

...

uint64_t sf = 0;

...

sf = (uint64_t) phase & FRAC_MASK;

By subtracting 1 from TABLE_INC, FRAC_MASK is defined to be a series of 20 consecutive set bits. Running a bitwise AND against phase returns only the fractional part. Despite being smaller than phase, this fractional value sf is stored in 64 bits to provide it enough head room to perform the subsequent interpolation.

Interpolating

sf *= s2 - s1;
s1 += sf >> SHIFT;
putchar(s1 & 255);
putchar(s1 >> 8);

Subtracting s1 from s2 returns the span between the two samples. Multiplying this by the fractional part (this is where head room matters), then shifting it back down by SHIFT determines how much of this span should be added on to the original truncated sample s1. It's hard to envision, but this multiplication and right shifting works just like multiplying by a value [0.0, 1.0] in a float interpolation.

Conclusion

Something is seriously wrong with you if you can actually hear the difference between these two example programs, but in a modulation-heavy setting with multiple oscillators and envelopes, these theoretical accuracy concerns cease to be theoretical.

As for whether it makes sense to even use fixnums in the first place, time will tell. Worth a go though.

Follow up

Even though the audio is ultimately 16 bits, it probably makes more sense to store the wavetable values as 32 bit numbers (16.16 fixed point format) to play nicely with the phase accumulators of the same size. I noticed this while attempting basic FM between two oscs. Like everything else on this site, this page documents a learning process. It's not an authoritative article. Sorry if you read to the end only to run into the same flaws I did.