After the questionable success of my questionable experiment with Scheme and sound, I've found myself thinking about how to best abstract some inherently stateful audio concepts into a world of maps and folds. So far I've played with
uint32
. Representing this in Chicken means fighting against one of the language's mathematical advantages: arbitrarily sized numbers.u8vector
, but something more primitive is required to work on sub-blocks of this memory.This code explores these topics. It's too local to be optimal for actual DSP, but this writeup is more about bit mangling in Chicken.
(import (srfi 1) (srfi 4) (chicken memory) (chicken bitwise) (chicken type))
(define-syntax λ (syntax-rules () ((_ . ω) (lambda . ω))))
(define-syntax ∃ (syntax-rules () ((_ . α) (let* . α))))
(define-syntax ? (syntax-rules () ((_ . α) (if . α))))
(define-syntax ← (syntax-rules (▽)
((_ (▽ α ...) ω ...) (begin (define α ω) ...))
((_ ((α ω ...) ...)) (begin (define α ω ...) ...))
((_ . α) (define . α))))
(← ((□ display) (⇐ foldl) (π 3.141592653589793) (ι iota) (∀ map) (I identity)))
(← ((∀∀ for-each)))
(define-type phase pointer)
(define-type wave pointer)
(define-type buffer pointer)
(: phase (fixnum -> phase))
(← (phase rate)
(∃ ((ω (allocate 4)))
(pointer-u32-set! ω 0)
(tag-pointer ω (/ 4294967296 rate))))
(: θ+ (phase number -> phase))
(← (θ+ θ hz)
(∃ ((n (* hz (pointer-tag θ)))
(h (floor (+ n (pointer-u32-ref θ)))))
(pointer-u32-set! θ h) θ))
(: sine-wave (fixnum -> wave))
(← (sine-wave n)
(letrec* ((ref-shift (* -1 (- 32 n)))
(size (arithmetic-shift 1 n))
(bytes 2)
(bytes-size (* bytes size))
(ε (/ (* 2 π) size))
(ω (allocate bytes-size))
(▽ (λ (m θ)
(? (= m bytes-size) (tag-pointer ω ref-shift)
(∃ ((s (inexact->exact (floor (* 32767 (sin θ))))))
(pointer-s16-set! (pointer+ ω m) s)
(▽ (+ m bytes) (+ θ ε)))))))
(▽ 0 0)))
(: WAVE-4096 wave)
(← WAVE-4096 (sine-wave 12))
(: sine (phase wave --> fixnum))
(← (sine θ ω)
(∃ ((shift (pointer-tag ω))
(bytes 2)
(n (* bytes (arithmetic-shift (pointer-u32-ref θ) shift))))
(pointer-s16-ref (pointer+ ω n))))
(: buffer (fixnum pointer --> buffer))
(← (buffer n ω) (tag-pointer ω n))
(: buffer+ (fixnum buffer --> buffer))
(← (buffer+ n ω) (tag-pointer (pointer+ ω n) (- (pointer-tag ω) n)))
(: buffer-∅? (buffer --> boolean))
(← (buffer-∅? ω) (= (pointer-tag ω) 0))
(: buffer-write-byte (buffer fixnum -> buffer))
(← (buffer-write-byte ω n) (pointer-u8-set! ω n) (buffer+ 1 ω))
(: buffer-write-bytes ((list-of fixnum) buffer -> buffer))
(← (buffer-write-bytes ns ω) (⇐ buffer-write-byte ω ns))
(: buffer-write-16-le (fixnum buffer -> buffer))
(← (buffer-write-16-le n ω)
(buffer-write-bytes `(,(bitwise-and n 255) ,(arithmetic-shift n -8)) ω))
(: buffer⇒ (('a -> 'a) ('a buffer -> buffer) 'a buffer -> 'a))
(← (buffer⇒ f g α ω)
(letrec ((▽ (λ (a w) (? (buffer-∅? w) a (∃ ((A (f a))) (▽ A (g A w)))))))
(▽ α ω)))
(: fill-sine (number phase buffer -> phase))
(← (fill-sine hz θ ω)
(buffer⇒ (λ (h) (θ+ h hz))
(λ (h w) (buffer-write-16-le (sine h WAVE-4096) w))
θ ω))
(: fill-all-sine (number phase (list-of buffer) -> phase))
(← (fill-all-sine hz θ ω) (⇐ (λ (h w) (fill-sine hz h w)) θ ω))
(: buffer-blocks (fixnum fixnum -> (list-of buffer)))
(← (buffer-blocks n size)
(∃ ((ω (allocate (* n size))))
(∀ (λ (m) (buffer size (pointer+ ω m))) (ι n 0 size))))
(: print-8 (buffer -> buffer))
(← (print-8 ω) (□ (pointer-u8-ref ω)) (□ " ") (buffer+ 1 ω))
(: print-16 (buffer -> buffer))
(← (print-16 ω) (□ (pointer-s16-ref ω)) (□ " ") (buffer+ 2 ω))
(: print-16-le (buffer -> buffer))
(← (print-16-le ω)
(∃ ((n (+ (pointer-u8-ref (pointer+ ω 0))
(arithmetic-shift (pointer-u8-ref (pointer+ ω 1)) 8))))
(□ (? (> n 32767) (- n 65536) n)) (□ " ") (buffer+ 2 ω)))
(: print-buffer ((buffer -> buffer) buffer -> void))
(← (print-buffer f ω)
(□ "( ") (buffer⇒ I (λ (_ w) (f w)) #f ω) (□ ")") (newline))
(: print-blocks ((buffer -> buffer) (list-of buffer) -> void))
(← (print-blocks f ω) (∀∀ (λ (w) (print-buffer f w)) ω))
Quite a bit of code, despite my quixotic attempt to turn it into APL.
The value of a wave at any sampling point in time can be derived from its current phase. Since waves are periodic, the phase accumulator can also be cyclic instead of growing indefinitely. This can obviously be enforced with modulo operations, but integer overflow works too. If phase θ is a fixed size integer, like uint32
, then its phase increments by a given frequency at
(hz × ((1 + UINT_MAX)÷rate)) + θ
where "hz" is the signal frequency and "rate" is the sampling rate. The overflow of this int represents a completed wave cycle. So for a sampling rate of 16 and a frequency of 2, θ will overflow twice by the 17th sample.
Since overflow is undesirable in just about every other context, it's rather difficult to make Scheme do it. Normal numbers will grow indefinitely, and typed vectors will throw errors.
(let ((xs (u32vector 4294967295))) (u32vector-set! xs 0 (+ 1 (u32vector-ref xs 0))))
Error: (u32vector-set!) out of range
It's a bit of a pain for a single value, but the only solution I've found involves C pointers from Chicken's memory module. I literally (allocate)
4 bytes—which have to be freed later—and then write a number to this address with (pointer-u32-set!)
. It's overflowed by the time it's read back with (pointer-u32-ref)
.
(let ((phase (allocate 4))) (pointer-u32-set! phase 4294967296) (pointer-u32-ref phase))
0
Pointers are very primitive, but Chicken allows one to tag them with arbitrary Scheme objects thanks to (tag-pointer)
. This value is retrieved with (pointer-tag)
. Since phase depends on the sampling rate, it makes sense to tag the pointer with (1+UINT_MAX)÷rate
. Then any incrementation of this phase only requires a frequency multiplier. Therefore the construction of any phase pointer should take the sample rate as its sole argument.
(define-type phase pointer)
(: phase (fixnum -> phase))
(← (phase rate)
(∃ ((ω (allocate 4)))
(pointer-u32-set! ω 0)
(tag-pointer ω (/ 4294967296 rate))))
(: θ+ (phase number -> phase))
(← (θ+ θ hz)
(∃ ((n (* hz (pointer-tag θ)))
(h (floor (+ n (pointer-u32-ref θ)))))
(pointer-u32-set! θ h) θ))
The increment operation (θ+)
returns the full pointer rather than the numeric phase value itself in order to keep the bounds limited to UINT_MAX
. It's not quite typesafe, but it's better than nothing.
Even if the integer values themselves aren't useful yet, you can witness their periodic nature.
(let ((p (phase 4))) (map (lambda (_) (θ+ p 1) (pointer-u32-ref p)) (iota 6)))
(1073741824 2147483648 3221225472 0 1073741824 2147483648)
4,294,967,296 divides perfectly by the sample rate of 4hz, but some drift will occur with common sample rates like 44,100hz and 48,000hz. Whether this is an audible problem remains to be seen—er, heard. The increment value held at (pointer-tag)
is actually a full Scheme rational number, like 33554432/375
for 48,000hz. This can help with accuracy, but some rounding happens when the result is stored back in the pointer.
The sine function can be expensive to compute on the fly, so one cycle is calculated and stored in a lookup table. Any Scheme list or vector can serve this purpose, but we might as well make another pointer, since its values will be contiguous in memory, and (pointer-tag)
can be put to use once again.
The index of a wavetable can be retrieved from an integer phase as long as the table size is a power of two. A large phase can be reduced to any resolution through right shifts. Shifting by 30 "snaps" a phase to its quadrant. Shifting by less extreme values provides indices for common table sizes without any need for modulo.
(let ((p (phase 4))) (map (lambda (_) (θ+ p 1) (arithmetic-shift (pointer-u32-ref p) -30)) (iota 6)))
(1 2 3 0 1 2)
Therefore, the constructor for a wavetable should take an exponent as its sole argument, rather than an arbitrary table size. The (pointer-tag)
of this table should likewise store a negative value representing the right shift necessary to safely access the contents with a 32 bit phase. Any given value within the table will be 16 bit, or two bytes. This is a common audio format. Production code is better off with 32 bit values for further modulation. The constructor has many lines, but it's not actually that complicated.
(define-type wave pointer)
(: sine-wave (fixnum -> wave))
(← (sine-wave n)
(letrec* ((ref-shift (* -1 (- 32 n)))
(size (arithmetic-shift 1 n))
(bytes 2)
(bytes-size (* bytes size))
(ε (/ (* 2 π) size))
(ω (allocate bytes-size))
(▽ (λ (m θ)
(? (= m bytes-size) (tag-pointer ω ref-shift)
(∃ ((s (inexact->exact (floor (* 32767 (sin θ))))))
(pointer-s16-set! (pointer+ ω m) s)
(▽ (+ m bytes) (+ θ ε)))))))
(▽ 0 0)))
(: WAVE-4096 wave)
(← WAVE-4096 (sine-wave 12))
(: WAVE-2048 wave)
(← WAVE-2048 (sine-wave 11))
A table can process a phase value regardless of its size.
(: sine (phase wave --> fixnum))
(← (sine θ ω)
(∃ ((shift (pointer-tag ω))
(bytes 2)
(n (* bytes (arithmetic-shift (pointer-u32-ref θ) shift))))
(pointer-s16-ref (pointer+ ω n))))
Observe.
(let ((p (phase 4))) (map (lambda (_) (θ+ p 1) (sine p WAVE-4096)) (iota 6)))
(32767 -1 -32767 0 32767 -1)
(let ((p (phase 4))) (map (lambda (_) (θ+ p 1) (sine p WAVE-2048)) (iota 6)))
(32767 0 -32767 0 32767 0)
The signal accuracy diminishes as the wavetable shrinks. This example is also lofi enough to tweak if you wish to witness the Nyquist Theorem in action. Reduce the phase and you'll see what I mean.
The program should write audio data to the soundcard in relatively large batches of bytes. My system seems to like 960 samples, which is 3,840 bytes of 16 bit stereo data. In a loop that collects user input, processes 960 samples, writes, and repeats, there are 50 chances per second for the user to turn a knob, press a key, or whatever. That's actually not great for realtime performance. You can definitely feel a delay when playing on a MIDI piano. The program can still write every 960 samples, but the buffer should be divided further to wait for additional input.
As you might imagine, this can be accomplished with pointers. Let the audio output see one pointer to a contiguous 3,840 byte stretch of memory, but have the user input write to n evenly-spaced pointers across the same memory. The main output loop could fill the constituent memory blocks one by one, then dump the entire audio when they are all iterated through.
As you might imagine, it is not enough to use raw pointers. They can be tagged to make the notion of a write-only buffer easier to work with. This time around the (pointer-tag)
reflects the remaining bytes left to write to a block of memory.
(define-type buffer pointer)
(: buffer (fixnum pointer --> buffer))
(← (buffer n ω) (tag-pointer ω n))
(: buffer+ (fixnum buffer --> buffer))
(← (buffer+ n ω) (tag-pointer (pointer+ ω n) (- (pointer-tag ω) n)))
(: buffer-∅? (buffer --> boolean))
(← (buffer-∅? ω) (= (pointer-tag ω) 0))
In action.
(define xs (buffer 2 (allocate 2)))
xs
#<tagged pointer 2 60000384a540>
(buffer+ 1 xs)
#<tagged pointer 1 60000384a541>
(buffer+ 2 xs)
#<tagged pointer 0 60000384a542>
(buffer-∅? (buffer+ 2 xs))
#t
Writing a byte to a buffer returns a new buffer pointer with one less length. Despite the stateful nature of this process, it behaves not unlike (cdr)
. A function of buffer a → buffer
is ripe for folding. A fold is ripe for writing many bytes, including in little-endian format, which is standard for WAV audio.
(: buffer-write-byte (buffer fixnum -> buffer))
(← (buffer-write-byte ω n) (pointer-u8-set! ω n) (buffer+ 1 ω))
(: buffer-write-bytes ((list-of fixnum) buffer -> buffer))
(← (buffer-write-bytes ns ω) (⇐ buffer-write-byte ω ns))
(: buffer-write-16-le (fixnum buffer -> buffer))
(← (buffer-write-16-le n ω)
(buffer-write-bytes `(,(bitwise-and n 255) ,(arithmetic-shift n -8)) ω))
This process can be generalized into a folding function (buffer⇒)
. It takes two higher-order functions f
and g
, along with the expected initial accumulator α
and buffer ω
.
f
is the usual function expected in a fold: one that modifies the accumulator and returns it for further processing in the next iteration.g
describes how the output of f
should be written to the buffer, then returns a new buffer with that many bytes subtracted from its length.Defined.
(: buffer⇒ (('a -> 'a) ('a buffer -> buffer) 'a buffer -> 'a))
(← (buffer⇒ f g α ω)
(letrec ((▽ (λ (a w) (? (buffer-∅? w) a (∃ ((A (f a))) (▽ A (g A w)))))))
(▽ α ω)))
Both writing sines and printing them is implemented in terms of this fold.
(: fill-sine (number phase buffer -> phase))
(← (fill-sine hz θ ω)
(buffer⇒ (λ (h) (θ+ h hz))
(λ (h w) (buffer-write-16-le (sine h WAVE-4096) w))
θ ω))
Here the accumulator is a phase object. f
adds the frequency increment to the phase, while g
renders (sine θ)
to the buffer in 16 bit little-endian.
(: print-8 (buffer -> buffer))
(← (print-8 ω) (□ (pointer-u8-ref ω)) (□ " ") (buffer+ 1 ω))
(: print-16-le (buffer -> buffer))
(← (print-16-le ω)
(∃ ((n (+ (pointer-u8-ref (pointer+ ω 0))
(arithmetic-shift (pointer-u8-ref (pointer+ ω 1)) 8))))
(□ (? (> n 32767) (- n 65536) n)) (□ " ") (buffer+ 2 ω)))
(: print-buffer ((buffer -> buffer) buffer -> void))
(← (print-buffer f ω)
(□ "( ") (buffer⇒ I (λ (_ w) (f w)) #f ω) (□ ")") (newline))
This time around f
is an identity of false, since there is no accumulator to modify. g
simply prints the buffer contents without writing to them.
(define xs (buffer 32 (allocate 32)))
xs
#<tagged pointer 32 6000019702c0>
(print-buffer print-16-le xs)
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
(fill-sine 1 (phase 16) xs)
#<tagged pointer 268435456 600001b64000>
(print-buffer print-16-le xs)
( 12539 23169 30272 32767 30272 23169 12539 -1 -12540 -23170 -30273 -32767 -30273 -23170 -12540 0 )
(print-buffer print-8 xs)
( 251 48 129 90 64 118 255 127 64 118 129 90 251 48 255 255 4 207 126 165 191 137 1 128 191 137 126 165 4 207 0 0 )
It's trivial to expand the buffer folding metaphor to multiple blocks.
(: buffer-blocks (fixnum fixnum -> (list-of buffer)))
(← (buffer-blocks n size)
(∃ ((ω (allocate (* n size))))
(∀ (λ (m) (buffer size (pointer+ ω m))) (ι n 0 size))))
(: fill-all-sine (number phase (list-of buffer) -> phase))
(← (fill-all-sine hz θ ω) (⇐ (λ (h w) (fill-sine hz h w)) θ ω))
(: print-blocks ((buffer -> buffer) (list-of buffer) -> void))
(← (print-blocks f ω) (∀∀ (λ (w) (print-buffer f w)) ω))
Observe the way the phase drifts across buffer blocks at 2hz frequency and 17hz sample rate.
(define xs (buffer-blocks 8 32))
(print-blocks print-16-le xs)
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
( 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )
(fill-all-sine 2 (phase 17) xs)
#<tagged pointer 4294967296/17 6000035b5d20>
(print-blocks print-16-le xs)
( 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 17274 -5998 -26138 -32629 -22080 )
( -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 17274 -5998 -26138 -32629 )
( -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 17274 -5998 -26138 )
( -32629 -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 17274 -5998 )
( -26138 -32629 -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 17274 )
( -5998 -26138 -32629 -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 31525 )
( 17274 -5998 -26138 -32629 -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 29313 )
( 31525 17274 -5998 -26138 -32629 -22080 -51 22042 32623 26168 6047 -17233 -31513 -29336 -11840 11792 )
Something like (fill-all-sine)
wouldn't make sense in production code, because the program would be polling for additional info after each block fill.
We're definitely cooking now though. Very low-level and not a for
loop in sight. Some of this code still doesn't make musical sense, but if these simple examples were expanded to batch certain operations, they might be half worthwhile.
The performance of raw pointers is actually quite poor. At least in the "functional" way I've written them, where every byte written to a pointer returns a new one. This could be creating unnecessary objects, or the FFI overhead is just too great—the profiler didn't really say.
Things improved when I packed 4 samples into one (pointer-u64-set!)
and incremented the buffer by 8, but they got even better when I ditched pointers entirely and allocated a u64vector instead. If invoked as
(make-u64vector n 0 #f #f)
it will remain static in memory until freed with (free-number-vector)
.
It required more boilerplate, but I was able to simulate blocks over this buffer by creating a list of buffer-iterating lambdas, each partially-applied with the buffer and a [n,m) range representing the individual slice. So there's still no user book-keeping of array indices.
The entire buffer can be passed as a pointer the C audio function with (make-locative)
.
Whether this still has to be 64 bits to get the most performance remains to be seen, but this is also largely transparent to the user, so I'm fine with that too.
I think the overall point of this writeup still makes sense.