Lessons from Building a Browser Audio Engine

“The Web Audio API is powerful. It’s also a minefield of browser quirks, garbage collection pauses, and ‘why is there a 200ms delay’ mysteries.”


The Challenge: Real-Time Audio in a Browser

Building a DAW in the browser is… ambitious. Browsers aren’t designed for:

  • Sub-10ms latency
  • Deterministic audio scheduling
  • Low-level hardware access
  • Complex audio graphs with 100+ nodes

But here we are.


The Revelation: Tone.js + Web Audio API

Tone.js is a high-level wrapper around the Web Audio API. It handles:

  • Musical timing (transport, scheduling)
  • Synthesis and sampling
  • Effects and routing
  • MIDI integration

But it’s not magic. You still need to understand what’s happening under the hood.
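
Here's the flavor of it, as a minimal sketch (the synth choice and note values are illustrative):

import * as Tone from 'tone'

// A metronome beep scheduled on Tone's musical transport:
// the callback time `t` is sample-accurate, not a setTimeout guess
const synth = new Tone.Synth().toDestination()

Tone.Transport.scheduleRepeat((t) => {
  synth.triggerAttackRelease('C5', '16n', t)
}, '4n')

Tone.Transport.bpm.value = 120
// Tone.Transport.start() belongs inside a user-gesture handler (see below)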


Architecture: The Audio Graph

Bolt’s audio graph looks like this:

┌──────────────────────────────────────────────────────────────┐
│                       Tone.Destination                       │
│  (Master output → Web Audio API → AudioContext.destination)  │
└──────────────────────────────────────────────────────────────┘
                               ▲
                               │
┌──────────────────────────────┴───────────────────────────────┐
│                          Master Bus                          │
│  ┌─────────────┬─────────────┬─────────────┬──────────────┐  │
│  │  EQ         │ Compressor  │ Limiter     │ Meter        │  │
│  └─────────────┴─────────────┴─────────────┴──────────────┘  │
└──────────────────────────────────────────────────────────────┘
                           ▲
                           │
          ┌────────────────┼────────────────┐
          │                │                │
     ┌────┴────┐      ┌────┴────┐      ┌────┴────┐
     │ Track 1 │      │ Track 2 │      │ Track N │
     │ ┌─────┐ │      │┌───────┐│      │  ┌───┐  │
     │ │Synth│ │      ││Sampler││      │  │Mic│  │
     │ └─────┘ │      │└───────┘│      │  └───┘  │
     │    │    │      │    │    │      │    │    │
     │  ┌─┴─┐  │      │  ┌─┴─┐  │      │  ┌─┴─┐  │
     │  │FX │  │      │  │FX │  │      │  │FX │  │
     │  └───┘  │      │  └───┘  │      │  └───┘  │
     └────┬────┘      └────┬────┘      └────┬────┘
          │                │                │
     ┌────┴────────────────┴────────────────┴────┐
     │             Hardware Routing              │
     │  (ASIO/WDM → Browser → Native Audio)      │
     └────────────────────────────────────────────┘

Initialization: The AudioContext Ritual

The Web Audio API requires user interaction to start. Always.

// Bolt's audio initialization
export class AudioEngine {
  context: Tone.Context
  masterBus: Tone.Channel
  masterEQ: Tone.EQ3
  masterCompressor: Tone.Compressor
  isInitialized: boolean = false
  
  async initialize(): Promise<void> {
    if (this.isInitialized) return
    
    // Create the Tone.js context (wraps the Web Audio API) and make it
    // current *before* starting audio, so all nodes land in the right context
    this.context = new Tone.Context({
      latencyHint: 'interactive',  // 'interactive' | 'playback' | 'balanced'
      sampleRate: 48000            // 44100 | 48000 | 96000
    })
    Tone.setContext(this.context)
    
    // Resume the context; this only succeeds inside a user gesture
    await Tone.start()
    
    // Initialize master bus
    this.masterBus = new Tone.Channel({
      volume: 0,
      pan: 0,
      solo: false,
      mute: false
    }).toDestination()
    
    // Add master effects: compressor -> EQ -> master bus
    this.masterEQ = new Tone.EQ3({
      low: 0,
      mid: 0,
      high: 0
    }).connect(this.masterBus)
    
    this.masterCompressor = new Tone.Compressor({
      threshold: -24,
      ratio: 12,
      attack: 0.003,
      release: 0.25
    }).connect(this.masterEQ)
    
    this.isInitialized = true
  }
  
  // Must be called after a user gesture
  async resume(): Promise<void> {
    if (this.context.state === 'suspended') {
      await this.context.resume()
    }
  }
}

Critical: Always wrap Tone.start() in a user interaction handler.
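
In practice that means gating startup behind a click or tap. A minimal sketch (the button id and the `engine` instance are illustrative):

// Resume audio inside a real user gesture, or the context stays suspended
const startButton = document.querySelector<HTMLButtonElement>('#start-audio')

startButton?.addEventListener('click', async () => {
  await engine.initialize()  // Tone.start() resolves inside the gesture
  startButton.disabled = true
})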


Scheduling: The Transport System

DAWs need sample-accurate scheduling. Tone.js provides Transport and Draw:

export class Scheduler {
  private transport = Tone.Transport
  // Maps our clip/event ids to the numeric ids Transport.schedule returns,
  // so scheduled material can be cancelled later
  private scheduledEvents = new Map<string, number[]>()
  // Track-id -> instrument registry (populated when tracks are created)
  private synths = new Map<string, Tone.PolySynth>()
  
  constructor() {
    this.transport.bpm.value = 120
  }
  
  private getSynth(trackId: string): Tone.PolySynth {
    return this.synths.get(trackId)!
  }
  
  // Schedule a MIDI note at a transport time (seconds)
  scheduleNote(
    trackId: string,
    note: number,
    time: number,
    duration: number,
    velocity: number
  ): void {
    const synth = this.getSynth(trackId)
    
    this.transport.schedule((t) => {
      synth.triggerAttackRelease(
        Tone.Frequency(note, 'midi').toNote(),
        duration,
        t,
        velocity
      )
    }, time)
  }
  
  // Schedule a clip (multiple notes)
  scheduleClip(clip: Clip, startTime: number): string {
    const eventId = generateId()
    const ids: number[] = []
    
    clip.notes.forEach(note => {
      // The note's offset is baked into the schedule position and the
      // callback's `t` already reflects it; adding note.time again
      // would double the offset
      ids.push(this.transport.schedule((t) => {
        const synth = this.getSynth(clip.trackId)
        synth.triggerAttackRelease(
          note.name,
          note.duration,
          t,
          note.velocity
        )
      }, startTime + note.time))
    })
    
    this.scheduledEvents.set(eventId, ids)
    return eventId
  }
  
  // Loop a section
  setLoop(start: number, end: number): void {
    this.transport.loop = true
    this.transport.loopStart = start
    this.transport.loopEnd = end
  }
  
  // Playback control
  play(): void { this.transport.start() }
  pause(): void { this.transport.pause() }
  stop(): void { this.transport.stop() }
  
  // Seek to position
  seek(position: number): void {
    this.transport.seconds = position
  }
  
  // Current position
  getPosition(): number {
    return this.transport.seconds
  }
  
  // Cleanup
  clear(): void {
    this.transport.cancel()
    this.scheduledEvents.clear()
  }
}
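
Transport's sibling, Draw, handles the visual side: audio callbacks fire on the audio clock, but UI updates belong on animation frames. A sketch of keeping a playhead in sync (updatePlayhead is a hypothetical UI hook):

// Schedule audio on Transport, defer visuals to Tone.Draw, which invokes
// the callback via requestAnimationFrame as close to time `t` as possible
Tone.Transport.scheduleRepeat((t) => {
  Tone.Draw.schedule(() => {
    updatePlayhead(Tone.Transport.seconds)
  }, t)
}, '16n')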

Instruments: Synths and Samplers

Bolt supports multiple instrument types:

// PolySynth for polyphonic instruments
const polySynth = new Tone.PolySynth(Tone.Synth, {
  oscillator: { type: 'triangle' },
  envelope: {
    attack: 0.005,
    decay: 0.1,
    sustain: 0.3,
    release: 1
  }
}).toDestination()

// Sampler for audio files
const sampler = new Tone.Sampler({
  urls: {
    C4: 'samples/piano-c4.wav',
    'D#4': 'samples/piano-ds4.wav',
    'F#4': 'samples/piano-fs4.wav',
    A4: 'samples/piano-a4.wav'
  },
  baseUrl: '/audio/',
  onload: () => console.log('Samples loaded')
}).toDestination()

// Drum machine with separate outputs
const drumKit = {
  kick: new Tone.MembraneSynth().toDestination(),
  snare: new Tone.NoiseSynth({
    noise: { type: 'white' },
    envelope: { attack: 0.001, decay: 0.2, sustain: 0 }
  }).toDestination(),
  hihat: new Tone.MetalSynth({
    envelope: { attack: 0.001, decay: 0.1, release: 0.01 },
    harmonicity: 5.1,
    modulationIndex: 32,
    resonance: 4000,
    octaves: 1.5
  }).toDestination()
}
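
Wiring the kit into a pattern is one Tone.Sequence away. A minimal backbeat sketch (the pattern itself is illustrative):

// Alternate kick and snare on quarter notes
const seq = new Tone.Sequence((time, step) => {
  if (step === 'kick') {
    drumKit.kick.triggerAttackRelease('C1', '8n', time)  // pitched membrane hit
  } else {
    drumKit.snare.triggerAttackRelease('8n', time)       // noise burst, no pitch
  }
}, ['kick', 'snare', 'kick', 'snare'], '4n').start(0)

Tone.Transport.start()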

Effects Chain: Modular Routing

export class EffectChain {
  private input: Tone.Gain
  private output: Tone.Gain
  // Tone has no bare `AudioNode` type; ToneAudioNode is the base class
  private effects: Tone.ToneAudioNode[]
  
  constructor() {
    this.input = new Tone.Gain(1)
    this.output = new Tone.Gain(1)
    this.effects = []
    
    // Initial connection: input -> output
    this.input.connect(this.output)
  }
  
  addEffect(effect: Tone.ToneAudioNode, index?: number): void {
    // Tear down existing connections before re-wiring
    this.disconnectChain()
    
    // Insert effect at position (append by default)
    if (index !== undefined) {
      this.effects.splice(index, 0, effect)
    } else {
      this.effects.push(effect)
    }
    
    // Rebuild connections
    this.connectChain()
  }
  
  removeEffect(effect: Tone.ToneAudioNode): void {
    const index = this.effects.indexOf(effect)
    if (index === -1) return
    
    this.disconnectChain()
    this.effects.splice(index, 1)
    effect.dispose()  // frees the underlying native nodes
    
    this.connectChain()
  }
  
  private connectChain(): void {
    if (this.effects.length === 0) {
      this.input.connect(this.output)
      return
    }
    
    // Connect: input -> effect1 -> effect2 -> ... -> output
    this.input.connect(this.effects[0])
    
    for (let i = 0; i < this.effects.length - 1; i++) {
      this.effects[i].connect(this.effects[i + 1])
    }
    
    this.effects[this.effects.length - 1].connect(this.output)
  }
  
  private disconnectChain(): void {
    // Disconnect everything; connectChain() re-wires from scratch
    this.input.disconnect()
    this.effects.forEach(effect => effect.disconnect())
  }
  
  getEntry(): Tone.Gain { return this.input }
  getExit(): Tone.Gain { return this.output }
}
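
In use, a per-track insert chain looks like this (effect choices and settings are illustrative):

// Route a synth through the chain: synth -> distortion -> reverb -> output
const chain = new EffectChain()

const synth = new Tone.PolySynth(Tone.Synth)
synth.connect(chain.getEntry())
chain.getExit().toDestination()

chain.addEffect(new Tone.Distortion(0.2))
chain.addEffect(new Tone.Reverb({ decay: 2.5, wet: 0.3 }))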

The Hardware Problem: ASIO/WDM

Browsers don’t natively support professional audio interfaces. ASIO (Windows) and Core Audio (macOS) give native apps low-latency hardware access. The Web Audio API… doesn’t.

Solutions:

  1. Native messaging host - Bridge to ASIO via a local native app
  2. WASAPI - Windows’ modern audio API, lower latency than DirectSound
  3. Accept latency - 20-50ms is workable for arranging and mixing, though rough for live tracking

// Hardware input handling
export async function getHardwareInputs(): Promise<MediaDeviceInfo[]> {
  await navigator.mediaDevices.getUserMedia({ audio: true })
  
  const devices = await navigator.mediaDevices.enumerateDevices()
  return devices.filter(device => device.kind === 'audioinput')
}

// Create input stream from hardware
export async function createHardwareInput(deviceId: string): Promise<Tone.UserMedia> {
  const userMedia = new Tone.UserMedia()
  await userMedia.open(deviceId)
  
  // Input latency can't be set on the node itself; compensate at
  // scheduling time instead (see scheduleWithCompensation below)
  return userMedia
}

// Monitor input with effects
export function createMonitor(input: Tone.UserMedia): Tone.Channel {
  const monitor = new Tone.Channel({
    volume: -Infinity,  // Start muted
    mute: true
  }).toDestination()
  
  input.connect(monitor)
  
  return monitor
}
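
Putting those together, a sketch of arming the first available input (device-picker UI elided):

const inputs = await getHardwareInputs()
if (inputs.length > 0) {
  const input = await createHardwareInput(inputs[0].deviceId)
  const monitor = createMonitor(input)
  
  // Unmute to hear yourself (watch for feedback on laptop speakers)
  monitor.mute = false
  monitor.volume.value = -6
}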

The Garbage Collection Problem

Audio generates a lot of objects. JavaScript’s GC can cause audible glitches.

Mitigations:

  1. Object pooling - Reuse audio buffers (sketched below)
  2. Pre-allocation - Create nodes upfront, don’t destroy
  3. Typed arrays - Use Float32Array, not regular arrays
  4. AudioWorklet - Process audio off the main thread

// Object pool for audio buffers
class AudioBufferPool {
  private pool: AudioBuffer[] = []
  private maxSize: number
  
  constructor(maxSize: number = 50) {
    this.maxSize = maxSize
  }
  
  acquire(length: number, channels: number): AudioBuffer {
    // Find suitable buffer or create new
    const existing = this.pool.find(
      b => b.length === length && b.numberOfChannels === channels
    )
    
    if (existing) {
      this.pool = this.pool.filter(b => b !== existing)
      return existing
    }
    
    return Tone.context.createBuffer(channels, length, Tone.context.sampleRate)
  }
  
  release(buffer: AudioBuffer): void {
    if (this.pool.length < this.maxSize) {
      this.pool.push(buffer)
    }
  }
}

// Use for recording: acquire a scratch buffer for post-processing and
// return it to the pool when the take is done. (Tone.Recorder has no
// onStop hook; stop() resolves with the recorded Blob instead.)
const bufferPool = new AudioBufferPool()

export async function recordTake(source: Tone.ToneAudioNode, duration: number): Promise<Blob> {
  const bufferSize = Math.floor(duration * Tone.getContext().sampleRate)
  const scratch = bufferPool.acquire(bufferSize, 2)
  
  const recorder = new Tone.Recorder()
  source.connect(recorder)
  
  recorder.start()
  await new Promise(resolve => setTimeout(resolve, duration * 1000))
  const recording = await recorder.stop()  // resolves with an audio Blob
  
  // ...decode and process the take, using `scratch` as working memory...
  
  bufferPool.release(scratch)
  recorder.dispose()
  return recording
}

The Latency Problem

Web Audio has inherent latency from:

  • Buffering (the fixed 128-sample render quantum, plus the hardware buffer)
  • Processing time
  • Hardware output latency

Measuring latency:

export async function measureLatency(): Promise<number> {
  // Play a test tone; a real implementation would capture it with a
  // loopback cable or microphone and time the round trip
  const osc = new Tone.Oscillator(440, 'sine').toDestination()
  const scheduledTime = Tone.now() + 0.1
  osc.start(scheduledTime).stop(scheduledTime + 0.1)
  osc.onstop = () => osc.dispose()  // don't leak the node
  
  // For now, estimate from what the context reports
  // (outputLatency is missing in some browsers, e.g. Safari)
  const raw = Tone.getContext().rawContext as AudioContext
  return raw.baseLatency + (raw.outputLatency ?? 0)
}

// Compensate for latency
export function scheduleWithCompensation(
  callback: () => void,
  time: number,
  latency: number
): void {
  Tone.Transport.schedule((t) => {
    callback()
  }, time - latency)
}

Performance: AudioWorklet

For heavy processing, use AudioWorklet (runs on audio thread, not main thread):

// worklet/processor.ts
class BoltProcessor extends AudioWorkletProcessor {
  process(
    inputs: Float32Array[][],
    outputs: Float32Array[][],
    parameters: Record<string, Float32Array>
  ): boolean {
    const input = inputs[0]
    const output = outputs[0]
    
    if (!input || !input[0]) return true
    
    // Process each channel
    for (let channel = 0; channel < input.length; channel++) {
      const inputChannel = input[channel]
      const outputChannel = output[channel]
      
      for (let i = 0; i < inputChannel.length; i++) {
        // Example: Simple gain
        outputChannel[i] = inputChannel[i] * 0.5
      }
    }
    
    return true  // Keep processor alive
  }
}

registerProcessor('bolt-processor', BoltProcessor)

// main thread
export async function loadWorklet(context: AudioContext, source: Tone.ToneAudioNode): Promise<void> {
  await context.audioWorklet.addModule('/worklet/processor.js')
  
  const worklet = new AudioWorkletNode(context, 'bolt-processor')
  
  // Connect into the graph; Tone.connect bridges Tone and native nodes
  Tone.connect(source, worklet)
  Tone.connect(worklet, Tone.getDestination())
}

Pro Tips for Browser Audio

  1. Request low latency - latencyHint: 'interactive' asks the browser for its smallest safe buffer
  2. Watch the node count - Too many = CPU spikes
  3. Use offline rendering - Tone.Offline for exports, never for real-time (sketch below)
  4. Test on target devices - Mobile audio is a different beast
  5. Profile the audio thread - Chrome DevTools' Performance panel shows it
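
A minimal offline-render sketch, roughly how an export could work (the synth line is illustrative):

// Render 2 seconds offline: everything inside the callback runs in an
// OfflineAudioContext, faster than real time and inaudible
const buffer = await Tone.Offline(({ transport }) => {
  const synth = new Tone.Synth().toDestination()
  transport.schedule((t) => synth.triggerAttackRelease('C4', '8n', t), 0)
  transport.start(0)
}, 2)

// `buffer` is a Tone.ToneAudioBuffer; encode it to WAV for download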

War Stories

The 200ms Mystery Delay

Spent a week tracking down latency. Turns out Chrome was adding a hidden MediaElementAudioSourceNode with default buffer size. Fixed by explicitly setting latencyHint: 'interactive'.

The Memory Leak

Synths weren’t being disposed. Each track create/destroy cycle leaked ~2MB. Fixed by calling .dispose() on all Tone.js objects.

The Mobile Safari Bug

Safari iOS silently fails on AudioContext resume if it isn’t triggered by a user gesture. Even programmatic clicks don’t count. It must be an actual user tap.


What’s Next

  • WASM DSP - Run C++ audio code in browser
  • WebCodecs - Better codec support for imports
  • WebTransport - Low-latency network audio streaming
  • VST bridging - WebAssembly VST plugins

Cleetus Speaks

“brother b0gie, you made a whole MUSIC STUDIO in the BROWSER??

and it doesn’t even LAG??

wait… so i can make beats on my PHONE now??

but does it have a ‘make it slap’ button??

or a ‘add spice’ slider??

no?? okay i’ll settle for just being able to make music anywhere

#ToneJS #WebAudio #BrowserDAW #Subject734MobileProducer”


The Web Audio API is finicky, quirky, and occasionally maddening. But when it works? You have a professional DAW running in a tab. Worth it.

— b0gie