Yelling into the VoIP
Internet! Because phones were too simple
- How humans hear
- How computers hear
- How to move voice between computers
- How to play audio
Goto src
(PSA: Careful when scanning untrusted QR codes!)
Goal: understand this E2E, and have a reference for TLAs
Near end / Far end | RX/TX | Render / Capture
Inside a telephony app
Stages
Acoustics / Psychoacoustics
- Acoustics: how sound behaves
- Psychoacoustics: how humans think sound behaves
- Outside of computers: sound = variation in air pressure. A MECHANICAL process.
- When plotted (Pascals/Time), looks sinusoidal.
- Graphs here may or may not resemble actual audio.
Acoustics: Propagation
Acoustics: Reverb
Reverb gives humans sense of space
Echo
Reverb = aural events that can't be separated. Echo = distinct aural events.
Noise
Stereo
Humans use cues from reverb, stereo effects, absorption, etc. to spatialize sounds and to create "audio focus".
Moving to capture / mic
Sound capture
Analogue to digital
Beamforming
Volume control: AGC/DRC
From DSP to user app

Other DSP types may include noise suppression, wind noise reduction, compensation for different hearing impairments, etc.
We're here: capture
Let's discuss: render
Render: out of userspace
Render: out of DSP, into the world
Render sidequest: hearing range and Nyquist theorem
- Human hearing: 20 Hz to 20 kHz
- Human speech: 400 Hz to 4 kHz
- Sample rate 48 kHz = Nyquist frequency 24 kHz = covers the human hearing range
- Sample rate 8 kHz = Nyquist frequency 4 kHz = less bandwidth; covers speech, but not the full human hearing range
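The arithmetic behind the two sample rates above can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the fold-back calculation assumes a pure tone sampled with no anti-aliasing filter in front of the converter (real capture hardware normally has one).

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent unambiguously."""
    return sample_rate_hz / 2


def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once sampled.

    Assumption: no anti-aliasing filter, so anything above the Nyquist
    frequency folds back into the representable range.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for rate in (48_000, 8_000):
    print(rate, nyquist_hz(rate), apparent_frequency_hz(5_000, rate))
```

A 5 kHz tone survives intact at 48 kHz, but at 8 kHz it is above the 4 kHz Nyquist frequency and shows up as a 3 kHz tone instead.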
Transport
Audio -> Network packets
Frames -> Packets
- Packets may overlap frames
- [Lossy] compression
- Compression depends on available bandwidth
- Packets may be elided entirely depending on VAD (DTX)
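A rough sketch of the frames-to-packets step with DTX, under heavy assumptions: one frame per packet, no compression at all, a made-up 6-byte header (sequence number plus timestamp in samples), and a naive energy threshold standing in for a real VAD. The frame duration, threshold and all names are illustrative, not taken from any particular codec or stack.

```python
import math
import struct

SAMPLE_RATE_HZ = 8_000       # assumption: narrowband telephony rate
FRAME_MS = 20                # assumption: illustrative frame duration
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000
VAD_RMS_THRESHOLD = 500.0    # assumption: arbitrary level for 16-bit PCM


def frame_rms(frame):
    """Root-mean-square level of one frame, used here as a toy VAD."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def packetize(samples):
    """Split 16-bit PCM into frames and emit packets for voiced frames only."""
    packets = []
    last_start = len(samples) - FRAME_SAMPLES
    for start in range(0, last_start + 1, FRAME_SAMPLES):
        frame = samples[start:start + FRAME_SAMPLES]
        if frame_rms(frame) < VAD_RMS_THRESHOLD:
            continue  # DTX: send nothing for a silent frame
        # Hypothetical header: packet counter, then timestamp in samples.
        header = struct.pack("!HI", len(packets) & 0xFFFF, start)
        payload = struct.pack(f"!{FRAME_SAMPLES}h", *frame)
        packets.append(header + payload)
    return packets


tone = [int(8000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ))
        for n in range(FRAME_SAMPLES * 2)]
silence = [0] * (FRAME_SAMPLES * 3)
packets = packetize(tone + silence + tone)
print(len(packets), "packets sent for 7 frames")
```

The receiver can tell silence from loss because the packet counter stays contiguous while the timestamp jumps across the skipped frames.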
Jitter buffering
Dealing with packet loss
Much better sources than this presentation
E2E pipeline
Bonus: Echo
Bonus: Transfer function

The transfer function includes: time of flight between speaker and mic, reverb, different echo paths...
Bonus: AEC models multiple paths
Bonus: AEC reconverges when the transfer function changes

When the path changes, the AEC needs to recalculate the transfer function. This includes accounting for third-party DSP.
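To make "recalculate the transfer function" concrete, here is a toy sketch using NLMS (normalized least mean squares), a common textbook adaptive filter. The slides do not say which algorithm a real AEC uses, so treat NLMS as an assumption. The sketch also assumes a white-noise far-end signal, no near-end speech, no background noise, and an echo path short enough to fit in 8 taps. All names and values are made up for illustration.

```python
import random


def apply_path(signal, path):
    """Simulate the room: convolve the far-end signal with an impulse response."""
    return [sum(path[k] * signal[n - k] for k in range(len(path)) if n - k >= 0)
            for n in range(len(signal))]


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return (residual, weights); weights are the estimated transfer function."""
    w = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate  # what is left after removing the estimated echo
        norm = sum(hi * hi for hi in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual, w


def mean_power(segment):
    return sum(e * e for e in segment) / len(segment)


random.seed(0)
N = 4000
far_end = [random.uniform(-1.0, 1.0) for _ in range(N)]
# Leading zeros model time of flight; the decaying taps model a reverb tail.
path_a = [0.0, 0.0, 0.6, 0.3, 0.1, 0.0, 0.0, 0.0]
path_b = [0.0, 0.0, 0.0, 0.0, 0.5, 0.2, 0.1, 0.05]
# The echo path switches from A to B halfway through the call.
mic = apply_path(far_end, path_a)[:N // 2] + apply_path(far_end, path_b)[N // 2:]

residual, weights = nlms_echo_canceller(far_end, mic)
print("converged on path A:   ", mean_power(residual[1500:2000]))
print("just after path change:", mean_power(residual[2000:2050]))
print("reconverged on path B: ", mean_power(residual[3500:4000]))
```

The residual echo is near zero once the filter has learned path A, jumps when the path switches, and decays again as the weights move toward path B. A real canceller also has to cope with double-talk, noise and much longer echo tails, none of which is modeled here.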