Yelling into the VoIP

Internet! Because phones were too simple


Goto src

(PSA: Careful when scanning untrusted QR codes!)

Goal: understand this E2E, and have a reference for TLAs

Near end / Far end | RX/TX | Render / Capture

Inside a telephony app

Stages

Acoustics / Psychoacoustics

Acoustics: Propagation

Acoustics: Reverb

Reverb gives humans a sense of space

Echo

Reverb = aural events that can't be separated. Echo = distinct aural events.

Noise

Stereo

Humans use cues from reverb, stereo effects, absorption, etc. to spatialize sounds and to create "audio focus".

Moving to capture / mic

Sound capture

Analogue to digital

Beamforming

Volume control: AGC/DRC

Volume control sidequest: dB, dBFS, SPL, dBOV, LUFS

Better explanation here
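
To make the units concrete, here is a minimal sketch of how the decibel family relates to raw samples. It assumes float samples normalized to [-1.0, 1.0], and the helper names are invented for this example. It uses the convention where a full-scale sine reads about -3 dBFS RMS; some standards define the reference differently.

```python
import math

def amplitude_to_db(ratio):
    """Convert an amplitude ratio to decibels: 20 * log10(ratio)."""
    if ratio <= 0:
        return float("-inf")
    return 20.0 * math.log10(ratio)

def peak_dbfs(samples):
    """Peak level relative to digital full scale (1.0 in this sketch)."""
    return amplitude_to_db(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """RMS level relative to digital full scale (1.0 in this sketch)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return amplitude_to_db(math.sqrt(mean_square))

def pascal_to_db_spl(pressure_pa):
    """Sound pressure level relative to the 20 micropascal reference."""
    return amplitude_to_db(pressure_pa / 20e-6)

# Ten whole periods of a full-scale sine wave.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]

print(peak_dbfs(sine))        # about 0 dBFS
print(rms_dbfs(sine))         # about -3.01 dBFS
print(amplitude_to_db(0.5))   # halving the amplitude: about -6.02 dB
print(pascal_to_db_spl(1.0))  # 1 Pa: about 94 dB SPL
```

dBFS is relative to the largest value the digital format can hold, while SPL is relative to a physical pressure, so the two can only be related once the mic or speaker has been calibrated.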

From DSP to user app

Other DSP types may include noise suppression, wind noise reduction, compensation for different hearing impairments, etc.

We're here: capture

Let's discuss: render

Render: out of userspace

Render: out of DSP, into the world

Render sidequest: hearing range and Nyquist theorem
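
A small sketch of why the sample rate has to be at least twice the highest frequency we care about. Human hearing is usually quoted as roughly 20 Hz to 20 kHz. The function names and the 8 kHz rate are just choices for this example: a tone above half the sample rate produces exactly the same samples as a lower tone, so the two can't be told apart after sampling.

```python
import math

def aliased_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone at f_hz after sampling."""
    nyquist = sample_rate_hz / 2
    folded = f_hz % sample_rate_hz
    # Anything above Nyquist folds back down into the 0..Nyquist band.
    return sample_rate_hz - folded if folded > nyquist else folded

def sample_tone(f_hz, sample_rate_hz, count):
    """Sample a cosine of frequency f_hz at the given rate."""
    return [math.cos(2 * math.pi * f_hz * n / sample_rate_hz) for n in range(count)]

rate = 8000  # narrowband telephony rate, so Nyquist is 4 kHz
print(aliased_frequency(3000, rate))  # below Nyquist: stays at 3000
print(aliased_frequency(5000, rate))  # above Nyquist: shows up as 3000

# The sampled 5 kHz tone is indistinguishable from the sampled 3 kHz tone.
a = sample_tone(5000, rate, 16)
b = sample_tone(3000, rate, 16)
print(max(abs(x - y) for x, y in zip(a, b)))  # tiny, floating-point noise only
```

This is why capture pipelines low-pass filter before sampling, and why a 48 kHz rate comfortably covers the audible range.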

Transport

Audio -> Network packets

Filtering

Frames -> Packets

  • Packets may overlap frames
  • [Lossy] compression
  • Compression depends on available bandwidth
  • Packets may be elided entirely depending on VAD (DTX)
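
The last bullet can be sketched roughly as below. This is a toy, not how any particular codec does it: the VAD is a plain energy threshold, there is no compression, and the frame size, threshold and packet fields are assumptions made up for the example.

```python
import math

SAMPLE_RATE = 16000                         # assumed wideband rate
FRAME_MS = 20                               # assumed frame duration
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame

def split_frames(samples, frame_len=FRAME_LEN):
    """Cut the stream into whole frames, dropping any trailing partial frame."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def is_speech(frame, threshold_dbfs=-40.0):
    """Toy VAD: treat the frame as speech if its mean energy is above a threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    if energy == 0:
        return False
    return 10.0 * math.log10(energy) > threshold_dbfs

def packetize(samples):
    """One packet per active frame; silent frames are not sent at all (DTX)."""
    packets = []
    for seq, frame in enumerate(split_frames(samples)):
        if is_speech(frame):
            packets.append({"seq": seq, "timestamp": seq * FRAME_LEN, "payload": frame})
    return packets

silence = [0.0] * FRAME_LEN
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
sent = packetize(silence + tone + silence)
print([p["seq"] for p in sent])  # only the middle frame is sent
```

The sequence number and timestamp keep counting across the skipped frames, so the receiver can tell a deliberate gap from reordering.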

Jitter buffering
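
A minimal sketch of the idea, assuming packets carry a sequence number. The class and parameter names are hypothetical. The buffer waits until a few packets have arrived, then hands out one frame per playout tick in sequence order, trading a little latency for tolerance to reordering and uneven arrival times.

```python
class JitterBuffer:
    """Toy fixed-depth jitter buffer keyed by sequence number."""

    def __init__(self, prefill=3):
        self.prefill = prefill  # packets to collect before playout starts
        self.pending = {}       # seq -> payload
        self.next_seq = None    # next sequence number owed to the renderer

    def push(self, seq, payload):
        """Store an arriving packet. Returns False if it arrived too late."""
        if self.next_seq is not None and seq < self.next_seq:
            return False
        self.pending[seq] = payload
        return True

    def pop(self):
        """Called once per frame period. Returns a payload, or None if missing."""
        if self.next_seq is None:
            if len(self.pending) < self.prefill:
                return None  # still filling up, nothing to play yet
            self.next_seq = min(self.pending)
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        return payload

jb = JitterBuffer(prefill=3)
for seq in (2, 0, 1):                 # packets arrive out of order
    jb.push(seq, f"frame-{seq}")
print([jb.pop() for _ in range(3)])   # ['frame-0', 'frame-1', 'frame-2']

jb.push(4, "frame-4")                 # packet 3 never shows up in time
print(jb.pop(), jb.pop())             # None frame-4
print(jb.push(3, "frame-3"))          # False: its playout slot already passed
```

Real buffers usually adapt their depth to the measured jitter; this one keeps it fixed to stay short.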

Dealing with packet loss
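
One simple way to hide a lost packet is to replay the last good frame, a bit quieter each time, so a burst of losses fades out instead of looping. The sketch below assumes lost frames show up as `None`; the function name and decay factor are invented for the example. Other strategies, such as forward error correction, also exist.

```python
def conceal_losses(frames, frame_len, decay=0.5):
    """Replace missing frames (None) with a decaying copy of the last good frame."""
    output = []
    last_good = [0.0] * frame_len  # nothing received yet: fall back to silence
    gain = 1.0
    for frame in frames:
        if frame is None:
            gain *= decay          # each consecutive loss gets quieter
            output.append([s * gain for s in last_good])
        else:
            gain = 1.0             # a good frame resets the fade
            last_good = frame
            output.append(frame)
    return output

received = [[1.0, 1.0], None, None, [0.5, 0.5]]
print(conceal_losses(received, frame_len=2))
# [[1.0, 1.0], [0.5, 0.5], [0.25, 0.25], [0.5, 0.5]]
```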

E2E pipeline

Bonus: Echo

Bonus: Transfer function

The transfer function includes: time of flight between speaker and mic, reverb, different echo paths...

Bonus: AEC models multiple paths

Bonus: AEC reconverges when transfer changes

When the path changes, the AEC needs to recalculate the transfer function. This includes accounting for third-party DSP.
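
A rough sketch of what "modelling the transfer function" and "reconverging" can look like, using a normalized LMS (NLMS) adaptive filter. This is one common textbook technique, not necessarily what any given AEC uses. The echo paths, tap count and step size are made up for the example, and there is no near-end speech (no double-talk) in the simulation.

```python
import random

def convolve_echo(far_end, path):
    """Mic signal if the room just filters the far-end signal through `path`."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]

def nlms_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Estimate the echo path with NLMS. Returns (residual signal, final weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what is left after subtracting the modelled echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights

def energy(signal):
    return sum(v * v for v in signal)

random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]

# Two hypothetical echo paths: each has a delay plus a second, weaker reflection.
path_a = [0.0, 0.0, 0.6, 0.0, 0.3]
path_b = [0.0, 0.0, 0.0, 0.5, 0.0, 0.2]

# The path changes halfway through, e.g. someone moves the phone.
mic = convolve_echo(far_end, path_a)[:2000] + convolve_echo(far_end, path_b)[2000:]
residual, weights = nlms_cancel(far_end, mic)

print(energy(residual[1980:2000]))  # converged on path_a: close to zero
print(energy(residual[2000:2020]))  # right after the change: echo leaks through
print(energy(residual[3980:4000]))  # reconverged on path_b: close to zero again
print([round(w, 3) for w in weights])
```

The learned weights end up resembling `path_b`: the leading zeros are the time of flight, and the non-zero taps are the separate echo paths.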