GSoC 2025 Work Product - Resampling Options for Mixxx
This GSoC project is derived from Mixxx issue #9328. The goal is to determine whether alternative interpolation algorithms result in a noticeable reduction in scratching artifacts or a latency improvement over linear interpolation, providing quantitative supporting evidence where possible.
Introduction
On a turntable, scratching is performed by moving the stylus by hand - causing it to follow grooves in the vinyl that correspond to the analog audio waveform. Mixxx offers the ability to emulate vinyl scratching on digital records, by spinning the jog wheels of a MIDI controller to emulate the motion of the turntable stylus during scratching. The action of scratching causes a sudden acceleration or deceleration in tempo of the loaded record(s).
Emulation of vinyl scratching requires low-latency tempo ramping, which in turn requires resampling. Sub-optimal resampling leads to audible distortions during scratching, most notably buffer underruns. While an incorrect interpolation algorithm can technically also introduce phase distortion, that is far less noticeable.
At present, the Mixxx resampler for scratching uses a fast, handcrafted linear interpolation algorithm. Mixxx also uses the SoundTouch and RubberBand libraries to perform general audio time-stretching, but their performance is not optimal for the rapidly changing speed and pitch that occur during scratching.
Background: Analog and Digital Audio
Sound reaches us as vibrations of the surrounding air, which cause the human eardrum to vibrate and, further along the auditory pathway, generate a continuous electrical signal. In audio-engineering terms, this electrical signal represents "analog audio", and our ear represents a (biological) "audio interface", i.e. a gateway for audio to enter or exit a processing system.
In the human ear, each cochlear hair cell is tuned to a specific frequency band and converts local mechanical vibrations into discrete neural spikes, encoding amplitude over time. This results in a time-series of electrical events that the brain interprets as sound.
Periodic Sampling and Encoding
While an analog signal is represented by its amplitude as continuous function in continuous time, the digital representation of that signal is a time series of amplitude values generated by noting the value of the analog signal at fixed, discrete time intervals.
The process of generating a digital audio representation from an analog signal is termed sampling, and the length of the time interval is termed the sample period (its reciprocal, the sampling rate, is more commonly used when describing digital audio). Mathematically, sampling is represented as multiplying an analog signal by an impulse train in the time domain.
Periodic Sampling
In practice, when analog audio from a sound source enters a microphone, it is converted to a continuous electrical signal. A component called the ADC (Analog-to-Digital Converter) then records the electrical signal's voltage at a fixed sampling rate such as 44.1 kHz, 48 kHz, or 96 kHz, generating a series of amplitude values, i.e. samples. Each sample is stored as a fixed-precision floating-point number.
The samples are then encoded into a standard digital format (e.g. MP3, WAV, AAC) using well-known algorithms. This allows sampled audio, i.e. music records, to be stored on digital hard drives. For multichannel audio, the sample sequences are grouped into logical frames: an audio frame is an array of k sample values, one per output channel (mono: 1, stereo: 2). For a mono source played over k channels, the same sample value is duplicated into each slot of the frame.
Signal Reconstruction and Playback
For an audio record to be played back, there must exist a processing system that understands the original encoding scheme. For vinyl records, we have turntables connected to amplifiers. Moving the stylus along the vinyl grooves generates a continuous electrical signal, which is sent to a speaker. The speaker, being an analog device, responds to the continuous electrical signal by moving its membrane, creating air vibrations that we hear as sound.
To play back digital records, however, we need the right software. The standardization of audio formats ensures that any piece of software that adheres to certain conventions can "decode" and play a digital record. This is one key principle behind the audio playback feature of production-grade software such as VLC, Windows Media Player, Apple Music, Spotify, and even Mixxx.
A second requirement of this playback chain is the accurate reconstruction of the original analog signal from the sampled digital representation. Audio playback software communicates with a digital audio interface that contains a DAC (Digital to Analog Converter), which reconstructs the analog signal and supplies it to the speaker. DAC hardware circuits implement digital reconstruction filters. These filters perform mathematical transformations to the discrete sample sequence to recreate the original analog signal.
The Fourier and Shannon-Nyquist Theorems
The accurate reconstruction of an analog signal from a digital record is mathematically guaranteed under certain conditions. The Fourier theorem - a famous mathematical result - states that any analog signal can be represented by the sum of sinusoidal components of varying frequency and amplitude. The set of frequencies and their amplitudes gives the spectrum of the signal.
Building on this, the Shannon-Nyquist Sampling theorem states that given a digital signal, the highest resolvable frequency component (Nyquist Frequency) is half the rate at which the analog signal was originally sampled. Accordingly, to capture all frequencies audible to humans (approximately 20 Hz–20 kHz), digital audio systems typically use sampling rates of 44.1 kHz, 48 kHz, or 96 kHz, corresponding to Nyquist frequencies of 22.05 kHz, 24 kHz, and 48 kHz respectively.
The Mixxx Audio-Playback Stack
Mixxx exposes two important parameters in the Sound Hardware Preferences panel:
- Sample Rate (Hz): Determines the DAC sample rate, i.e., how frequently audio frames are converted to analog signals.
- Audio Buffer (ms): Specifies the total buffer duration, indirectly determining the size of ALSA’s ring buffer.
Mixxx Sound Hardware Preferences
These parameters together influence the quality of output sound by defining the size in frames of the ALSA ring-buffer for the selected sample-rate:
Ring Buffer Size = (Audio Buffer in s) * (Sample Rate in Hz)
Mixxx Buffering Hierarchy
As audio frames move from Mixxx to the speakers, they pass through three distinct buffering levels:
Mixxx Buffer (userspace, heap):
Holds outbound frames from the track, possibly after processing or effects.
This buffer is typically larger and acts as the staging area for frames handed off to ALSA.
ALSA Ring Buffer (kernel-managed, DMA-mapped):
A circular buffer of configurable size (in frames), subdivided into periods. Mixxx writes to this buffer in chunks, while the DMA engine drains it in period_size frames, a value negotiated between Mixxx and the audio-card driver at initialization. Usually, period_size = Ring Buffer Size / 2 [^1]. Each time a period is emptied, ALSA triggers a software interrupt which is handled by a userspace callback in a high-priority Mixxx thread. This callback prepares frames and refills the ring buffer.
DAC FIFO (hardware-level):
A small first-in, first-out queue that is read at the DAC sample rate, typically 44.1 kHz, 48 kHz, or 96 kHz. This hardware buffer feeds the analog reconstruction circuitry with upstream frames. For instance, a 96 kHz DAC consumes one frame every 1/96000 s (≈10.4 µs), totaling 96,000 frames per second.
Mixxx, ALSA, DAC buffers
While technically only period_size frames are written to the DAC between callbacks via DMA, Mixxx prepares Ring Buffer Size frames in that duration. We can therefore simplify our model by noting that, on average, Ring Buffer Size frames are written to the DAC every callback.
Buffer Underruns
From the period_size we compute period_time = period_size / DAC Sample Rate. This defines a hard real-time deadline for the userspace audio callback: it must prepare at least period_size frames within period_time to avoid starving the DAC's hardware FIFO. This relation also confirms that larger ring buffers and lower DAC sample rates reduce CPU pressure.
Whether this constraint is met depends on several factors, such as the complexity of audio processing in the real-time thread, OS scheduling latency, memory pressure, etc. Since general-purpose kernels do not provide any real-time guarantees, short period_time values can occasionally cause the callback to miss its deadline. The result is a buffer underrun, heard as a pop or glitch in playback, which is unacceptable in a live DJ performance.
Unlike typical audio players, Mixxx performs real-time manipulation of audio—mixing, tempo changes, effects, scratching, and more - making the callback workload heavier. This demands low-latency implementations of all audio processing workflows to avoid underruns without compromising quality.
The Need for Sample-Rate Conversion
While buffer underruns occur due to DAC starvation regardless of the buffer contents, another class of audio distortions arises when the DAC receives the wrong sequence of frames in its FIFO.
The input sample rate defines how many frames of a digital record represent one second of analog audio. For example, a digital record sampled from analog at 96kHz stores 96,000 frames for every second of analog sound. The DAC sample rate specifies how many outbound frames are consumed per second of real-world (wall-clock) time during playback.
- If a 96kHz record is played back on a DAC operating at 48 kHz without resampling, only 48k frames are processed each second—meaning less than a full second of the outbound audio is played back per second. This results in an unintended slowdown.
- Conversely, if the DAC sample rate exceeds the input sample rate, more than one second of the original recording is heard every second, creating the perception of sped-up and pitch-shifted playback.
Tempo ramping (sample-rate mismatch, no resampling)
In these scenarios, resampling is a corrective procedure that transforms audio sampled at the input sampling rate to match the DAC’s expected output rate.
Resampling may also be used to induce tempo ramping. For a record sampled at 96 kHz with a DAC also at 96 kHz, scaling the tempo by a factor of 3 means we want to pass the DAC 3x as many frames per callback as we would during standard playback.
- Without resampling, writing 3x frames per callback in an attempt to increase tempo would overfill the Mixxx-ALSA buffers. In the worst case, the excess frames would be dropped. Either way, the DAC would still consume only 96k frames per second instead of the entire 3 * 96k — nullifying the intended tempo increase.
- For accurate tempo ramping, we must represent a longer stretch of track duration using fewer frames, while ensuring that enough frames remain for accurate reconstruction. That is, every second, we need to represent 3 * 96k input frames using only 96k output frames.
- This resampling is achieved by a procedure called digital decimation, wherein frames are actually removed from a longer sequence before being written to the DAC.
- Conversely, digital interpolation is used when decreasing track tempo, whereby new frames are generated between true samples using various algorithms.
This is the reason higher samplerates are preferred in audio editing workflows: there is more room for decimation.
SoundTouch, RubberBand, libzita, and libsamplerate are examples of open-source C/C++ libraries that implement standard resampling and time-stretching algorithms for streaming data.
Contributions
mixxxPR#15081: Custom samplerates setting for recording.
This PR introduces an improved user experience on the recording preferences page: no more error messages for incompatible formats, since the GUI maintains the necessary format invariants. It also introduces libsamplerate to the build system, along with a base resampler class using the libsamplerate src_process API.
Key Files
- [dlgprefsrecording.cpp]
- [enginerecord.cpp]
- [recordingmanager.cpp]
mixxxPR#15160: Custom samplerates setting for broadcasting.
This PR allows users to choose custom samplerates for each broadcast profile, independently of the engine samplerate.
Key Files
- [dlgprefsbroadcast.cpp]
- [shoutconnection.cpp]
- [broadcastmanager.cpp]
Users can now pick custom samplerates for both recording and broadcasting, independent of the engine samplerate.
PR#15005: Support for low-latency scratching using the libsamplerate callback API
This PR implements a resampler class using the libsamplerate Callback API. We observed a reduction in per-buffer resampling latency from 20 µs to 10 µs, a 2x improvement over the handcrafted linear interpolator.
Key Files
- [enginebuffer.cpp]
- [enginemixer.cpp]
- [enginebufferscalesrc.cpp]
- [dlgprefsound.cpp]
Future Work
- Benchmarking the latency and CPU usage of the various resamplers during scratching.
Acknowledgements
I thank Daniel, Evelynne, Ronny, and JoergBerg, who have spent considerable time reviewing my PRs and offering assistance anytime I needed it.
References
[^1]: https://0pointer.de/blog/projects/all-about-periods.html