GSoC 2022 Work Product - Pitch Shift effect and Group Delay handling
Disclaimer: This blog post primarily serves as the documentation for the Google Summer of Code 2022 project "Pitch Shift effect and Group Delay handling". It is therefore considerably more detailed than other Mixxx blog posts.
The project implements a Pitch Shift effect for the Mixxx DJ software. The effect raises or lowers the original pitch of an audio signal [1]. Thanks to the long coding period, the project was extended with Group Delay handling for the effect chain.
Before this effect existed, the pitch could only be changed separately, using each deck's rate slider. This severely restricts how other effects can interact with the sound. This project adds a new Pitch Shift effect to the built-in effects, which can be used in the effect chain. It fulfills the wish "Add a Transpose / Pitch Shift effect" for the Mixxx software. The effect has to work with the effect chain API, so that it can be combined with the chain's other options. Above all, it lets users work with a much wider range than the deck player's Pitch Shifter. Because the pitch processing introduces latency, this processing delay has to be compensated in the Dry/Wet and Dry+Wet modes so that the original (dry) and processed (wet) signals overlap.
The following terms from music, sound processing and software development in general are used throughout this post:
- Pitch is a perceptual property of sounds that allows their ordering on a frequency-related scale. Pitch is the quality that makes it possible to judge sounds as "higher" and "lower" in the sense associated with musical melodies [2].
- In music theory, a scale is any set of musical notes ordered by fundamental frequency or pitch. A scale ordered by increasing pitch is an ascending scale, and a scale ordered by decreasing pitch is a descending scale [3].
- In music, an octave is the interval between one musical pitch and another with double its frequency [4].
- In music theory, an interval is the difference in pitch between two sounds. In Western scales, intervals are most commonly differences between whole tones and semitones [5].
- A semitone is the distance in pitch between a note and the very next note, higher or lower. It is the smallest interval in most Western scales [6].
- Chromatic scale
  - A chromatic scale is a set of twelve pitches used in tonal music, with notes separated by the interval of a semitone [7].
- Dry and wet signals
  - Dry sound signals are the raw or unprocessed sounds that usually come from a direct recording. Wet sounds, on the other hand, are the processed sounds or signals [8].
- Ring buffer (circular buffer)
  - In computer science, a circular buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams [9].
- Audio buffer
  - An audio buffer holds a fixed amount of sampled audio data. Its size determines how much time the computer has to process the audio data, and therefore also the latency.
  - In the audio world, “latency” is another word for “delay” [10]. The latency of an audio system is the time difference between the moment a signal is fed into the system and the moment it appears at the output [11]. For example, audio latency occurs when there is a noticeable delay between a sound being played and the moment it reaches the speakers [10]. Depending on the application, such a delay can have various effects. Usually, the aim is to achieve the lowest possible latency [11].
- Sample Rate
  - Audio sampling is the process of transforming a musical source into a digital file. Digital audio recording does this by taking samples of the audio source along the soundwaves at regular intervals. The number of samples taken per second is known as the sample rate; the more samples are taken, the more closely the final digital file resembles the original. A higher sample rate tends to deliver better-quality audio reproduction [12].
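Two of the glossary relations can be made concrete with a little arithmetic. The C++ sketch below uses illustrative function names (not taken from Mixxx): it computes the frequency ratio of an equal-tempered pitch shift, where one octave (12 semitones) doubles the frequency, and the time covered by one audio buffer at a given sample rate, i.e. the delay that a single buffer of that size contributes.

```cpp
#include <cmath>

// Frequency ratio of a pitch shift by `semitones` in 12-tone equal temperament:
// every semitone multiplies the frequency by 2^(1/12), so +12 semitones (one
// octave) doubles it and -12 semitones halves it.
double semitonesToRatio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// Duration of an audio buffer of `frames` frames at `sampleRate` Hz, in
// milliseconds; e.g. 1024 frames at 48000 Hz cover about 21.3 ms.
double bufferLatencyMs(int frames, int sampleRate) {
    return 1000.0 * frames / sampleRate;
}
```

So the ±24 semitone range of the Pitch Shift effect corresponds to frequency ratios from 0.25 to 4.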
Pitch Shift effect
The main algorithm of the Pitch Shift effect is implemented in the PitchShiftEffect class. It uses RubberBand, a widely known audio time-stretching and pitch-shifting library. The implementation follows the “push model”: the input audio samples are handed directly to the RubberBand library API as they arrive. Unlike the main Pitch Shifter of each deck player, which has a limited range (only 7 semitones up and down, i.e. not even an octave in either direction), the independent Pitch Shift effect works in a range of ±2 octaves (±24 semitones).
The Pitch Shift effect has the following options:
- Pitch knob
- Range knob
- Semitones mode
- Formant preserving
The Pitch knob shifts the pitch of a track up or down. In the default middle position, the pitch is unchanged. The Range knob sets the range of the Pitch knob, from zero up to 2 octaves. These two knobs work similarly to those on the Pioneer DJM-900NX2, a professional mixer widely used in clubs for live DJ mixing. The Semitones mode toggle sets the scale of the Pitch knob, which works either in Continuous or in Semitones mode. In Semitones mode, the pitch changes in steps of the chromatic scale, i.e. in a musical way. Otherwise, the pitch changes continuously, which is the default behavior of the RubberBand library. Semitones mode is on by default. Finally, the Formant preserving option uses the RubberBand API option of the same name. It preserves the resonant frequencies (formants) of the human vocal tract and other instruments, which compensates for “chipmunk” or “growling” voices.
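One plausible way to map the Pitch and Range knobs to a pitch shift is sketched below. The knob value ranges (Pitch in [-1, 1] with 0 as the middle position, Range in octaves from 0 to 2) and the function name are assumptions for illustration, not the actual PitchShiftEffect code. In Semitones mode the result is rounded to whole semitones, while in Continuous mode it keeps its fractional part.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical knob mapping (illustrative, not Mixxx's PitchShiftEffect):
//   pitchKnob    in [-1, 1], 0 = middle position = pitch unchanged
//   rangeOctaves in [0, 2], set by the Range knob
// Returns the pitch shift in semitones.
double knobToSemitones(double pitchKnob, double rangeOctaves, bool semitonesMode) {
    pitchKnob = std::clamp(pitchKnob, -1.0, 1.0);
    rangeOctaves = std::clamp(rangeOctaves, 0.0, 2.0);

    // Full knob deflection reaches the selected range; 1 octave = 12 semitones.
    double semitones = pitchKnob * rangeOctaves * 12.0;

    if (semitonesMode) {
        // Semitones mode: snap to the nearest step of the chromatic scale.
        semitones = std::round(semitones);
    }
    // Continuous mode: the fractional value is kept as is.
    return semitones;
}
```

For example, with the Range knob at one octave, a Pitch knob value of 0.3 gives 3.6 semitones in Continuous mode and 4 semitones in Semitones mode. A pitch shifter such as RubberBand expects a frequency scale factor rather than semitones, which would be 2^(semitones/12).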
Pitch Shift effect in the Mixxx software
Pitch Shift effect in the effect chain
Group Delay handling
As a project extension, Group Delay handling was implemented for the effect chain. Because the Pitch Shift effect processes the audio with the RubberBand library, it introduces some latency. As a result, when the original (unprocessed) signal and the processed one are played together in Dry/Wet or Dry+Wet mode, the two signals do not overlap. The common audio processing approach is to delay the original signal by the amount of latency so that the signals overlap. Building on the Mixxx effect chain API (EngineEffectChain), the group delay handling was implemented for the whole effect chain and compensates for the total latency produced by the effects used in the chain. The main algorithm of the Group Delay
handling for the effect chain is implemented in the
The implemented API takes the group delay together with the input signal and returns the delayed signal, using internal data structures. When the group delay changes, it crossfades between the old and the new delay to avoid unwanted clicks in the output audio signal. This API was then built into the effect chain implementation, which now compensates for the sum of the latencies reported by the effects. Lastly, the Group Delay reporting from the effects was implemented using the Mixxx API structures for effects.
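The idea behind the group delay compensation can be sketched in a few lines. This is a simplified, hypothetical mono version: the names `chainLatency` and `DryDelay` are made up for illustration and are not the EngineEffectsDelay API. The chain latency is the sum of the latencies reported by the effects, the dry signal is delayed by that many frames, and when the delay changes between two buffers, the output is crossfaded linearly from the old to the new delay across the buffer. For brevity it handles one sample at a time with a modulo index.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Total group delay of an effect chain: the sum of the latencies (in frames)
// reported by its effects.
std::size_t chainLatency(const std::vector<std::size_t>& effectLatencies) {
    return std::accumulate(effectLatencies.begin(), effectLatencies.end(), std::size_t{0});
}

// Hypothetical dry-path delay (mono, illustrative; not Mixxx's EngineEffectsDelay).
// Delays the dry signal so it lines up with the wet signal. If the requested
// delay differs from the previous buffer's, the output is crossfaded linearly
// from the old delay to the new one over the buffer to avoid a click.
class DryDelay {
  public:
    explicit DryDelay(std::size_t maxDelay)
            : m_history(maxDelay + 1, 0.0f) {}

    void process(const std::vector<float>& in, std::vector<float>& out, std::size_t delay) {
        assert(delay < m_history.size());
        const std::size_t n = in.size();
        out.resize(n);
        for (std::size_t i = 0; i < n; ++i) {
            m_history[m_writePos] = in[i];
            const float delayed = readDelayed(delay);
            if (delay == m_prevDelay) {
                out[i] = delayed;
            } else {
                // The fade weight goes from 1/n to 1 across the buffer, so the
                // last sample uses only the new delay.
                const float t = static_cast<float>(i + 1) / static_cast<float>(n);
                out[i] = readDelayed(m_prevDelay) * (1.0f - t) + delayed * t;
            }
            m_writePos = (m_writePos + 1) % m_history.size();
        }
        m_prevDelay = delay;
    }

  private:
    // Sample written `delay` samples before the one at m_writePos.
    float readDelayed(std::size_t delay) const {
        const std::size_t size = m_history.size();
        return m_history[(m_writePos + size - delay) % size];
    }

    std::vector<float> m_history; // last maxDelay + 1 input samples
    std::size_t m_writePos = 0;   // slot of the current input sample
    std::size_t m_prevDelay = 0;  // delay used for the previous buffer
};
```

Feeding `{1, 2, 3, 4}` and then `{5, 6, 7, 8}` with a constant delay of 2 yields `{3, 4, 5, 6}` for the second buffer. A linear crossfade is the simplest choice; an equal-power curve is a common alternative.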
While implementing EngineEffectsDelay, it soon became clear that a custom, optimized data structure should be created for the Group Delay handling. A common approach for working with an audio signal stream is to use a ring buffer. However, given the specifics of Group Delay handling and the requirements on the buffer, the classic, widely known implementation is not suitable. Therefore, a new, improved and optimized variant of the ring buffer was created specifically for the Group Delay handling use case.
The implementation can be found in the
After RingDelayBuffer was implemented, the new optimized data structure was introduced into EngineEffectsDelay for Group Delay handling. According to the benchmark comparison, this improved performance considerably. The benchmarking process and results are described in detail in the "Testing and benchmarking" chapter below.
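A ring buffer for delay handling can avoid a modulo operation per sample by copying whole blocks, splitting each copy into at most two contiguous parts at the wrap-around point. The class below is a hypothetical sketch of that idea, not Mixxx's RingDelayBuffer, whose actual interface and optimizations may differ. It takes raw pointers instead of std::span so that it compiles as C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring buffer for delay handling (illustrative, not Mixxx's
// RingDelayBuffer). Every block copy is split into at most two std::copy calls
// at the wrap-around point instead of computing a modulo for every sample.
class BlockRingBuffer {
  public:
    explicit BlockRingBuffer(std::size_t capacity)
            : m_data(capacity, 0.0f) {
        assert(capacity > 0);
    }

    // Appends n samples (n <= capacity), overwriting the oldest ones.
    void write(const float* src, std::size_t n) {
        const std::size_t cap = m_data.size();
        assert(n <= cap);
        const std::size_t first = std::min(n, cap - m_writePos); // up to the end
        std::copy(src, src + first, m_data.data() + m_writePos);
        std::copy(src + first, src + n, m_data.data()); // wrapped remainder
        m_writePos = (m_writePos + n) % cap;
    }

    // Copies the n samples that were the most recent ones `delay` samples ago,
    // i.e. the block ending `delay` samples before the write position.
    void read(float* dest, std::size_t n, std::size_t delay) const {
        const std::size_t cap = m_data.size();
        assert(n + delay <= cap);
        const std::size_t start = (m_writePos + cap - n - delay) % cap;
        const std::size_t first = std::min(n, cap - start);
        std::copy(m_data.data() + start, m_data.data() + start + first, dest);
        std::copy(m_data.data(), m_data.data() + (n - first), dest + first);
    }

  private:
    std::vector<float> m_data;
    std::size_t m_writePos = 0; // index where the next sample is written
};
```

After writing the samples 1 to 10 into an 8-sample buffer, `read(out, 4, 0)` yields 7, 8, 9, 10 and `read(out, 4, 2)` yields 5, 6, 7, 8.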
Pitch Shift effect improvement
Late in the GSoC period, work started on a “pull model” implementation to decrease the latency of the Pitch Shift effect. In this model, the RubberBand API reports how many input samples it requires, and exactly that many samples are passed to it. The two implementations differ in the tradeoff they make in Mixxx. With the “push model”, the input samples do not have to be prefilled before processing, but audio dropouts can occur. The “pull model”, with correctly sized structures, avoids dropouts, but the input samples have to be prefilled before processing, which adds a delay before any output is produced. To compare the group delay of the “push model” and “pull model” implementations, a measurement was made. The results are shown in the following two graphs.
The first graph shows that the “pull model” implementation with a circular buffer size of 4096 frames has a lower latency than the “push model”, even including the delay before the first output. At the same time, the delay varies within a smaller range when the pitch is changed dynamically.
Push and pull model group delay comparison
The second graph shows the group delay measured with the pitch unchanged; the individual measurements differ only in the pitch value that was set before. According to this measurement, the “pull model” implementation is more stable and produces a similar, almost constant delay for the same pitch setting after dynamic pitch changes.
Push and pull model group delay comparison for unchanged pitch
The measurements clearly show that, although the “pull model” implementation of the Pitch Shift effect in the effect chain is not optimal, it should be preferred over the “push model” implementation.
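A pull-model effect still receives fixed-size chunks from the effect chain, while the stretcher decides how many input samples it needs next (RubberBand exposes this through `RubberBandStretcher::getSamplesRequired()`). One way to bridge the two is an input FIFO, sketched below with hypothetical names; the samples waiting in it contribute to the prefill delay described above. This is not the code from the draft PR, and a real-time implementation would use a preallocated circular buffer (like the 4096-frame one in the measurements) instead of `std::deque`, which may allocate.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical input FIFO for a pull-model pitch shifter (illustrative only).
// The effect chain delivers fixed-size chunks, while the shifter asks for a
// variable number of input samples; samples are queued until a request fits.
class PullInputFifo {
  public:
    // Queue a chunk delivered by the effect chain.
    void push(const std::vector<float>& chunk) {
        m_queue.insert(m_queue.end(), chunk.begin(), chunk.end());
    }

    // Hands exactly `required` samples to `out` if enough are queued.
    // Returns false otherwise; the caller then has no new input to process yet.
    bool pull(std::size_t required, std::vector<float>& out) {
        if (m_queue.size() < required) {
            return false;
        }
        const auto end = m_queue.begin() + static_cast<std::ptrdiff_t>(required);
        out.assign(m_queue.begin(), end);
        m_queue.erase(m_queue.begin(), end);
        return true;
    }

    // Samples still waiting; while they wait they add to the effect's delay.
    std::size_t queued() const { return m_queue.size(); }

  private:
    std::deque<float> m_queue;
};
```

With 512-frame chunks and a request for 700 samples, the first pull fails and the second succeeds, leaving 324 samples queued for the next request.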
Testing and benchmarking
Together with the implementation of EngineEffectsDelay for Group Delay handling and RingDelayBuffer as an optimized data structure for the same use case, tests were written using the GoogleTest framework. First the common situations were tested, then extreme cases, and finally cases that are not allowed but still have to be handled in release builds. Since these structures are performance-critical, benchmarks were created alongside the tests using the Google Benchmark library. The benchmark results were used to compare the candidate functions and algorithms. After both structures were implemented, tested and optimized, the
was introduced in the
EngineEffectsDelay as an inner structure
for the Group Delay handling. Thanks to these changes and the optimized data structure, performance improved considerably. The performance differences are shown in the following benchmark results, taken from the Ubuntu GitHub CI runs.
Run on (2 X 2593.91 MHz CPU s)
- L1 Data 32 KiB (x2)
- L1 Instruction 32 KiB (x2)
- L2 Unified 1024 KiB (x2)
- L3 Unified 36608 KiB (x1)
Run on the same system as above.
A video with a few examples of possible uses of the Pitch Shift effect
Several problems arose during the coding period. Although the project was of medium size, these issues meant I had to spend much more time on it. I think the biggest challenge in this project was the implementation of the Pitch Shift effect itself using the RubberBand library. Early on, it turned out that the effect chain only lets an effect work with fixed-size audio chunks. Therefore, it is not possible to request as much input audio data as the RubberBand library's “pull model” requires. The SoundTouch library was also tested for pitch shifting, but it produced worse audio quality with the same amount of delay. After all the discussions, the RubberBand library proved to be the best option for the effect, despite the issues associated with it.
Furthermore, Mixxx did not yet have a delay handler for the effect chain, so implementing this structure became necessary. Because of that, the goals of the project were changed. The original proposal contained the Pitch Shift effect and, as an extension beyond the Mixxx organization's project requirements, an Auto-tune effect. After some consideration, a survey was created in the Zulip chat so that other Mixxx developers and users could vote on possible project extensions. Based on the survey results, the project extension goal was changed to the implementation and optimization of Group Delay handling for the effect chain, to improve the performance of the Pitch Shift effect in the Dry/Wet and Dry+Wet modes.
As the last challenge, I would like to mention the use of std::span from the standard library, which requires C++20. Because the Mixxx organization adheres to its own Minimum requirements policy for Ubuntu LTS, the mixxx#4810 and mixxx#4852 pull requests could only be merged after the official Ubuntu release with C++20 support was announced in the middle of August.
Pull requests and issues
mixxx#4775 - PitchShiftEffect: add independent effect
This PR adds an independent effect to Mixxx's built-in effects. The implementation uses the RubberBand library to change the pitch of the input track. The effect runs in real-time mode and follows the “push model”: the input data are handed to RubberBand as they arrive, instead of the library requesting a specific amount of input data.
mixxx#4810 - EngineEffectsDelay: effect chain delay handling
This PR adds a structure for the Group Delay handling of the effect chain. Some effects produce latency due to their internal processing. In the Dry/Wet and Dry+Wet modes, this latency has to be compensated so that the dry and wet signals overlap. The PR implements delay reporting from the effects to the effect chain, as well as delaying the dry signal so that it overlaps with the wet signal. Because this is a performance-critical part of the application engine, tests and benchmarks were included in the development.
This PR also introduced std::span into the Mixxx code base, following a design proposal by, and in cooperation with, my mentor. A utility for working with spans was implemented, so other developers can easily create spans directly from Mixxx's custom data structures. With that, the Mixxx code base is starting to make use of the C++20 standard.
mixxx#4848 - Fix EngineDelay and EngineFilterDelay modulo calculation documentation
Following the code changes in EngineEffectsDelay and a discussion with my mentor, explanatory comments on the modulo calculation were added to two other Mixxx structures that work on a very similar principle.
mixxx#4852 - RingDelayBuffer: ring buffer for delay handling
While creating EngineEffectsDelay for the Group Delay handling of the effect chain, it was suggested to build an optimized ring-buffer-based data structure for its internal processing. This widely known signal processing structure was improved and optimized specifically for delay handling. Again, tests and benchmarks were created for RingDelayBuffer, and the benchmark results were used to compare different data copy functions.
vcpkg#48 - [rubberband] add overlaid rubberband v3
During the coding period, RubberBand v3.0.0 was released. Building on a Mixxx developer's earlier work that added RubberBand v2.0.2 directly to the microsoft/vcpkg repository, RubberBand v3.0.0 was added as an overlay port in Mixxx's fork of that repository.
mixxx#4869 - EngineFilterDelay: clamp wrong delay values
While working on mixxx#4810, I encountered a bug in the EngineFilterDelay structure, which works in a similar way but for a slightly different use case. Unacceptably large delay values are now clamped in the setter, so the structure's internal calculation no longer produces completely wrong output. The PR was merged on the day it was created.
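Clamping in the setter, as described for the EngineFilterDelay fix, could look like the hypothetical sketch below. The class name and the maximum delay template parameter are illustrative and not taken from Mixxx.

```cpp
#include <algorithm>

// Hypothetical delay stage (illustrative, not Mixxx's EngineFilterDelay):
// the setter clamps requests to [0, kMaxDelay], so later index arithmetic
// on the internal buffer always stays in range.
template <int kMaxDelay>
class ClampedDelay {
  public:
    void setDelay(int delaySamples) {
        m_delay = std::clamp(delaySamples, 0, kMaxDelay);
    }

    int delay() const { return m_delay; }

  private:
    int m_delay = 0;
};
```

Clamping once in the setter keeps the per-sample processing code free of range checks.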
mixxx#4898 - PitchShiftEffect: decrease and report latency
Status: Draft (WIP), last GSoC commit: 146f104
This draft PR is another project extension. The existing “push model” implementation is replaced with a “pull model”. The new approach decreases the effect's latency, and this latency is reported to the effect chain's delay handler. The PR is still a work in progress. As the last piece of work, latency measurements were performed for several implementations and different pitch settings, and the measured data was plotted for demonstration. The new implementation was accepted, and I will finish the PR after GSoC as a regular Mixxx contributor. As part of the new implementation, Mixxx's circular buffer data structure was improved and optimized for performance. What remains is to finish the pull implementation by setting the right size of the input ring buffer, possibly making that size depend on the selected range. Finally, the propagation of valid delay values from the effect will be completed.
mixxx#4901 - PitchShiftEffect: extend effect options
This PR extends the options of the Pitch Shift effect. A Range knob is added to set the range of the Pitch knob; the two knobs work similarly to those on the Pioneer DJM-900NX2, a professional mixer widely used in clubs for live DJ mixing. A Semitones mode toggle was also added to change the scale of the Pitch knob. The toggle is on by default, so the Pitch knob works in Semitones mode: in musical terms, the pitch changes along the chromatic scale in semitone steps. If the toggle is off, the Pitch knob works in Continuous mode, which is also the RubberBand library's default. Finally, the Formant preserving option was added, which uses the RubberBand library option of the same name. It preserves the resonant frequencies (formants) of the human vocal tract and other instruments, compensating for “chipmunk” or “growling” voices. The PR also adds a sign function to Mixxx's math utilities.
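A sign function is commonly written branch-free as the difference of two comparisons. The template below is a generic sketch of that idiom; the name and signature of the helper actually added to Mixxx's math utilities may differ.

```cpp
// Branch-free sign function: -1 for negative, 0 for zero, +1 for positive.
// Generic sketch; the helper added to Mixxx's math utilities may be named
// and declared differently.
template <typename T>
constexpr int sign(T value) {
    return (T(0) < value) - (value < T(0));
}

static_assert(sign(-2.5) == -1 && sign(0) == 0 && sign(7) == 1, "sign() sanity check");
```

Because it is `constexpr`, the function can also be evaluated at compile time, as the `static_assert` shows.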
mixxx#10827 - Improve buffers size function const-correctness
This PR improves Mixxx's buffer data structures by making their size function a C++ constant expression (constexpr).
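Making a buffer's size a constant expression lets it be used at compile time. The hypothetical fixed-size buffer below (not one of Mixxx's buffer classes) shows the idea with a `static constexpr` `size()` that also works on const objects and in a `static_assert`.

```cpp
#include <cstddef>

// Hypothetical fixed-size sample buffer (illustrative, not a Mixxx class).
// size() is constexpr, so it is usable in constant expressions such as
// static_assert or array bounds, and it is callable on const objects.
template <typename T, std::size_t N>
class FixedBuffer {
  public:
    static constexpr std::size_t size() noexcept { return N; }

    T* data() noexcept { return m_data; }
    const T* data() const noexcept { return m_data; }

  private:
    T m_data[N]{};
};

// Checked at compile time, without creating a buffer.
static_assert(FixedBuffer<float, 256>::size() == 256, "size() is a constant expression");
```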
mixxx#10832 - EngineEffect: invalid engine parameters handed over into an effect
While working on the Pitch Shift effect, I found that the actual engine parameters were not propagated to the effects. The maximum possible values were used instead, so some newly added effects could behave incorrectly because of invalid values for the sample rate or the buffer size, for example.
mixxx#10835 - EngineBufferScaleRubberBand: remove unused include
An unused include was removed from this Mixxx structure.
mixxx#10840 - EngineEffectsDelay: introduce ring delay buffer
Status: Open (WIP), last (non-failing) GSoC commit: 0c01e34
This PR builds the optimized ring buffer data structure for delay handling into the effect chain's delay handling structure. According to the benchmark measurements, the new data structure improves delay handling performance considerably. Unfortunately, the PR was not merged during the coding period because of a test failing on the macOS CI (caused by an internal rounding problem with a zero value). At the same time, Mixxx's macOS CI started crashing during the configuration stage because of a change in the workflow runner. For that reason, the bug fix could not be tested and the PR was not merged in time. Once the fix can be tested and passes on the macOS CI, the PR is ready to be merged.
mixxx#10843 - RingDelayBufferTest: refactor includes and span creation
The includes and the span creation in the RingDelayBuffer tests are refactored.
mixxx#10858 - PitchShiftEffect: add description comments
Adds descriptive comments to the Pitch Shift effect processing code.
website#279 - content/news: add GSoC 2022 Work Product
Adds a blog post containing the "Work Product" for Google Summer of Code 2022 on the Mixxx website.
After the end of the GSoC period, the Pitch Shift effect will be improved with the “pull model” implementation. With that, the Dry/Wet and Dry+Wet modes will be completed for the effect as well. Based on the survey, the following options or features could be added to the Pitch Shift effect as further extensions:
- Auto-tune effect
- A piano keyboard interface
- Optimize interface for common controllers
- CPU load balancing
- Consider interaction with the main Pitch Shifter
- Expose compensation delay as additional parameter for making funny things without extra CPU cycles
More broadly, future work on Mixxx could add wider support for the LV2 effect standard or, better still, integrate the Carla audio plugin host into Mixxx. This would let users use their favorite effects via audio plugin standards such as LADSPA, DSSI, LV2, VST2 and VST3, instead of being limited to the built-in effects and the limited LV2 support. After consulting my mentor, we agreed that I can take on this task as a regular Mixxx contributor after the end of GSoC.
Things I learned from GSoC
I don't think I can even express how much the GSoC experience has given me. Even though I had experience with open source from one small project, the workflow and development in a larger organization such as Mixxx was completely different and taught me a lot. I learned so many cool things about audio processing, C++ development following best practices, the new C++20 standard, and testing and benchmarking with Google's frameworks, and I improved my knowledge of git and of open source development in general. Thanks to the change in the proposed project extension, I learned a lot about real-time audio signal processing and about interesting data structures. I also had the chance to design a data structure and its extensions myself. Looking back, I am really glad about the change of plan. I also improved my English a lot, both written and spoken. I think this was the best experience I have ever had as a developer. It is awesome that I can publish my work and immediately get feedback and suggested improvements. I came to love open source development for how much I can learn from awesome people while creating cool new things. As a college student, I had really missed getting feedback on my work, which opens up opportunities to learn. I really felt like part of the community. I will be happy to continue being part of the Mixxx organization and contributing to open source.
The requested new effect was implemented, and the issue asking for this feature was closed with the "Fix Committed" status. All requirements from the Mixxx organization's project idea, on which the proposal was based, were met. Because there was enough time left in the GSoC coding period, I also worked on project extensions. Given the situation and the importance of new Mixxx features, the originally proposed extension was replanned and changed. The new Group Delay handling structure was successfully implemented and then optimized with the new, extended ring buffer data structure. During the coding period, work also started on minimizing the effect's latency and polishing the effect further. Unfortunately, that work could not be finished before the GSoC deadline.
During the coding period, I ranked among the top Mixxx contributors of the last month with 61 authored commits, and I became the 27th of 238 contributors to Mixxx with 119 commits in total.
August 2022 contributors - mixxxdj/mixxx
Contributor summary - mixxxdj/mixxx
First, many thanks to my mentor @Swiftb0y for his guidance, help, reviews, and all the new knowledge and lessons he gave me during the summer. I feel really motivated and learned a lot. I would like to thank the Mixxx developer @Daniel Schürmann for his help, reviews and active contributions of new ideas and improvements to my project and pull requests. Thank you both for involving me in the Mixxx development process and for the constructive criticism, which helped me learn many new things over the past weeks. I would like to thank my fellow Mixxx GSoC contributor this summer and friend @Fatih Emre for his help and synergy on the structure and chapters of the final blog post. Of course, I would like to thank all Mixxx developers for welcoming me into the Mixxx family and for their help. I would like to continue our cooperation as Mixxx developers after GSoC ends, and I look forward to our future teamwork. Many thanks to the Google Summer of Code team for making this amazing experience possible for me.
[6] Dan Farrant, A Guide To Semitones & Tones (Half & Whole Steps), Modified: 25 June 2022, Accessed: 2 Sept. 2022, Retrieved from: https://hellomusictheory.com/learn/semitones-tones/
[8] Celine, Difference Between Wet and Dry Signals or Sounds, Modified: 22 Feb. 2012, Accessed: 3 Sept. 2022, Retrieved from: http://www.differencebetween.net/technology/difference-between-wet-and-dry-signals-or-sounds/
[10] Audio Modeling, Grow Your Knowledge, Accessed: 3 Sept. 2022, Retrieved from: https://kb.audiomodeling.com/en/c/grow-your-knowledge/d/what-is-audio-latency-how-do-i-fix-latency-issues-while-recording/
[11] NTi Audio, Latency in Audio Systems, Modified: 10 March 2021, Accessed: 7 Sept. 2022, Retrieved from: https://www.nti-audio.com/en/news/latency-in-audio-systems
[12] Adobe, Sample rates and audio sampling: a guide for beginners, Accessed: 8 Sept. 2022, Retrieved from: https://www.adobe.com/uk/creativecloud/video/discover/audio-sampling.html