
On "Audio Kits" and the Future of AudioKit

This was originally written 2021-02-05 as an RTF file, but I decided to redo it using DokuWiki.

Concept of an Audio Kit

I define an Audio Kit as a collection of software resources which allow novice and intermediate programmers to produce audio programs in a high-level language, without the need to write real-time DSP code, by following the “conductor” principle. Let’s break this down into five parts.

A collection of software resources

This refers very specifically to a high-level language programming framework, supported by additional resources such as:

  • Code libraries (with appropriate high-level language bindings) for DSP functionality, and possibly other things.
  • Software tools to support the programming methodology, such as template-based code generators.
  • Assets, such as GUI widgets, images, sound files or samples.

Audio programs

This term is intended to encompass applications and plug-ins for mobile and/or desktop platforms, where the primary emphasis is on interactive programs which generate and/or process audio in real time and can connect to audio-related I/O devices, including MIDI systems. Some audio kits may also include resources to create non-real-time programs, e.g., programs to generate and/or play audio files.

High-level (programming) language

AudioKit is the canonical “audio kit”. It is based entirely on the use of Swift, a modern programming language which is not suitable for real-time DSP development, but excels at the non-DSP aspects of audio program development, which fall into 3 main categories:

  1. Interactive GUI coding
  2. File and data management (including audio preset management)
  3. Interface development

The interface category deserves further explanation. It encompasses everything required for the audio program (which may be a plug-in) to connect to, and interoperate with, related software (e.g. a DAW) in support of real-time, interactive audio and GUI functions.

Real-time DSP code

This is any code which processes audio and related data (e.g. MIDI) with real-time responsiveness. No audio kit should require custom DSP coding, though some may support it to some degree.

The "conductor" principle

The most important aspect of an audio kit is its ability to serve as a scripting system for audio programs. I refer to this as the conductor principle, because it is embodied perfectly in the “conductor” portion of user-written code in AudioKit, Csound, etc.

What is, and is not, an audio kit?

As I said earlier, AudioKit is the canonical audio kit. It meets all five of the conditions listed above. JUCE and Csound don’t qualify as audio kits, for two main reasons:

  1. Their primary languages (C++ and C, respectively) only barely qualify as “high level”.
  2. JUCE does not support the conductor principle; it is basically a glorified plug-in wrapper API (see below). Csound does support it, but is otherwise such an impoverished “collection of software resources” (especially for GUI development) that it is scarcely worth considering.

SynthEdit and Flowstone are very powerful audio development systems, but are not audio kits because:

  1. They support a “graphical programming” paradigm, which (having studied such things since the early 90s) I find to be a poor and ridiculously inefficient substitute for a true high-level programming language.
  2. Although both are arguably “graphical scripting systems”, I think it’s just too much of a stretch to say they support the conductor principle.

I could go on, but there is little point. Essentially every “high level” audio software development system ever devised either falls into the text-based Csound camp or the graphical Max/Pure Data camp, or is a hybrid of the two (cf. Cabbage, a graphical front-end to Csound).

See https://en.wikipedia.org/wiki/Comparison_of_audio_synthesis_environments for an excellent overview of software audio synthesis environments, and if you’re interested, check each against the five conditions to see which might qualify as an audio kit.

What about multi-platform targeting?

The ability to write code once and deploy it on multiple platforms (e.g. Macintosh, Windows, Linux, iOS, Android, RasPi, other embedded hardware, etc.), and/or with support for multiple interface standards (e.g. VST/VST3, Audio Units v2/v3, LV2, network protocols, etc.), is highly desirable and practical, but it is not a requirement for a programming system to be called an audio kit.

Expanding on the Conductor Principle in AudioKit

What I’m calling the “conductor principle” is the notion that a program written in a high-level language like Swift can script the construction of composite structures in a DSP library, which then process audio autonomously on a separate thread, and at the same time present a control/parameters API through which the high-level program can interact with them in real-time (without threading issues).
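
As a minimal sketch of the principle, here is roughly what a conductor looks like in Swift, written in the AK-prefixed style of the AudioKit 4 era (exact class and property names vary between AudioKit versions, so treat this as illustrative rather than copy-paste ready): the Swift code scripts the DSP setup once, starts the engine, and from then on touches only parameters.

```swift
import AudioKit

// A "conductor" in plain Swift: it scripts the DSP setup once, then
// interacts with the running audio code only through parameters.
class Conductor {
    let oscillator = AKOscillator(waveform: AKTable(.sawtooth))

    func start() {
        AudioKit.output = oscillator   // scripted construction of a (trivial) DSP structure
        try? AudioKit.start()          // audio now runs autonomously on its own thread
        oscillator.start()
    }

    // Safe to call from GUI code; no real-time or threading concerns
    // leak into the Swift layer.
    func setPitch(_ frequency: Double) {
        oscillator.frequency = frequency
    }
}
```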

AudioKit architecture and its limitations

In AudioKit, the DSP library is the collection of “AK…” object classes, and everything else is based on the Audio Units mechanisms provided by Apple operating systems (Core Audio).

  • The audio unit (AU) is the only mechanism available to link GUI code to DSP code.
  • As a result, everything (all DSP code) must be packaged as an AU.
  • AUs can be connected easily in linear chains, and less easily in directed acyclic graphs (DAGs). This is the only available mechanism for scripted construction of DSP structures.
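
For concreteness, here is a sketch (again using AK-prefixed names of the AudioKit 4 vintage, which may differ in other versions) of what this scripted construction looks like from Swift: two sources merged through a mixer form a small DAG, and everything downstream of the mixer is a simple linear chain.

```swift
import AudioKit

// Two sources feeding one mixer: a small DAG rather than a pure chain.
let osc1 = AKOscillator(waveform: AKTable(.square))
let osc2 = AKOscillator(waveform: AKTable(.sawtooth))
let mix  = AKMixer(osc1, osc2)

// Downstream of the mixer, a simple linear chain of AU-wrapped effects.
let delay  = AKDelay(mix)
let reverb = AKReverb(delay)

AudioKit.output = reverb
try? AudioKit.start()
osc1.start()
osc2.start()
```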

This approach has two huge problems:

  1. Wrapping DSP code (written in C or C++) as an AU requires a ton of boilerplate code.
  2. The AU is too large and too limited to be a basic unit of DSP code.

AudioKit fails to accommodate significant use cases

Re #2: The Audio Units technology was designed around the needs of a DAW, whose plug-ins are complete audio processors (generators, instruments, audio effects, and MIDI effects) that are usually joined in very simple linear chains. This “coarse-grained” approach is not suitable for important cases such as:

  • Connecting two oscillators as an FM carrier/modulator pair (see the sketch after this list)
  • Dynamic voice allocation in a polyphonic instrument
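
To see why the first case is out of reach, consider what an FM carrier/modulator pair actually requires: the modulator’s output must reach the carrier’s phase inside the per-sample loop. The following is purely illustrative Swift (not AudioKit API), filling a buffer offline just to show the granularity involved:

```swift
import Foundation

// Per-sample FM: the modulator output feeds the carrier's phase *inside*
// the sample loop. An AU-to-AU connection has no way to express this
// kind of fine-grained coupling.
let sampleRate = 44_100.0
let carrierHz = 440.0, modulatorHz = 220.0, modulationIndex = 2.0
var carrierPhase = 0.0, modulatorPhase = 0.0

var buffer = [Float](repeating: 0, count: 512)
for i in buffer.indices {
    let modSample = sin(modulatorPhase)
    buffer[i] = Float(sin(carrierPhase + modulationIndex * modSample))
    carrierPhase   += 2.0 * Double.pi * carrierHz   / sampleRate
    modulatorPhase += 2.0 * Double.pi * modulatorHz / sampleRate
}
```

Dynamic voice allocation has the same character: voices must be created, stolen, and summed inside the audio render code, not wired up as separate AUs.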

AudioKit fails to accommodate significant "AudioKit apps"

Because of this limitation, in AudioKit SynthOne it was necessary to pull all such dynamic functionality into a single Audio Unit, just as in conventional DAW plug-ins. The result is regrettable for two reasons:

  1. SynthOne is thus not truly an AudioKit app. (It is, but it’s a degenerate case with only one AU.)
  2. DSP development for SynthOne was based entirely on Soundpipe objects, and the programmers were fully exposed to all the complexities of low-level DSP programming—heap management, thread-safety, performance concerns, and more.

When I needed to create a replacement for the original AKSampler (which was a wrapper around the buggy AUSampler), I had to do the same thing—I created an entire sample-based polysynth, and wrapped it as a single AU so it could be accessed from Swift code.

Key AudioKit Pro branded apps based around the new AKSampler (Digital D1, FM Player 2, and others) presented substantial programming challenges, as programmers tried to use Swift code to compensate for key features (such as LFOs) which weren’t included in the original DSP implementation.

A failed experiment

Later, I tried to create a collection of C++ based synth building-block classes (e.g. oscillators, dynamic voice management) in a now-defunct Core Synth branch of the AudioKit source tree. Although these components worked, I consider this a failed approach, for three reasons:

  1. As with SynthOne, the programmer is exposed to the full hell of DSP programming.
  2. A multi-level mountain of boilerplate code was required to wrap the resulting DSP system as an AU for use in a Swift program.
  3. Most significant of all: the new C++ objects were not scriptable at the Swift level. Hence the whole approach simply sidestepped the central principle of an audio kit.

A better approach?

I am now thinking that the best way around these issues will be to add a new, dynamic, scriptable DSP subsystem to AudioKit:

  • Swift code will be able to script the way low-level DSP objects are connected, inside a single AU (e.g. a polysynth) which can be connected up with others (e.g. an effects chain) in the conventional AudioKit way. The basic architecture might be that of a modular synthesizer, as is done in “proto audio kit” environments like SynthEdit.
  • There will have to be some mechanism by which the parameters of low-level DSP objects can be exposed for access by Swift code, as parameters of the enclosing Audio Unit.

This is nothing more than wishful thinking right now. I don’t yet have any specific proposals for how it might be architected and implemented, and I expect a lot of careful research and experimentation will be needed before a workable design can be devised. Still, I think it’s worth doing.
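
To make the idea slightly more concrete, here is a purely hypothetical sketch of what such a scriptable DSP subsystem might look like from Swift. None of these types exist in AudioKit; every name here is invented for illustration, and the stub declarations exist only so the example stands alone.

```swift
import Foundation

// Hypothetical API sketch only: none of these types exist in AudioKit today.
struct Port { let module: String; let name: String }

final class DSPPatch {
    private(set) var modules: [String] = []
    private(set) var connections: [(from: Port, to: Port)] = []
    private(set) var exposedParameters: [String: Port] = [:]

    // Script the patch: add modules and wire them at sample-level granularity.
    func addModule(_ name: String) -> String { modules.append(name); return name }
    func connect(_ from: Port, to: Port) { connections.append((from: from, to: to)) }

    // Surface a module parameter as a parameter of the enclosing Audio Unit,
    // so ordinary Swift/GUI code can automate it without touching DSP code.
    func expose(_ port: Port, as parameterName: String) { exposedParameters[parameterName] = port }
}

// A Swift "conductor" scripting one FM voice inside a single AU:
let patch = DSPPatch()
let modulator = patch.addModule("modulatorOsc")
let carrier   = patch.addModule("carrierOsc")
let amp       = patch.addModule("amplifier")

patch.connect(Port(module: modulator, name: "output"),
              to: Port(module: carrier, name: "phaseModInput"))
patch.connect(Port(module: carrier, name: "output"),
              to: Port(module: amp, name: "input"))

patch.expose(Port(module: carrier, name: "frequency"), as: "carrierFrequency")
```

The two capabilities that matter are exactly the ones listed above: fine-grained connections scripted from Swift, and module parameters surfaced as parameters of the enclosing AU.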

This is only the beginning

I could go on and on, but I’ll restrain myself. I’ve hardly said anything about the importance of supporting standard interface technologies such as VST/VST3/AU/AUv3, and I haven’t talked about how nothing in the proposed new approach is at all specific to Swift: it is straightforward to imagine adding bindings to other high-level languages, thus extending the audio kit concept to non-Apple platforms.

I think it’s ALL worth doing.
