This was originally written 2021-02-05 as an RTF file, but I decided to redo it using DokuWiki.
I define an Audio Kit as a collection of software resources which allow novice and intermediate programmers to produce audio programs using a high-level language, without the need to write real-time DSP code, using the “conductor” principle. Let’s break this down into five parts.
This refers very specifically to a high-level language programming framework, supported by additional resources such as:
This term is intended to encompass applications and plug-ins, for mobile and/or desktop platforms, where the primary emphasis is on interactive programs which generate and/or process audio in real time and can connect to audio-related I/O devices, including MIDI systems. Some audio kits may also include resources to create non-real-time programs, e.g., programs to generate and/or play audio files.
AudioKit is the canonical “audio kit”. It is based entirely on the use of Swift, a modern programming language which is not itself suitable for real-time DSP development, but which excels at the non-DSP aspects of audio program development. These fall into three main categories:
The interface category deserves further explanation. It encompasses everything required for the audio program (which may be a plug-in) to connect to, and interoperate with, related software (e.g. a DAW) in support of real-time, interactive audio and GUI functions.
This is any code which processes audio and related data (e.g. MIDI) with real-time responsiveness. No audio kit should require custom DSP coding, though some may support it, to some degree.
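To make concrete what “real-time DSP code” means in practice, here is a minimal sketch (the class and its names are illustrative, not taken from AudioKit or any other kit): a render routine that allocates no memory, takes no locks, and does only per-sample arithmetic, so it can safely run on the audio thread.

```cpp
#include <cstddef>

// Hypothetical one-pole lowpass filter, shown only to illustrate the
// constraints on real-time DSP code: no allocation, no locking, no I/O,
// just arithmetic on the buffers handed to it by the audio thread.
struct OnePoleLowpass {
    float coeff = 0.1f;   // smoothing coefficient in (0, 1]
    float state = 0.0f;   // previous output sample

    // Called once per buffer from the audio thread; must never block.
    void process(const float* in, float* out, std::size_t frames) {
        for (std::size_t i = 0; i < frames; ++i) {
            state += coeff * (in[i] - state);  // y[n] = y[n-1] + c*(x[n] - y[n-1])
            out[i] = state;
        }
    }
};
```

An audio kit's job is to provide components like this ready-made, so that user code only ever configures and connects them.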
The most important aspect of an audio kit is its ability to serve as a scripting system for audio programs. I refer to this as the conductor principle, because it is embodied perfectly in the “conductor” portion of user-written code in AudioKit, Csound, etc.
As I said earlier, AudioKit is the canonical audio kit. It meets all five of the conditions listed above. JUCE and Csound don’t qualify as audio kits, for two main reasons:
SynthEdit and Flowstone are very powerful audio development systems, but are not audio kits because:
I could go on, but there is little point. Essentially every “high level” audio software development system ever devised either falls into the text-based Csound camp or the graphical Max/Pure Data camp, or is a hybrid of the two (cf. Cabbage, a graphical front-end to Csound).
See https://en.wikipedia.org/wiki/Comparison_of_audio_synthesis_environments for an excellent overview of software audio synthesis environments, and if you’re interested, check each against the five conditions to see which might qualify as an audio kit.
The ability to write code once and deploy it on multiple platforms (e.g. Macintosh, Windows, Linux, iOS, Android, RasPi, other embedded hardware, etc.), and/or with support for multiple interface standards (e.g. VST/VST3, Audio Units v2/v3, LV2, network protocols, etc.), is highly desirable and practical, but it is not a requirement for a programming system to be called an audio kit.
What I’m calling the “conductor principle” is the notion that a program written in a high-level language like Swift can script the construction of composite structures in a DSP library, which then process audio autonomously on a separate thread, while at the same time presenting a control/parameters API through which the high-level program can interact with them in real time (without threading issues).
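The threading contract at the heart of the conductor principle can be sketched in miniature with a single parameterized node (the class and method names here are hypothetical, not from any real kit): the conductor thread constructs the object and sets its parameters at will, while the audio thread renders autonomously, reading parameters through `std::atomic` so neither side needs a lock.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical gain node illustrating the conductor principle:
// control-thread code scripts construction and tweaks parameters,
// audio-thread code renders, and std::atomic mediates between them.
class GainNode {
public:
    // Conductor-side API: safe to call at any time from the control thread.
    void setGain(float g) { gain.store(g, std::memory_order_relaxed); }

    // Audio-side render: called from the real-time thread; lock-free.
    void process(const float* in, float* out, std::size_t frames) {
        const float g = gain.load(std::memory_order_relaxed);
        for (std::size_t i = 0; i < frames; ++i)
            out[i] = g * in[i];
    }

private:
    std::atomic<float> gain{1.0f};
};
```

In a real kit the conductor would script a whole graph of such nodes and hand the graph to the audio engine; a single parameter is enough to show why the high-level code never has to reason about the audio thread directly.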
In AudioKit, the DSP library is the collection of “AK…” object classes, and everything else is based on the Audio Units mechanisms provided by Apple operating systems (Core Audio).
This approach has two huge problems:
Re #2: The Audio Units technology was designed around the needs of a DAW, whose plug-ins are complete audio processors (generators, instruments, audio effects, and MIDI effects), which are usually joined in very simple linear chains. This “coarse-grained” approach is not suitable for important cases such as:
Because of this limitation, in AudioKit SynthOne it was necessary to pull all such dynamic functionality into a single Audio Unit, just as in conventional DAW plug-ins. The result is regrettable for two reasons:
When I needed to create a replacement for the original AKSampler (which was a wrapper around the buggy AUSampler), I had to do the same thing: I created an entire sample-based polysynth and wrapped it as a single AU so it could be accessed from Swift code.
Key AudioKit Pro branded apps based on the new AKSampler (Digital D1, FM Player 2, and others) presented substantial programming challenges, as programmers tried to use Swift code to compensate for key features (such as LFOs) which weren’t included in the original DSP implementation.
Later, I tried to create a collection of C++-based synth building-block classes (e.g. oscillators, dynamic voice management) in a now-defunct Core Synth branch of the AudioKit source tree. Although these components worked, I consider this a failed approach, for three reasons:
I am now thinking that the best way around these issues will be to add a new, dynamic, scriptable DSP subsystem to AudioKit:
This is nothing more than wishful thinking right now. I don’t yet have any specific proposals for how it might be architected/implemented, and I would expect a lot of careful research and experimentation will be needed before a workable design could be devised. I think it’s worth doing.
I could go on and on, but I’ll restrain myself. I’ve hardly said anything about the importance of supporting standard interface technologies such as VST/VST3/AU/AUv3. Nor have I discussed how nothing in the proposed new approach is at all specific to Swift, so it’s straightforward to imagine adding bindings for other high-level languages, thus extending the audio kit concept to non-Apple platforms.
I think it’s ALL worth doing.