GetDunne Wiki

Notes from the desk of Shane Dunne, software development consultant

User Tools

Site Tools


sarah_oscillator_details

SARAH Oscillator Details

The oscillators are the only interesting aspect of SARAH's design. All of the other elements shown in the signal-flow diagram (Envelope Generators, LFOs, summing and scaling) are entirely conventional.

SARAH uses two oscillator instances per voice. The oscillators themselves are identically structured, but their settings are independent, i.e., they can be set to produce different waveforms, detuned, and mixed so as to provide at least a basic set of options to create composite timbres, and each oscillator's inherent “harmonic shaping” can also be set up differently, to provide further control. The following explanation (which is just an overview) applies equally to OSC1 and OSC2.

About SARAH's oscillators

SARAH's oscillators are essentially wave-table based. They simply play out samples from a 1024-element digitized representation of one cycle of the selected waveform—sine, triangle, square, or sawtooth. What is interesting and new is how the 1024-element wave tables are populated.

Sine waves are a special, simple case

There is a common sine wave table which is generated once and shared by all oscillator instances (including the LFOs in sine-wave mode). This is adequate, because the sine wave has no higher-order harmonics which need to be suppressed to avoid aliasing.

Reconstructing band-limited triangle, square and sawtooth waves

The triangle, square, and sawtooth wave tables are generated dynamically, as follows:

  1. 1024 samples of each waveform (one cycle) are generated once, using exactly the same mathematical expressions used in the LFOs, resulting in mathematically “exact” waveforms having 512 harmonics.
  2. Each mathematically-exact waveform is transformed using juce::dsp::FFT to produce a frequency-domain representation, which is a new 1024-element array, where each element (“coefficient”) represents the relative amplitude and phase of all 512 harmonics. (The mathematics of the FFT are such that each element is a complex number having real and imaginary components, and each harmonic other than the 0th and 512th is represented twice, for positive and negative frequencies.)
  3. After the initial forward FFT operations, one copy of each of the resulting three complex, frequency-domain arrays (one each for triangle, square, and sawtooth waveforms) is kept in memory, shared by all oscillator instances.
  4. Each oscillator instance has its own 1024-element complex array. In preparation to sound a note, it copies the coefficient data out from the appropriate common frequency-domain table to its own array, then performs an in-place inverse FFT to re-create the appropriate time-domain wave table.
  5. To sound a note, the oscillator resamples its own 1024-element array (wave table).

Avoiding aliasing by reconstructing band-limited wave tables

The interesting stuff happens at step 4, but to understand it we must first discuss step 5: When the oscillator is assigned a note frequency, the note's frequency in cycles per second (Hertz) is divided by the sampling rate in use (typically 44100 Hz) to yield a float-valued “phase increment” in samples per cycle. The oscillator also has a float-valued “phase” variable, restricted to the range 0.0 to 1.0, where 0.0 represents the beginning of the cycle and 1.0 represents the end. Each time the oscillator generates a new sample, it multiplies the phase by 1024 and rounds the result, to obtain a wave-table index in the range 0-1023, plucks that sample out of its wave-table and outputs it, then adds the phase increment to the phase, ensuring the result “wraps around” if necessary, so it remains in the range 0.0 to 1.0.

When the phase-increment is exactly 1.0, the oscillator replays its wave-table exactly. At a sampling rate of 44100 Hz, this would happen at an oscillator frequency of 44100/1024 = 43.066 Hz. This is a bit below F1, way down in the bottom octave of the piano range. Even lower notes result in a phase-increment a bit lower than 1.0, in which case, wave-table samples are occasionally repeated, resulting in slight quantization artifacts which are not very noticeable at such low notes.

For all notes above F1—i.e., just about every note you'll ever play—the phase increment will be greater than 1.0, meaning that some wave-table samples will be skipped. Basically, you are trying to replay the basic 43 Hz note at the higher pitch, and so all harmonics of the original tone will be multiplied by the phase increment value. The 512th harmonic of 43.066 Hz is 43.066 x 512 = 22049.79 Hz, just below half the 44100 Hz sampling rate—the so-called Nyquist frequency. Playing F2 means a phase-increment of around 2.0, so all harmonics above the 256th would be above the Nyquist frequency and would be “aliased” to lower frequencies. Up near the top of the piano range, the aliasing of even very low-numbered harmonics (which have substantial energy) will result in very noticeable aliasing artifacts.

In SARAH, this is avoided at step 4 above, by figuring out the highest-numbered harmonic which will still be less than the Nyquist frequency, and setting all higher-numbered harmonic coefficients to zero. When the resulting table is inverse-FFT-transformed, we obtain a version of the original waveform which is almost perfectly band-limited, and plays back at the new rate with no aliasing whatsoever.

Format of frequency-domain coefficient tables

In order to write the code for step 4, it's critical to understand the format of data produced by juce::dsp::FFT after a forward FFT operation. This is not published, and it's possible that some FFT implementations may differ (the JUCE code is structured so as to support multiple implementations in future), but I wrote my initial code based on my experience working with other FFT code, and my guess turned out to be correct.

I'm not going to give a full explanation of discrete Fourier transforms and the FFT here. Refer to Wikipedia or (preferably) any good Digital Signal Processing textbook to learn more. The following description concentrates entirely on pragmatic details relevant to C++ coding.

juce::dsp::FFT requires arrays that are exactly a power of 2 in size. The specific power is referred to as the order of the FFT, and must be specified when declaring a juce::dsp::FFT object. In SARAH, we use order 10, implying arrays with 2^10 = 1024 data points. The Fourier Transform is defined only for Complex Numbers, which are ordered pairs (re, im) consisting of a real and an imaginary component. These are represented in C++ as something like

struct Complex { float re; float im; };

such that sizeof(Complex) is two times sizeof(float). However, juce::dsp::FFT itself is declared so that all array arguments are treated as float[], and the documentation states that all arrays must be twice the size given by the order, e.g., float[2048] for order 10. This is because, internally, each float[2048] array is interpreted as a Complex[1024] array.

After a forward FFT operation, the frequency-domain data are again in a Complex[1024] array, arranged as follows:

  • Element 0 represents (gives the coefficient of) the 0th harmonic, aka DC component
  • Element 1 represents the 1st harmonic, aka fundamental
  • Element 2 represents the 2nd harmonic (1 octave above fundamental)
  • Element 3 represents the 3rd harmonic (1 octave plus a fifth above fundamental)
  • Element 511 represents the 511th harmonic
  • Element 512 represents the 512th harmonic
  • Element 513 represents the 511th harmonic
  • Element 1022 represents the 2nd harmonic
  • Element 1023 represents the 1st harmonic (fundamental)

Why are the 1st through 511th harmonics represented twice? This is because the Fourier Transform works in terms of both positive and negative frequencies, where a negative frequency is interpreted as the Nyquist frequency minus the absolute frequency value. (This is related to the phenomenon of aliasing. When a positive frequency is beyond the Nyqyist frequency, it is “aliased” to the corresponding negative frequency. The human ear hears this negative frequency just the same as the corresponding positive one.) The 0th harmonic, whose frequency is zero Hz, is its own negative. Owing to the arcane mathematics of the FFT, so is the 512th harmonic.

The resulting code

The result of the above analysis—which mercifully seems to be correct for the default juce::dsp::FFT implementations on both Windows and Macintosh—is that the code for reconstructing an anti-aliased and harmonic-shaped time-domain wave table is as follows:

void SynthOscillator::setFilterCutoffDelta(float fcDelta)
{
    if (waveform.isSine()) return;
 
    zeromem(waveTable, sizeof(waveTable));
    int maxHarmonicToRetain = int(1.0 / (2.0 * phaseDelta));
    if (maxHarmonicToRetain >= fftSize / 2) maxHarmonicToRetain = fftSize / 2;
 
    float cutoffHarmonic = (filterCutoff + fcDelta) * fftSize / 2;
 
    for (int ip = 1; ip < maxHarmonicToRetain; ip++)
    {
        int in = fftSize - ip;
        float* fftBuffer = SynthOscillatorBase::getFourierTable(waveform);
        memcpy(waveTable + ip, fftBuffer + ip, 2 * sizeof(float)); // positive frequency coefficient
        memcpy(waveTable + in, fftBuffer + in, 2 * sizeof(float)); // negative frequency coefficient
 
        float fv = 1.0f;
        if (ip > int(cutoffHarmonic)) fv = std::pow(ip / cutoffHarmonic, -filterSlope / 6.0f);
        waveTable[ip + 0] *= fv;
        waveTable[ip + 1] *= fv;
        waveTable[in + 0] *= fv;
        waveTable[in + 1] *= fv;
    }
    inverseFFT->performRealOnlyInverseTransform(waveTable);
}

This code works as follows:

  • If the selected waveform is sine, nothing need be done; the common sine wave table will be used instead of this oscillator's waveTable[] array.
  • waveTable[] is cleared to all zeroes, and we compute the index of the maximum harmonic to retain, as the reciprocal of twice the phase delta (oscillator frequency in cycles per sample)
  • We then compute cutoffHarmonic which represents the cutoff frequency of our simulated low-pass filter. This is based on the fixed filterCutoff parameter and a dynamic fcDelta which is the sum of the shape-LFO and scaled shape-EG outputs. (Both these quantities have the range 0.0 to 1.0.)
  • The for loop runs through the harmonic indices 1 through maxHarmonicToRetain, copies both the positive and negative-frequency coefficients out of the master table (a pointer to which is returned by SynthOscillatorBase::getFourierTable(waveform), and then scales the resulting copies by a factor fv computed by an expression which models the response curve of a simple low-pass filter with a given cutoff frequency (expressed in cycles per sample) and slope (expressed in dB per octave).
  • Finally, we request an inverse FFT on the waveTable[] array.

The astute reader will have noted that the for loop skips the 0th harmonic. The 0th harmonic of an digitized AC signal represents the DC component (net offset from zero) which is neither audible nor desirable in a digital signal processing system. Ensuring that the 0th-harmonic coefficient is always zero results in output waveforms which are guaranteed to be symmetric about zero—yet another free gift from the FFT.

"Real-Only" FFT

As I said earlier, the Fourier Transform is defined over the Complex domain. An ordinary sequence of time-domain samples is “real-only”—corresponding to Complex numbers having zero imaginary components. After a forward FFT operation, the Complex frequency components will be such that for each positive-harmonic coefficient (re, im), the corresponding negative-harmonic coefficient will be (re, -im); they are so-called Complex conjugates. These special properties can be utilized to define “real-only” variants of the forward and inverse FFT algorithms which use less CPU than the “full Complex” version, and the juce::dsp::FFT module includes these.

Any CPU efficiency is a bonus, but for practical purposes, what you need to know is that the “real-only” FFTs interpret the time-domain data not as an array of 1024 Complex values (a real float followed by the corresponding imaginary float), but rather as an array of 1024 real floats followed by an array of 1024 imaginary floats (which are all zeros). Hence, even though the array is declared as size 2048, we simply ignore the second half, and use only the first 1024 floats. SynthOscillator::getSample() is simply

float SynthOscillator::getSample()
{
    int sampleIndex = int(phase * fftSize + 0.5f);
    if (sampleIndex >= fftSize) sampleIndex = 0;
 
    phase += phaseDelta;
    while (phase > 1.0) phase -= 1.0;
 
    if (waveform.isSine()) return sineTable[sampleIndex];
    else return waveTable[sampleIndex];
}
sarah_oscillator_details.txt · Last modified: 2017/09/05 23:42 by shane