Dear Amiga mod playback software author,

you are receiving this note because your software is implementing playback
capabilities for an important format, the Amiga protracker 1.x-3.x mod format.

As you probably are aware, most of these modules were composed on the Amiga
computer for playback on Amiga system. The Amiga hardware playback system is
unlike the PC environment, and needs some special care if authentic-seeming
Amiga mod playback is desired.

Typical playback software for the various sample tracker formats is architected
roughly like this:

    [ Player routine ] --> [ Mixer and resampler ] --> Output

The player routine component produces information about the samples, their
sampling frequencies and volume levels (and filtering, panning, etc. depending
on format) to the mixer and resampler component, which uses the player
routine's instructions to generate the final output by resampling the sample
stream from memory and mixing it to physical output channels.  I'm going to
focus exclusively to the Mixer and resampler component here.

A typical implementation of the resampler uses linear interpolation, or some
higher-order polynomial interpolation, splines, or even sinc-based
"theoretically correct" resampling with some 8-tap FIR or more. Unfortunately,
all of them miss the mark as far as Amiga is concerned. For typical mod player,
the most accurate emulation of Amiga output occurs when *no* interpolation is
used at all. However, there seems to be nothing special about the mixing --
simply adding the resampled channels together suffices.

Why the Amiga is different?
---------------------------

To understand what goes on with the Amiga sound chip, Paula, is to understand that
Paula does no interpolation of any kind. Paula's output is strictly a pulse wave,
produced on 3546895 Hz frequency, which is the Paula clock rate for PAL systems.

The Amiga period values, typically valued somewhere between 120 and 800, count Paula
ticks. For instance, a period value of 400 means that Paula waits 400 ticks holding
the output constant, and then changes to the next sample.

To simulate this perfectly, we would produce an output sample stream at that
3.55 MHz rate, one sample per tick, and then simply hold the output constant
while counting down to zero from the period value, then switch to next sample,
reset period clock back to original value, and so on.  This stream would be
very accurate representation of the Paula's output.

Aliasing in Paula output
------------------------

It is also worth it to understand that playing back a sample from Paula
replaces the original waveform with its smoothly undulating shapes with nothing
but hard-edged pulses. This will have an impact to the sound, of course, and is
called aliasing distortion. It's called aliasing, because copies of the
original sound's frequency spectrum repeat throughout the entire audible part
of audio spectrum and beyond, theoretically up to infinity.

This may surprise those who believe that because Amiga's maximum sampling rate
is some 28867 Hz, the maximum frequency produced by Amiga is threrefore limited
to 14.5 kHz or so. However, this is merely the highest frequency that may be
produced without aliasing.

In addition, some misguided people have tried to run their mod players at 28867
Hz mixing frequency based on some kind of confused idea of what the mixing rate
means for Amiga. This won't work either.

The analog filters -- more complications
----------------------------------------

After leaving the chip, the sound enters a 6 dB/oct resistor+capacitor (RC)
low-pass filter tuned at approximately 5 kHz. I will call this component "fixed
filter" in this document. The purpose of this filter is probably to remove some
of that aliasing which is inherent in the pulse synthesis. Additionally, it is
possible to engage a low-pass 12 dB/oct Butterworth filter tuned at
approximately 3.2 kHz by turning the Amiga power LED brighter with a special
protracker command.  I will call this component "LED filter".

These filters should probably be understood as reconstruction filters, because
their purpose is to attempt to "reconstruct" the original smoothly undulating
waveform after a pulse-based version of it has been generated by some D/A
converter.  However, proper reconstruction filters should probably feature
steeper cut-offs than just 6 or 12 dB, and they should be dynamic; their tuning
should follow the sampling rate, as it is the sampling rate that decides where
the first aliased frequency will occur. (As an example, some Atari ST models
have a DAC chip which is fed through DMA at any of the four possible sampling
rates, and the analog reconstruction filter's cutoff also is chosen based on
the sampling rate.)

Regardless, this is the hand we have been dealt with, and in order to
accurately simulate Amiga 500 audio playback, we thus need to emulate these two
filters in addition of producing the right kind of output stream from Paula
emulation.

No interpolation -- use resampling!
-----------------------------------

To properly resample audio, we need to achieve two things, often done simultaneously:

* remove all components in the audio stream above some selected frequency, usually
  about 20 kHz.
* decimate the signal by discarding samples until we obtain output at the correct rate.
  For 44.1 kHz output we need to discard approximately 79 of every 80 samples.

Decimation can be safely done only after the lowpass step has been completed.
If decimation is done without lowpass filtering, some extra aliasing shall be
introduced to the remaining audio. What happens is that all frequencies in the
spectrum now alias in the audible range, making the ultrasonic sounds produced by
the pulse-wave synthesis to appear as additional noise in the mod playback.

Fortunately, the previous paragraph contains the answer to the problem. It is
possible to make simulated Paula's output consist of "bandlimited steps", or
BLEPs as they are called. The output is done by simulating Paula at 3.55 MHz
clock rate and replacing the hard edges of Paula's pulses by BLEPs, which are
carefully constructed in such a fashion to not contain those difficult frequencies
above 21 kHz or so.

Because the input stream does not contain frequencies near or above nyquist
frequency of the final sampling rate, all we have to do is to build an abstract
version of Paula's output signal that we can evalute to real sample whenever it
is needed. The abstract version consists two parts: the output level of
"genuine" Paula with hard-edged pulses, and the states of various BLEPs which
which are then used to correct the "hard edges" of Paula's waveform to the
smooth curves of the BLEPs instead.

The mixing of one blep thus has the following procedure, roughly:

* adjust the output level of Paula to the new value.
* start a new blep, and scale it in such a fashion that its amplitude is the difference
  between new output level and the old level.

If we ask the synthesis function to produce a sample at this moment, it would still
report the old level. But as we clock Paula forward, and ask again, the blep gradually
changes the output level from the old value to the new value. At some point we are done
with the BLEP, and then the output is simply constant at the new output level.

This leaves the filters, but they are easy. Because the output is done by
mixing bleps, we can implement the analog filters in the bleps themselves, by
filtering them with digital models of the analog filters to produce versions of
BLEPs that have treble components reduced to match Amiga 500's operation with
LED filter on and off. This may seem somewhat miraculous, but this is how it
works.

The implementation
------------------

We need to compute the blep tables. Firstly we need a function that contains
all the frequencies in the world. One such function is called the delta
function. We define it as follows:

    delta(x) = 0 if x != 0
    delta(x) = 1 if x = 0.

If we low-pass filter the delta, it morphs into a sinc function. sinc function
is a special one, it has all the frequencies from 0 to some given cutoff value.
The cutoff is determined by the rate the sinc is oscillating in.

    f(x, rate) = sin(x * pi / rate) / (x * pi / rate) if x != 0
    f(0, rate) = 1 if x = 0

With rate = 1.0 the sinc does no filtering at all, and becomes a delta function
in effect. If we plot the sinc function by asking the value of sinc at each
integer value, it is zero everywhere except 1 at 0. To do no filtering at all
is the same as having the cutoff at exactly half of the sampling rate. If the
sinc is to remove everything above, say, 21 kHz, rate = 3.55e6 / 21e3 / 2.

An ideal sinc filter extends from negative infinity to the positive infinity.
Because a practical implementation is needed, the sinc function must be
truncated at some point.  The truncation affects the width of the stopband of
the sinc function -- short functions begin filtering earlier and are not as
steeply filtering as long filters.  I've found that 2048 point sinc functions
are sufficient to perform filtering for Paula, so we'd sample the sinc function
from f(-1024, rate) to f(1023, rate) and thus yield 2048 values.

Because 2048 point function is quite short, there is an issue at the edges,
where the sinc is still oscillating. Some windowing function is needed to
smooth the sinc properly at edges to zero. The Hann window, or the raised
cosine window is the simplest. It is defined as hann(x, length) = 0.5 - 0.5 *
cos(x / length * 2 * pi). For this, the length would be 2048, and the values of
x would go from 0 to 2047.

The result should look like a symmetric "water-drop" with few small side lobes.
If you have some plotting software, you can see the sinc function at this
point. gnuplot suffices, you can type something like:

set xrange [-1024:1023]
plot sin(x * pi / 84) / (x * pi / 84) * (0.5 - 0.5 * cos((x + 1024) / 2048 * 2 * pi))

The sinc table is final mixing frequency independent
-----------------------------------------------------

The only things that go into calculating the sinc table are the Paula clock
frequency and the frequency of the cutoff. As long as calls to obtaining output
occur at rates greater or equal to 44.1 kHz, the table works. With rates below
about 20 kHz, the blep table itself will produce aliasing.

The function is sampled at Paula clock rate, which is constant; it therefore
also has a fixed duration in seconds. For 2048 points, it's 2048 paula ticks or
roughly 0.6 milliseconds. From this it follows that at 44.1 kHz sampling
frequency, about 25 samples of the table make it into the output.

The table must however be longer than merely 25 samples because depending on
the phase where the blep starts, ie. paula pulse occurs, it's always different
values of the table that need to be chosen, so all of them become used at some
point. (However, in truth the table does not need to be that precise, it could
well be sampled at a lower frequency without compromising output quality
noticeably. Perhaps only about 16 different phase values are needed, but now it
has 80 (as many phases as there are clocks between samples).)

Filtering the sinc
------------------

The minimum-phase reconstruction is implemented in cepstral domain in the
included program.  My knowledge of the operations involved is too limited to
explain why or how it works, but after we do this, the filter shape changes in
such a way that as much as possible of the filter occurs "early on" and the
filter "settles" for longer period after an initial large transition. So,
before minimum phase reconstruction, the filter is a symmetric waterdrop-like
wave, with pre- and postringing at the peak, but after the reconstruction the
peak moves to the start of filter, and the filter displays decaying oscillation
for most of its duration.  The upshot of the reconstruction is that it allows
our IIR filters the longest period to settle after the initial excitation by
the large part of the peak, thus allowing us to truncate them earlier.

There has been some talk that minimum-phase reconstructions are in general
better audio filters than linear-phase filters (which are symmetric by the
midpoint), because they do not display the preringing that is perceptible to
ear if it occurs for too long. The increased postringing is not an issue, as it
often occurs in nature, and thus the ear "expects" it.  However, the issue is
made complicated by the fact that phase relationships are by definition
modified by minimum-phase reconstructions. At any case, this choice makes no
audible difference in this case, because in both cases the ringing is tuned to
occur at frequencies above 20 kHz, and the phase delays are almost linear
across the entire audible band.

The fixed filter
----------------

We should apply the IIR filters to the sinc function before proceeding to build the
band-limited step.  The 6 dB/oct filter is an IIR filter that is computed
according to the following formula:

    output(n) = b0 * input(n) + (1 - b0) * output(n - 1)

that is, the output is a linear combination of input and the value of output in the
last iteration.

The b0 determines the cutoff value and is computed as follows:

    omega = 2 * pi * 5000 / sampling_rate
    b0 = 1 / (1 + 1 / omega)

As a reminder, the cutoff frequency would be 5.0 kHz and sampling rate is 3.55
MHz because we will operate the blep that is "sampled" at 3.55 MHz. The rest of
the problem here is just applying the previous formula to filtering the BLEP.

Amiga 1200 current leakage simulation
-------------------------------------

It seems that Amiga 1200 output response attenuated by some 0.2 - 0.3 dB near
20 kHz.  This would seem to be due to the LED filter even though it isn't
enabled, because this effect does not manifest when turning the LED filter on.
It is difficult to be sure, though, because of limited resolution and the
extreme attenuation, noise floor gets very close to signal. Regardless, the
effect can be modelled with a 32 kHz lowpass RC filter.

A word of caution
-----------------

To correctly apply the filter, we first need to extend the sinc we computed by
imagining that it is preceded by, say, 10 000 constant values of the first
value of the filter, and these are run through the filter to ensure that the
filter is properly settled for the base level of the blep. This is especially
important if the first value of the sinc is not zero, or if the filter is
reused between computations for any reason.

Additionally, to validate that the length of the blep is sufficient, it would
be helpful to examine how much the IIR filter oscillates at the end of the
table by computing the value of

    filter(last_output_value) - last_output_value

If the filter has not yet damped, the difference will not be zero, and depending of
the oscillation magnitude, artifacts could be introduced.

The LED filter
--------------

If no LED filtering emulation is desired, the next BLEP can be left uncomputed.
To simulate the LED filter, we clone the previous sinc function for applying
the LED filter on.  The model for LED filter requires application of a biquad
filter that simulates the 2nd order Butterwoth lowpass filter.

The formula for biquad is as follows:

    output(n) = b0 * input(n) + b1 * input(n-1) + b2 * input(n-2)
        - a1 * output(n - 1) - a2 * output(n - 2)

That is, two samples of past input history and two samples of past output history and
the current sample are required. The coefficients are derived using bilinear transform
in the Python source; feel free to look at the gory details for computing b0, b1, b2,
a1 and a2 from there.

So, anyway, we should go and filter our copy of the previous sinc function
with the biquad formula, and then proceed to figuring out how to convert the
filtered sincs to bandlimited steps.

The step-function
-----------------

Step is our representation of Paula's pulses. We could define it as follows:

    f(x) = 0 if x < 0
    f(x) = 1 if x >= 0

The band-limited step
---------------------

Convolution in DSP is the application of a filter in the time domain. It is
basically like computing a dot product between two vectors. The other "vector"
is the tabularised version of the convolution function, and the other is the
sample data around the point we wish to compute convolution at. We would be
talking about 2048-dimensional vector here, if vectors were otherwise an
appropriate metaphor.

In order to bandlimit a step, we can calculate a representation of the step
sample stream.  We start with the entire sample buffer at zero and calculate
the convolution. It is all zero as expected.

Then we introduce the edge, or the first sample of the step into the sample
buffer. The convolution's value is now equal to the first value in the
convolution function. As more samples are introduced, the value of convolution
is the sum of the values in the convolution function over the area of the step
function. Thus, the bandlimited step is in effect just a numeric integral of
the convolution function.

We should take care to scale the step accordingly. If you are dealing with
16-bit sample data, you would probably want to scale the step in such a fashion
that it starts at, say, 0 and ends at 65536. Or, as I often prefer, it starts
at -65536 and ends at 0. This is just to make the blep evaluation code simple.
The downside with starting with negative number is that the sign takes 1 byte
to encode in the resultant C source. For the example program, the blep
resolution is 17 bits and the blep runs from a positive value to zero.

Sample code in C (but for mono output)
--------------------------------------

/* 131072 to 0, 2048 entries */
#define BLEP_SCALE 17
#define BLEP_SIZE 2048

/* the step function */
int32_t sine_integral[BLEP_SIZE] = { precomputed table };

/* the instantenous value of Paula output */
static int16_t global_output_level;

/* count of simultaneous bleps to keep track of */
static unsigned int active_bleps;

/* the structure that holds data of bleps */
typedef struct {
    int16_t level;
    int16_t age;
} blep_state_t;

/* place to keep our bleps in. MAX_BLEPS should be
   defined as a BLEP_SIZE / MINIMUM_EVENT_INTERVAL.
   For Paula, minimum event interval could be even 1, but it makes sense to
   limit it to some higher value such as 16. */
static blep_state_t    blepstate[MAX_BLEPS];

/* return output simulated as series of bleps */
int16_t
output_sample(int filter_table) {
    int i;
    int32_t output;

    output = global_output_level << BLEP_SCALE;
    for (i = 0; i < active_bleps; i += 1)
        output -= winsinc_integral[filter_table][blepstate[i].age] * blepstate[i].level;
    output >>= BLEP_SCALE;
    
    if (output < -32768)
        output = -32768;
    else if (output > 32767)
        output = 32767;
    
    return output;
}

In the code where new Paula samples get added to the system, you would so
something like this:

function input_sample(int16_t sample) {
    if (sample != global_output_level) {
        /* Start a new blep: level is the difference, age (or phase) is 0 clocks. */
        if (active_bleps > MAX_BLEPS - 1) {
            fprintf(stderr, "warning: active blep list truncated!\n");
            active_bleps = MAX_BLEPS - 1;
        }
        /* Make room for new blep */
        memmove(&blepstate[1], &blepstate[0], sizeof(blepstate[0]) * active_bleps);
        /* Update state to account for the new blep */
        active_bleps += 1;
        blepstate[0].age = 0;
        blepstate[0].level = sample - global_output_level;
        global_output_level = sample;
    }
}

And while no events occur, you would call the clock() function to advance the bleps.

void clock(unsigned int cycles) {
    for (i = 0; i < active_bleps; i += 1) {
        blepstate[i].age += cycles;
        if (blepstate[i].age >= BLEP_SIZE) {
            active_bleps = i;
            break;
        }
    }
}

This code is not very optimized in that it constantly keeps blepstate as a
sorted list. It is possible to implement it more efficiently, for instance by
keeping occurence times instead of ages, but that complicates the output sample
function.  The right optimization of the code might depend on whether periods
are small or large compared to sampling intervals. For Paula, periods are
typically much larger than sampling intervals, so code like this might be
better tradeoff. For modern computers, it takes quite little time, so I leave
the optimizing for people who are that way inclined. Typical length of the blep
list can be short, only 5-6 entries if the sample rates are low or occur
conveniently otherwise. At the maximum, Paula is constantly producing pulse
waves but it's probably sensible to limit maximum period to 124 unless Paula
DMA fetch starvation is simulated.

A typical sequence of events would look like this. Assuming that we sample output
at exactly 80 paula clock ticks apart, and that playback period is, say 140, we'd do
calls like this:

    input_sample(samplebuf[0]);
    clock(80);
    *output++ = output_sample();
    clock(60);
    input_sample(samplebuf[1]);
    clock(20);
    *output++ = output_sample();
    clock(80);
    *output++ = output_sample();
    clock(40);
    input_Sample(samplebuf[2]);
    ...

Results and analysis
--------------------

I have hi-fi recordings of Amiga 500 and 1200 playing Sainahi Circles with both
LED filter on and off. This analysis was largerly based on the sampling work by
pyksy on #amigaexotic at IRCnet. Big thanks to pyksy for his work! In addition,
Melkor (kohina board) sent me a hi-fi sampling of his Amiga 4000 playing Sainahi
Circles.

The results are summarised as number of png pictures that display the simulated
and real responses side-by-side and their residual (difference) spectrums. The
units are in Hz and dB.  There is a constant offset that follows from the fact
that the hi-fi samplings used 32-bit floating point samples, whose value range
is different from the integer-coded PCM data, yielding the largest part of the
difference. The rest is explained by the different recording levels used
between setups.

The files called a{1200,500,4000}_{on,off}.png shows the simulated response of
UADE and real Amiga superimposed. comb{1200,500,4000}_{on,off}.png show the
difference between two in dB units. These are the same pictures as before, but
with the spectra substracted, allowing for more precise evaluation of the
differences between simulated and real output.

We can see that UADE and Amiga produce same features closely, thus essentially
validating the model used for output synthesis.

We can observe a good cutoff starting around 20 kHz and quickly extinguishing
response by approximately 23 kHz. This should yield satisfactory results for
all sampling frequencies above and including 44.1 kHz. However, I personally
recommend 48 kHz as default output frequency because it works without
resampling on majority of cards, while 44.1 kHz output often may need a
resampling step on either software or hardware.

For LED off cases, the residual spectrum is largerly just noise. There is,
however a small 0.3 dB boost around 0 to 3 kHz. This can be seen on Amiga 500,
but not on Amiga 1200. In both Amiga 500 and Amiga 1200, the LED filter also
displayed this boost, and it was compensated for by adjusting the Butterworth
filter with a small resonance term.

Despite adjustments, the LED filter case is not perfect. The residual spectrum
is roughly flat to 3 kHz, thanks to manually fitting the filter, but after that
there is some extra attenuation which is then compensated by something else on
the analog filter.  For Amiga 500 case, the simulation hits noise floor and can
not yield anything meaningful above 15 kHz, but those measurements also show
the same general pattern.

Overall, I think the model yields accuracy to about 0.5 dB for all cases, which
should be sufficient for all practical purposes. Especially the Amiga 1200 case
seems well modelled.

Author
------

Antti S. Lankila <alankila@bel.fi>

feel free to send feedback, comments and spam.
