Performance, and we are cooking.
A work stealing task scheduler which supports a simple closure-based submission system, lazy evaluation and recursive submission.

This has also been posted on Sonic Field.

The scheduler has gone through a lot of changes over the years; this new work stealing feature, I believe, really fixes a lot of the deficiencies in the previous design without adding any complexity to the user experience.

So let's start with an explanation of the scheduler in general. This is a task based, lazy scheduler. We create a closure around a 'task' and then return a 'SuperFuture' for it. The thing about a SuperFuture is that it does not do anything until it has to. Unlike some Future based task modules, this one does not compute stuff as it turns up; rather, it computes stuff as it is needed.
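To make that concrete, here is a minimal usage sketch (the task bodies are made up for illustration; sf_do is the little helper defined at the bottom of the listing below):

    def taskB():
        return 21

    def taskA():
        # Recursive submission: taskA submits taskB and then 'gets' it
        b = sf_do(taskB)
        return b.get() * 2

    answer = sf_do(taskA) # nothing has executed yet
    print answer.get()    # forces execution; prints 42

Nothing runs at the sf_do calls; the closures only execute when get() demands a result.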

When I restarted work on this a few days ago I realised I had forgotten how the previous, simpler version worked. To avoid making this mistake again, I have very, very heavily commented the code. So, rather than duplicate everything, I leave the code and comments to explain the rest.

Note - this code is all AGPL 3 - please respect copyright.

# For Copyright and License see LICENSE.txt and COPYING.txt in the root directory
import sys
import threading
import time
from java.util.concurrent import Callable, Future, ConcurrentLinkedQueue, \
                                 ConcurrentHashMap, Executors, TimeUnit
from java.util.concurrent.locks import ReentrantLock

from java.lang import System
from java.lang import ThreadLocal
from java.lang import Thread
from java.lang import InterruptedException
from java.util import Collections
from java.util.concurrent.atomic import AtomicLong, AtomicBoolean

"""
Work Stealing Lazy Task Scheduler By Dr Alexander J Turner

Basic Concepts:

- Tasks are closures. 
- The computation of the task is called 'execution'.
- They can be executed in any order* on any thread.
- The data they contain is immutable.
- Making a task able to be executed is called submission.
- Execution is lazy**.
- Tasks are executed via a pool of threads and optionally one 'main' thread.
- Ensuring the work of a task is complete and acquiring the result (if any)
  is called 'getting' the task.

Scheduling Overview:

* Any order means that submitted tasks can be executed in any order, though
there can be an order implicit in their dependencies.

I.e. taskA can depend on the result of taskB. For example:
- taskA submits taskB for execution.
- taskB submits taskC, taskD and taskE for execution.
- taskB now 'gets' the results of taskC, taskD and taskE.

In the above case it does not matter in which order taskC, taskD and taskE are
executed.

** Lazy execution means that executing a task is not always done at task
submission time. Tasks are submitted if one of the following conditions is met:
- the maximum permitted number of non executing tasks has been reached. See
  'the embrace of meh' below.
- the thread submitting the task 'gets' any of the tasks which have not yet
  been submitted.
- a thread is in the process of 'getting' a task but would block waiting
  for the task to finish executing. This results in 'work stealing' (see
  below) where any other pending task for any thread may be executed by the
  thread which would otherwise block.

Embrace of meh

Meh, as in a term for not caring or giving up. This is a form of deadlock in
pooled future based systems where the deadlock is caused by a circular
dependency involving the maximum number of executors in the pool rather than a
classic mutex deadlock. Consider this scenario:

- There are only two executors X and Y
- taskA executes on X
- taskA submits and then 'gets' taskB
- taskB executes on Y
- taskB submits and then 'gets' taskC
- taskC cannot execute because X and Y are blocked
- taskB cannot progress because it is waiting for taskC
- taskA cannot progress because it is waiting for taskB
- the executors cannot free up to execute taskC as they are blocked by taskA
  and taskB
- Meh

The solution used here is a soft upper limit to the number of tasks submitted
to the pool of executors. When that upper limit is reached, new tasks are 
not submitted for asynchronous execution but are executed immediately by the
thread which submits them. This prevents exhaustion of the available executors
and therefore prevents the embrace of meh.

Exact computation of the number of running executors is non-trivial with the
code used here (Java thread pools etc.). Therefore, the upper limit is 'soft'.
In other words, sometimes more executors are used than the specified limit. This
is not a large issue here because the OS scheduler simply time shares between 
the executors - which are in fact native threads.

The alternative of an unbounded upper limit to the number of executors is not
viable; even simple granular synthesis or parallel FFT work can easily
exhaust the maximum number of available native threads on modern machines. Also,
current operating systems are not optimised for time sharing between huge
numbers of threads. Finally, there is a direct link between the number of
threads running and the amount of memory used. For all these reasons, a soft
limited thread pool with direct execution works well.

Work Stealing
- taskA on threadX submits taskB
- taskA on threadX gets taskB
- taskB on threadY submits taskC and taskD
- taskB on threadY gets taskC
- there are no more executors so taskC is executed directly on threadY
- at this point taskD is pending execution and taskA on threadX is waiting
  for the result of taskB on threadY
- threadX can then stop waiting for taskB and 'steal' taskD

Note that tasks can be executed in any order on any thread.
"""

# ===============================

# The maximum number of executors. Note that in general the system
# gets started by some 'main' thread which is then used for work as 
# well, so the total number of executors is often one more than this 
# number. Also note that this is a soft limit, synchronisation between
# submissions for execution is weak so it is possible for more executors
# to be scheduled.
SF_MAX_CONCURRENT = int(System.getProperty("sython.threads"))

# Tracks the number of currently queued but not executing tasks
SF_QUEUED         = AtomicLong()

# Tracks the number of executors which are sleeping because they are blocked
# and are not currently stealing work
SF_ASLEEP         = AtomicLong()

# Marks when the concurrent system came up to make logging more human readable
SF_STARTED        = System.currentTimeMillis()

# Causes scheduler operation to be logged
TRACE             = str(System.getProperty("sython.trace")).lower()=="true"

# A thread pool used for the executors 
SF_POOL    = Executors.newCachedThreadPool()

# A set of tasks which might be available for stealing. Use a concurrent set so
# that it shares information between threads in a stable and relatively 
# efficient way. Note that a task being in this set does not guarantee it is
# not being executed. A locking flag on the 'superFuture' task management
# objects disambiguates this to prevent double execution. 
SF_PENDING = Collections.newSetFromMap(ConcurrentHashMap(SF_MAX_CONCURRENT*128,0.75,SF_MAX_CONCURRENT))

# =========

# Define the logger method as more than pass only if tracing is turned on
if TRACE:
    # Force 'nice' interleaving when logging from multiple threads
    print "Thread\tQueue\tAsleep\tTime\tMessage..."
    def cLog(*args):
        print "\t".join(str(x) for x in [Thread.currentThread().getId(),SF_QUEUED.get(),SF_ASLEEP.get(),(System.currentTimeMillis()-SF_STARTED)] + list(args))
else:
    def cLog(*args):
        pass

cLog( "Concurrent Threads: " + SF_MAX_CONCURRENT.__str__())
# Decorates ConcurrentLinkedQueue with tracking of total (global) number of
# queued elements. Also remaps the method names to be closer to python lists
class sf_safeQueue(ConcurrentLinkedQueue):
    # Note that this is actually the reverse of a python pop, this is actually
    # equivalent to [1,2,3,4,5].pop(0).
    def pop(self):
        # One fewer task is now queued globally
        SF_QUEUED.getAndAdd(-1)
        r = self.poll()
        return r
    def append(self, what):
        # One more task is now queued globally
        SF_QUEUED.getAndAdd(1)
        self.add(what)

# Python implements Callable to allow Python closures to be executed in Java
# thread pools
class sf_callable(Callable):
    def __init__(self,toDo):
        self.toDo = toDo
    # This method is that which causes a task to be executed. It actually
    # executes the Python closure which defines the work to be done
    def call(self):
        ret = self.toDo()
        return ret

# Holds the Future created by submitting a sf_callable to the SF_POOL for
# execution. Note that this is also a Future for consistency, but its work
# is delegated to the wrapped future. 
class sf_futureWrapper(Future):
    def __init__(self,toDo):
        self.toDo   = toDo

    def __iter__(self):
        return iter(self.get())
    def isDone(self):
        return self.toDo.isDone()
    def get(self):
        return self.toDo.get()

# Also a Future (see sf_futureWrapper) but these execute the python closure
# in the thread which calls the constructor. Therefore, the result is available
# when the constructor exits. These are the primary mechanism for preventing
# The Embrace Of Meh.
class sf_getter(Future):
    def __init__(self,toDo):
        # Execute the closure immediately, in the calling thread
        self.result = toDo()
    def isDone(self):
        return True

    def get(self):
        return self.result

# Queues of tasks which have not yet been submitted are thread local. It is
# only when an executor thread would become blocked that we go to work
# stealing. This class manages that thread locality.
# TODO: should this, can this, go to using Python lists rather than concurrent
# linked queues?
class sf_taskQueue(ThreadLocal):
    def initialValue(self):
        return sf_safeQueue()

# The thread local queue of tasks which have not yet been submitted for
# execution
SF_TASK_QUEUE = sf_taskQueue()

# The main coordination class for the scheduler. Whilst it is a Future
# it actually delegates execution to sf_futureWrapper and sf_getter objects
# for asynchronous and synchronous operation respectively
class sf_superFuture(Future):

    # - Wrap the closure (toDo) which is the actual task (work to do)
    # - Add that task to the thread local queue by adding self to the queue
    #   thus this object is a proxy for the task.
    # - Initialise a simple mutual exclusion lock.
    # - Mark this super future as not having been submitted for execution. This
    #   is part of the mechanism which prevents work stealing resulting in a
    #   task being executed twice.
    def __init__(self,toDo):
        self.toDo = toDo
        # This object acts as the proxy for the task in the queue
        SF_TASK_QUEUE.get().append(self)
        self.lock = ReentrantLock()
        self.submitted = False
        # Make this task visible to other threads for work stealing
        SF_PENDING.add(self)

    # Used by work stealing to submit this task for immediate execution on
    # the executing thread. The actual execution is delegated to an sf_getter
    # which executes the task in its constructor. This (along with submit) uses
    # the mutex to manage the self.submitted field in a thread safe way. No
    # two threads can submit a super future more than once because
    # self.submitted is either true or false atomically across all threads.
    # The lock has the other effect of synchronising memory state across cores
    # etc.
    def directSubmit(self):
        # Ensure this cannot be executed twice
        self.lock.lock()
        if self.submitted:
            self.lock.unlock()
            return
        self.submitted = True
        self.lock.unlock()
        SF_PENDING.remove(self)
        # Execute in the calling (stealing) thread
        self.future = sf_getter(self.toDo)
    # Normal (non work stealing) submission of this task. This might or might
    # not result in immediate execution. If the total number of active
    # executors is at the limit then the task will execute in the calling
    # thread via a sf_getter (see directSubmit for more details). Otherwise,
    # the task is submitted to the execution pool for asynchronous execution.
    # It is important to understand that this method is not called directly
    # but is called via submitAll. submitAll is the method which submits tasks
    # from the thread local queue of pending tasks.
    def submit(self):
        # Ensure this cannot be submitted twice
        self.lock.lock()
        if self.submitted:
            self.lock.unlock()
            return
        self.submitted = True
        self.lock.unlock()
        SF_PENDING.remove(self)
        # See if we have reached the parallel execution soft limit. The
        # active count is only approximate - hence the limit is 'soft'
        count = SF_POOL.getActiveCount()
        if count<SF_MAX_CONCURRENT:
            # No, so submit to the thread pool for execution
            self.future = sf_futureWrapper(SF_POOL.submit(sf_callable(self.toDo)))
        else:
            # Yes, execute in the current thread
            self.future = sf_getter(self.toDo)

    # Submit all the tasks in the current thread local queue of tasks. This is
    # the lazy executor. This gets called when we need results.
    def submitAll(self):
        queue = SF_TASK_QUEUE.get()
        while not queue.isEmpty():
            queue.pop().submit()

    # The point of execution in the lazy model. This method is what consumers
    # of tasks call to get the task executed and retrieve the result (if any).
    # This therefore acts as the point of synchronisation. This method will not
    # return until the task wrapped by this super future has finished executing.
    # A note on stealing. Consider that we steal taskA. TaskA then invokes
    # get() on taskB. taskB is not completed. The stealing can chain here; 
    # whilst waiting for taskB to complete the thread can just steal another
    # task and so on. This is why we can use the directSubmit for stolen tasks.  
    def get(self):
        cLog( "Submit All")
        # Submit the current thread local task queue
        self.submitAll()
        cLog( "Submitted All")
        # There is a race condition whereby the submitAll has been called
        # which recursively causes another instance of get on this super future
        # which results in there being no tasks to submit but we get to this
        # point before self.future has been set. This tends to resolve itself
        # very quickly as the empty calls to submitAll do not do very much work
        # so the original progresses and sets the future. Rather than a complex
        # and potentially brittle locking system, we just spin waiting for the
        # future to be set. This works fine on my Mac as it only ever seems to
        # spin once, so the cost is pretty much the same as a locking approach,
        # basically one quantum. If this starts to spin a lot in the future
        # a promise/future approach could be used.
        c = 0
        t = System.currentTimeMillis()
        while not hasattr(self,"future"):
            Thread.sleep(1)
            c += 1
        if c:
            cLog( "Raced: ", c, System.currentTimeMillis()-t)
        # This is where the work stealing logic starts.
        # isDone() tells us if the thread would block if get() were called on
        # the future. We will try to work steal if the thread would block so as
        # not to 'waste' the thread. This if block is setting up the
        # log/tracking information
        nap = False
        if not self.future.isDone():
            SF_ASLEEP.getAndAdd(1)
            nap = True
        # 'back' controls the increasing backoff of the thread trying to work
        # steal. This is not the most efficient solution but as Sonic Field
        # tasks are generally large, this approach is OK for the current use.
        # We back off between 1 and 100 milliseconds. At 100 milliseconds we
        # just stick at polling every 100 milliseconds.
        back = 1
        # This loop steals work until the get() on the current task will
        # not block
        while not self.future.isDone():
            # Iterate over the global set of pending super futures
            # Note that the locking logic in the super futures ensures
            # no double execution.
            it = SF_PENDING.iterator()
            while it.hasNext():
                try:
                    stolen = it.next()
                    it.remove()
                    # reset back (the back-off sleep time)
                    back = 1
                    # Track number of tasks available to steal for logging
                    cLog("Steal, pending: ", SF_PENDING.size())
                    # The stolen task must be performed in this thread as
                    # this is the thread which would otherwise block so is
                    # available for further execution
                    stolen.directSubmit()
                    # Now we manage the state of 'nap' which is used for
                    # logging based on whether we would still block.
                    # Nap also controls the thread back off
                    if self.future.isDone():
                        if nap:
                            SF_ASLEEP.getAndAdd(-1)
                            nap = False
                        break
                except Exception, e:
                    # All bets are off
                    cLog("Failed to Steal",e.getMessage())
                    # Just raise and give up
                    raise
            # If the thread would block again or we are not able to steal as
            # nothing is pending then we back off.
            if nap==True:
                if back==1:
                    cLog("Non Pending")
                Thread.sleep(back)
                back *= 2
                if back>100:
                    back = 100
        if nap:
            SF_ASLEEP.getAndAdd(-1)
        # To get here we know this get will not block
        r = self.future.get()
        # Return the result if any
        return r

    # If the return of the get is iterable then we delegate to it so that 
    # this super future appears to be its embedded task
    def __iter__(self):
        return iter(self.get())
    # Similarly for resource control for the + and - reference counting 
    # semantics for SF_Signal objects
    def __pos__(self):
        return +self.get()

    def __neg__(self):
        return -self.get()

# Wrap a closure in a super future which automatically 
# queues it for future lazy execution
def sf_do(toDo):
    return sf_superFuture(toDo)
# An experimental decorator approach equivalent to sf_do
def sf_parallel(func):
    def inner(*args, **kwargs):
        # Wrap the actual call in a closure and queue it lazily
        return sf_do(lambda: func(*args, **kwargs))
    return inner

# Shut the execution pool down. This waits for it to shut down
# but if the shutdown takes longer than timeout then it is 
# forced down.
def shutdown_and_await_termination(pool, timeout):
    pool.shutdown()
    try:
        if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
            pool.shutdownNow()
            if not pool.awaitTermination(timeout, TimeUnit.SECONDS):
                print >> sys.stderr, "Pool did not terminate"
    except InterruptedException, ex:
        # (Re-)cancel if the current thread was also interrupted
        pool.shutdownNow()
        # Preserve the interrupt status
        Thread.currentThread().interrupt()

# The default shutdown for the main pool  
def shutdownConcurrnt():
    shutdown_and_await_termination(SF_POOL, 5)
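To see the lazy submission and the soft limit in action, here is a small driver sketch (the task shapes are made up; run with the sython.threads property set to something small, say 2, so the fan-out exceeds the pool):

    def leaf(n):
        return lambda: n * n

    def branch():
        # More tasks than executors: beyond the soft limit the scheduler
        # runs tasks synchronously in this thread rather than deadlocking
        kids = [sf_do(leaf(n)) for n in range(8)]
        return sum(k.get() for k in kids)

    total = sf_do(branch)
    print total.get() # 140
    shutdownConcurrnt()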

COBOL in top 20 of Tiobe: What does this mean?

COBOL being in the top 20 (at 20) means it gets its own
long term trend graph!

A lot of people will say 'not much' because blah blah blah.

The simple fact is that Tiobe is fun in one way, a bit rubbish in another, but a useful indicator nonetheless. For COBOL to be at 20 and for Java to have re-emerged as number one will cause many people cognitive distress.

  1. How come Scala is at 28?
  2. How come Clojure is not in the top 50?
  3. Why are legacy imperative and OO languages in resurgence?

Before we go any further let us tackle the 'Tiobe is bollocks' argument.

There is supporting evidence; the Indeed job-trend data shows the long term decline in COBOL jobs being reversed and then flattened.

We also see upticks in C++ and Java on Indeed. Interestingly, we do not see the same uptick for C; I have no idea why, but it does show how all these things are just indicators and nothing more.

We can see Scala and Clojure are growing fast in the job market but are still an order of magnitude behind the big players. Right now Scala is similar to COBOL on Indeed but sits a few steps below it on Tiobe (not in the top 20 yet).

Tiobe Top Twenty For November 2015

What does all this mean?

Personally I think it means the language revolution is over. Indeed, the distributed computing revolution is over. This does not mean progress has stopped, it is just no longer revolutionary. Languages like Java and C++ are so well evolved that they do not need to be replaced (I can hear people shouting at me already). They are the Otto cycle engines of the computer age. One day something fundamentally better will appear or, more likely, some change outside the industry will force a paradigm shift which will reflect itself in new programming techniques (think electric cars and global warming).

How can I go around saying the revolution is over or that Java and C++ are good enough? Also, how is this relevant to COBOL? The COBOL issue is down to two things: people realise COBOL is going nowhere but the people programming it are; new people need to be hired to learn the skills. COBOL is going nowhere because the alternatives have not turned out to be enough better to justify the cost of rewriting all the existing systems which use it.

This is relevant to Java, C++ and C etc. because the same is true of these. They are all legacy languages now and they are all good enough. Again, how can I go around saying that? Well, the revolution in languages is over but the information technology revolution has not ended; it has moved. Revolutionary development is now in big processing, big data and new ways of applying prodigious parallel programming. The truth is that algorithmically processing terabytes of data is modelled at a higher level than programming languages, and so any language (within reason) is as good as the next. Conversely, getting the nodes in a compute grid to communicate very quickly indeed is the sort of low level challenge these legacy languages are rather good at.

It is possible that middle ground semi-functional languages like Scala might slowly replace the true legacy incumbents. I personally doubt it given current trends but who knows, maybe in ten years' time that will be the case. Equally, in ten, twenty and even thirty years' time there will still be tens of billions of lines of C, C++, COBOL and Java out there and jobs available for people to write in them.

The revolution is dead, long live the revolution.

RBJ Biquad Filters in C++11 Style

A flair for simplicity.

Image From NASA Solar Observatory.

IIR biquadratic filters are just awesome for shaping sound. I implemented the beautiful and elegant biquads designed by RBJ previously in Java. Here is how I have redone the work in idiomatic C++11.

Why are these filters so useful and, more specifically, why would any signal processing filter be required in my work with additive synthesis? Surely we should be able to perform the same effects with just the frequency envelope approach I described here (Passing Functor Templates To Shape Note Harmonics).

The answer is that biquadratic filters 'ring'. One might think of this as a bad thing but it is not. If we put a signal through an IIR biquad filter with a very high Q (resonance, in effect) it adds its own tone to the signal. The effect of boosting and squashing harmonics is not simple either, as the effects are non-linearly associated with frequency and introduce phase shifts. The effect of ringing (self oscillating) filters is critical in analogue synthesis and I have found it amazingly useful in digital synthesis just the same.

In the above piece I used self oscillation of a low pass filter to help shape the flute sounds. Whilst I am quite sure that, with enough effort, the same sort of sounds could be produced with nothing but additive synthesis (well - any sound can be - see the video below) it simply is not worth the effort. Biquad filters are fast, efficient and make the job of polishing sounds very much simpler.

RBJ filters have been implemented in C++ before. For SonicCpp though I wanted to make the implementation more template based and allow for inlining of the signal filtering function without polluting all the translation units with the large and complex filter coefficient computation code.

So, below is my code. The ever so useful 'exciter' filter is in there as well. This is just an amazingly powerful filter considering how simple it is. I designed it some time ago and find it adds brightness to sounds without giving an ugly distortion effect. The exciter (Java version) was used extensively in the spatialisation effects in the following piece.
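For clarity, the maths of the exciter is nothing more than an odd-symmetric power-law waveshaper; here is the logic of its get() method transcribed into a few lines of Python:

    import math

    def excite(sample, power):
        # Odd symmetry: negative samples stay negative
        if sample < 0:
            return -math.pow(-sample, power)
        return math.pow(sample, power)

With power a little below 1.0 the curve expands the quieter parts of each cycle, which adds harmonics and is heard as brightness.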

The Code

#pragma once
#include "core.hpp"
#include <cmath>
#include <cstddef>
namespace SonicCpp
{
    namespace Filter
    {
        template<typename Src, volume_t Pwr>
        class Exciter
        {
            const Src &m_gen;
        public:
            Exciter(const Src &gen) : m_gen(gen)
            { }
            double get(size_t i) const
            {
                double q = m_gen.get(i);
                if (q < 0)
                    return -std::pow(-q, unVolume(Pwr));
                return std::pow(q, unVolume(Pwr));
            }
            constexpr static size_t size()
            {
                return Src::size();
            }
        };
        enum class BiquadType
        {
            LOWPASS, HIGHPASS, BANDPASS_SKIRT, BANDPASS_PEAK,
            NOTCH, ALLPASS, PEAK, LOWSHELF, HIGHSHELF
        };
        struct BiquadFilterBase
        {
            // filter coeffs
            double b0a0 = 0, b1a0 = 0, b2a0 = 0, a1a0 = 0, a2a0 = 0;
            // in/out history
            double ou1 = 0, ou2 = 0, in1 = 0, in2 = 0;
            void calcFilterCoeffs(BiquadType, pitch_t, pitch_t, volume_t);
        };
        template<typename Src, BiquadType Type, pitch_t Frequency, pitch_t Q, volume_t DbGain>
        class BiquadFilter : public Generator<BiquadFilter<Src, Type, Frequency, Q, DbGain>>, public BiquadFilterBase
        {
            const Src &m_gen;
            double filter(double in0)
            {
                // filter
                const double yn = b0a0 * in0 + b1a0 * in1 + b2a0 * in2 - a1a0 * ou1 - a2a0 * ou2;
                // push in/out buffers
                in2 = in1;
                in1 = in0;
                ou2 = ou1;
                ou1 = yn;
                // return output
                return yn;
            }
        public:
            BiquadFilter(const Src &gen) : m_gen(gen)
            {
                calcFilterCoeffs(Type, Frequency, Q, DbGain);
            }
            double get(size_t i) const
            {
                // TODO This is just abuse - should be converted to a
                // cache based mechanism to maintain effective constness
                return const_cast<BiquadFilter<Src, Type, Frequency, Q, DbGain> *>(this)->filter(m_gen.get(i));
            }
            constexpr static size_t size()
            {
                return Src::size();
            }
        };
    }
}
#include "filter.hpp" 
namespace SonicCpp 
    namespace Filter 
        void BiquadFilterBase::calcFilterCoeffs(BiquadType Type, pitch_t Frequency, pitch_t Q, volume_t DbGain) 
            const double frequency = unPitch(Frequency); 
            const double q = unPitch(Q); 
            const double db_gain = unVolume(DbGain); 
            bool q_is_bandwidth = false; 
            const double sample_rate = SAMPLE_RATE; 
            switch (Type) 
                    q_is_bandwidth = false; 
                    q_is_bandwidth = true; 
            // temp coef vars 
            double alpha, a0 = 0, a1 = 0, a2 = 0, b0 = 0, b1 = 0, b2 = 0; 
            // peaking, lowshelf and hishelf 
            if (Type == BiquadType::PEAK || Type == BiquadType::HIGHSHELF || Type == BiquadType::LOWSHELF) 
                const double A = std::pow(10.0, (db_gain / 40.0)); 
                const double omega = 2.0 * M_PI * frequency / sample_rate; 
                const double tsin = std::sin(omega); 
                const double tcos = std::cos(omega); 
                if (Type == BiquadType::PEAK) 
                    alpha = tsin * std::sinh(std::log(2.0) / 2.0 * q * omega / tsin); 
                    alpha = tsin / 2.0 * std::sqrt((A + 1 / A) * (1 / q - 1) + 2); 
                const double beta = std::sqrt(A) / q; 
                // peaking 
                if (Type == BiquadType::PEAK) 
                    b0 = (1.0 + alpha * A); 
                    b1 = (-2.0 * tcos); 
                    b2 = (1.0 - alpha * A); 
                    a0 = (1.0 + alpha / A); 
                    a1 = (-2.0 * tcos); 
                    a2 = (1.0 - alpha / A); 
                // lowshelf 
                if (Type == BiquadType::LOWSHELF) 
                    b0 = (A * ((A + 1.0) - (A - 1.0) * tcos + beta * tsin)); 
                    b1 = (2.0 * A * ((A - 1.0) - (A + 1.0) * tcos)); 
                    b2 = (A * ((A + 1.0) - (A - 1.0) * tcos - beta * tsin)); 
                    a0 = ((A + 1.0) + (A - 1.0) * tcos + beta * tsin); 
                    a1 = (-2.0 * ((A - 1.0) + (A + 1.0) * tcos)); 
                    a2 = ((A + 1.0) + (A - 1.0) * tcos - beta * tsin); 
                // hishelf 
                if (Type == BiquadType::HIGHSHELF) 
                    b0 = (A * ((A + 1.0) + (A - 1.0) * tcos + beta * tsin)); 
                    b1 = (-2.0 * A * ((A - 1.0) + (A + 1.0) * tcos)); 
                    b2 = (A * ((A + 1.0) + (A - 1.0) * tcos - beta * tsin)); 
                    a0 = ((A + 1.0) - (A - 1.0) * tcos + beta * tsin); 
                    a1 = (2.0 * ((A - 1.0) - (A + 1.0) * tcos)); 
                    a2 = ((A + 1.0) - (A - 1.0) * tcos - beta * tsin); 
                // other filters 
                const double omega = 2.0 * M_PI * frequency / sample_rate; 
                const double tsin = std::sin(omega); 
                const double tcos = std::cos(omega); 
                if (q_is_bandwidth) 
                    alpha = tsin * std::sinh(std::log(2.0) / 2.0 * q * omega / tsin); 
                    alpha = tsin / (2.0 * q); 
                // lowpass 
                if (Type == BiquadType::LOWPASS) 
                    b0 = (1.0 - tcos) / 2.0; 
                    b1 = 1.0 - tcos; 
                    b2 = (1.0 - tcos) / 2.0; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
                // hipass 
                if (Type == BiquadType::HIGHPASS) 
                    b0 = (1.0 + tcos) / 2.0; 
                    b1 = -(1.0 + tcos); 
                    b2 = (1.0 + tcos) / 2.0; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
                // bandpass csg 
                if (Type == BiquadType::BANDPASS_SKIRT) 
                    b0 = tsin / 2.0; 
                    b1 = 0.0; 
                    b2 = -tsin / 2; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
                // bandpass czpg 
                if (Type == BiquadType::BANDPASS_PEAK) 
                    b0 = alpha; 
                    b1 = 0.0; 
                    b2 = -alpha; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
                // notch 
                if (Type == BiquadType::NOTCH) 
                    b0 = 1.0; 
                    b1 = -2.0 * tcos; 
                    b2 = 1.0; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
                // allpass 
                if (Type == BiquadType::ALLPASS) 
                    b0 = 1.0 - alpha; 
                    b1 = -2.0 * tcos; 
                    b2 = 1.0 + alpha; 
                    a0 = 1.0 + alpha; 
                    a1 = -2.0 * tcos; 
                    a2 = 1.0 - alpha; 
            // set filter coeffs 
            b0a0 = (b0 / a0); 
            b1a0 = (b1 / a0); 
            b2a0 = (b2 / a0); 
            a1a0 = (a1 / a0); 
            a2a0 = (a2 / a0); 

Note how the filter settings are template arguments. How can something like frequency or Q, which are naturally double or float values, be represented as template arguments? The answer is that SonicCpp represents these values as scaled 64 bit integers. I scale them 12 decimal places to the right so that 1.0 is represented as 1000000000000. This gives just enough 'head room' to do simple mathematical calculations with frequencies and volumes without causing overflow.
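To illustrate, here is the scaling arithmetic in a few lines of Python (toFixed and unFixed are hypothetical stand-ins for the SonicCpp conversions such as unPitch and unVolume):

    SCALE = 10 ** 12            # 12 decimal places

    def toFixed(x):             # 1.0 -> 1000000000000
        return int(x * SCALE)

    def unFixed(n):             # 1000000000000 -> 1.0
        return float(n) / SCALE

    print toFixed(440.0)        # 440000000000000; well inside 2**63 - 1
    print unFixed(toFixed(0.5)) # 0.5

A signed 64 bit integer tops out around 9.2 * 10**18, so values up to a few million are representable at this scale - plenty for frequencies and volumes.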

Now that we have these as template arguments, the constructor can take no arguments other than the source of the signal to be processed. In SonicCpp, signals are represented as generators and generators are template instantiations where the size (number of samples) is a template argument. This means that we can ensure no bounds violations from one generator accessing another, because the type system ensures the sizes match at compile time and/or we can use static asserts. Hence, the code is robust but with no runtime overhead for range checking.

Note how the constructor calls off to calcFilterCoeffs(Type, Frequency, Q, DbGain). Actually, I could have put the entire constructor definition in the cpp file, but I have just put the heavy lifting part in the cpp. In theory I could have computed the coefficients at compile time as they come from the template parameters. In C++14 this would be trivial, but in C++11 it would be very hard to implement due to the single-statement limit on constexpr functions. Nevertheless, the filter coefficients are computed very rarely, so the important thing is to get their computation out of the header file so that the translation units are not bloated by their code. However, the filter method (called by the get method) needs to remain in the header file so that it can be inlined and so optimised into the rest of the synthesis call graph.

Finally, why BiquadFilterBase? This makes it really easy to implement calcFilterCoeffs in the cpp file without having to worry about the signatures of the actual filter being templatised; as the inheritance is not virtual the performance is not adversely affected, yet there is, again, less bloat and nice, simple code to read.