The full code and documentation are available on GitHub. Cover photograph by Samyuktha Sridhar.

Introduction

Swara is the South Indian word for a musical note. It’s an appropriate name for a musical machine learning project that attempts to imitate the legends of South Indian music, who leverage their vast and varied musical experiences to extemporaneously produce profoundly intricate sequences of swaras.

Musical machine learning is an area of active research (e.g., Google Magenta and Jukedeck). In this article, I’ll describe my approach to developing a variety of musical applications and the musical and machine learning models that enable them.

Musical Modeling

The first challenge with Swara was to develop a library, swara-music, for digitally representing sheet music. The library enables the programmatic construction and modification of the various musical elements of a song while retaining their underlying musical meaning. For example, a waltz tempo may be constructed using the library in the following manner.

// Waltz Tempo.
val waltz = Tempo(
  signature = Length(3, 4),  // waltzes are in 3/4 time
  bpm = 80.0
)

The swara-music library currently supports the following musical elements: Song, Fragment, Key, Tempo, Phrase, Voice, Chord, Note, Pitch, and Length. These simple musical primitives can be combined to form highly complex musical arrangements, which can then be rendered as sheet music (using MuseScore 2, for example).
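
For instance, a short fragment might be assembled as in the sketch below. The constructor shapes shown here are assumptions for illustration, not necessarily the library’s exact API.

// A hypothetical composition of the musical primitives; the constructor
// shapes (Key, Phrase, Voice, Chord, Note) are assumptions, not the
// exact swara-music API.
val fragment = Fragment(
  key = Key(signature = 0, major = true),  // C major; hypothetical encoding
  tempo = waltz,                           // the 3/4 tempo from above
  phrases = Seq(Phrase(voices = Seq(Voice(chords = Seq(
    Chord(Set(Note(Pitch.C, 4)), Length(1, 4)),  // quarter-note middle C
    Chord(Set(Note(Pitch.E, 4)), Length(1, 4)),
    Chord(Set(Note(Pitch.G, 4)), Length(1, 2))   // half-note G
  ))))))
)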

However, the library is still too unwieldy to write music in directly. To facilitate interoperability with existing music-writing tools, it provides adapters for various file formats (MIDI, JSON, etc.).

  • Working with raw MIDI is notoriously difficult, which makes the MIDI adapter particularly valuable.
  • The library also exposes a convenient builder pattern for assembling songs programmatically.
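
For illustration, reading and writing through the adapters might look like the sketch below; the MidiReader and JsonWriter names are hypothetical, not the library’s confirmed API.

// Hypothetical adapter usage; MidiReader and JsonWriter are assumed
// names for illustration, not the library's confirmed API.
val song: Song = MidiReader.read(new java.io.File("waltz.mid"))
JsonWriter.write(song, new java.io.File("waltz.json"))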

Machine Learning

The next challenge with Swara was to develop swara-learn, a library of generalized machine learning models. Disclaimer: this library does not pretend to compete with high-performance machine learning frameworks like TensorFlow and Caffe; it is simply the result of genuine curiosity about the inner workings of artificial intelligence.

Discrete Markov Chain

A DiscreteMarkovChain is an unsupervised model of a random process that undergoes transitions from one state to another. It is a special case of a family of stochastic models known as Markov models, which describe random processes in which future states depend only on some fixed number of previous states and not on any states prior. This “memoryless” property is, more formally, the requirement that for a discrete sequence \(x_0, \ldots, x_n\), \(P(x_n \vert x_{n-1}, \ldots, x_{0}) = P(x_n \vert x_{n-1}, \ldots, x_{n-k})\) for some fixed \(k \geq 1\).

The implementation of discrete Markov chains is built on a simple but high-performance custom Trie and, unlike other implementations, does not require an explicit definition of the state space or of the transition probabilities between states. The DiscreteMarkovChain is fully thread-safe; therefore, it may be trained and used simultaneously.
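
To make the idea concrete, here is a minimal sketch of an order-\(k\) discrete Markov chain backed by a plain hash map. It illustrates the technique only; it is not the library’s Trie-backed, thread-safe implementation.

import scala.collection.mutable
import scala.util.Random

// A simplified order-k discrete Markov chain; unlike the swara-learn
// implementation, this sketch is neither Trie-backed nor thread-safe.
class SimpleMarkovChain[T](k: Int, rng: Random = new Random) {
  private val transitions = mutable.Map.empty[Seq[T], mutable.Map[T, Int]]

  // Record every length-k context and the state that follows it.
  def train(sequence: Seq[T]): Unit =
    sequence.sliding(k + 1).filter(_.length == k + 1).foreach { window =>
      val counts = transitions.getOrElseUpdate(window.init, mutable.Map.empty)
      counts(window.last) = counts.getOrElse(window.last, 0) + 1
    }

  // Sample the next state in proportion to its observed frequency.
  def next(context: Seq[T]): Option[T] =
    transitions.get(context).map { counts =>
      var draw = rng.nextInt(counts.values.sum)
      counts.find { case (_, c) => draw -= c; draw < 0 }.get._1
    }
}

val chain = new SimpleMarkovChain[Char](k = 2)
chain.train("abracadabra".toList)
chain.next("ab".toList) // Some('r'), the only state ever observed after "ab"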

Hidden Markov Model

A HiddenMarkovModel is a supervised learning technique that models a system in which every sequence of observations \(O_1, \ldots, O_n\) is generated by a Markov process whose state \(H_t\) is hidden from the observer. Hidden Markov models are especially useful for sequence prediction (e.g. speech-to-text) and generation (e.g. speech generation). The implementation was designed with the following considerations in mind:

  • Unknown state space: In many problems, the state space is not known a priori. For example, the state space of all possible chords is infinite, because a chord may contain any number of any notes; however, any given song only ever uses a finite number of distinct chords. Traditional dynamic programming algorithms like the Viterbi Algorithm require preallocating an \(n\) by \(\vert H \vert\) matrix, which is impossible if the underlying state space \(H\) is unknown (see the sketch after this list). Therefore, I was forced to use an A* variation instead.
  • Concurrency: The implementation will be trained on massive datasets, so it should remain usable for prediction and generation while it is being trained.
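
To see the problem, consider a sketch of the standard Viterbi recurrence below: the full state space must be enumerable up front to build the table, which is exactly what an unknown state space forbids. This is a generic, textbook illustration, not the A* variant used in swara-learn.

// The textbook Viterbi recurrence in log-space. Note that `states`
// must be enumerable a priori: the table allocates an entry for every
// state at every timestep.
def viterbi[S, O](
  states: Seq[S],
  observations: Seq[O],
  initial: S => Double,          // log P(H_0 = s)
  transition: (S, S) => Double,  // log P(H_t = s' | H_{t-1} = s)
  emission: (S, O) => Double     // log P(O_t | H_t = s)
): Seq[S] = {
  // dp(t)(s) = (log-probability of the best path ending in s at time t, predecessor)
  val dp = Array.fill(observations.length)(Map.empty[S, (Double, Option[S])])
  dp(0) = states.map(s => s -> (initial(s) + emission(s, observations.head), Option.empty[S])).toMap
  for (t <- 1 until observations.length)
    dp(t) = states.map { s =>
      val (p, prev) = states.map(r => (dp(t - 1)(r)._1 + transition(r, s), r)).maxBy(_._1)
      s -> (p + emission(s, observations(t)), Option(prev))
    }.toMap
  // Follow predecessor pointers back from the most probable final state.
  var path = List(dp.last.maxBy(_._2._1)._1)
  for (t <- observations.length - 1 to 1 by -1)
    path = dp(t)(path.head)._2.get :: path
  path
}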

Genetic Algorithms

A genetic algorithm mimics the processes of biological evolution to find optimal solutions to problems. In high school biology, we learned that when two organisms reproduce, their genomes are recombined and mutated to produce offspring. Over many generations, favorable traits are naturally selected and become more prevalent within a population. By mathematically defining the genetic operators that enable evolution (recombination, mutation, and natural selection), we can harness the power of nature to solve arbitrary problems.

The genetic algorithm library exposes the Selector, Mutator, Evaluator, and Recombinator traits (sketched after the list below), which allow the rules of evolution to be arbitrarily defined for any problem. A Population may then be evolved according to these rules.

  • Evaluator: Maps each individual in the population to an evolutionary fitness score, which represents the likelihood that an individual will survive and reproduce (i.e., the analogue of survival of the fittest).
  • Selector: Defines which individuals are selected for reproduction (i.e., the analogue of natural selection).
  • Recombinator: Defines how two parents who have been selected for reproduction produce offspring (i.e., the analogue of sexual reproduction).
  • Mutator: Defines how an individual’s genome is mutated; essential for encouraging diversity and variation within a population (i.e., the analogue of genetic mutation).
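
One plausible shape for these traits is sketched below; the signatures are assumptions for illustration and may differ from the library’s actual definitions.

// A plausible shape for the evolution traits; these signatures are
// assumptions, not the actual swara-learn definitions.
trait Evaluator[T]    { def evaluate(individual: T): Double }
trait Selector[T]     { def select(population: Seq[T], evaluator: Evaluator[T]): Seq[T] }
trait Recombinator[T] { def recombine(father: T, mother: T): T }
trait Mutator[T]      { def mutate(individual: T): T }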

For example, consider the problem of function maximization. Suppose we would like to maximize some function \(f \colon \mathbb{R} \to \mathbb{R}\) over an interval \((from, until) \subset \mathbb{R}\), but \(f\) is not differentiable. Clearly, the standard method of locating maxima by finding points at which the derivative vanishes does not apply. However, we may still use a genetic algorithm if we first define the rules of evolution in the context of function maximization. We construct a population of randomly selected points \(x \in (from, until)\) and define the evaluator to be \(eval(x) = f(x)\); the recombinator to be the average, \(recombine(x_1, x_2) = \frac{x_1 + x_2}{2}\); the mutator to jitter by Gaussian noise, \(mutate(x) = x + \epsilon\) for \(\epsilon \sim N(0, 1)\); and the selector to be a standard tournament selector.

import breeze.stats.distributions.Rand

// Hypothetical objective and interval, added so the snippet is
// self-contained; any non-differentiable f would do.
def f(x: Double): Double = -math.abs(x - 3.0)
val (from, until) = (-10.0, 10.0)

// Draw the initial population uniformly from (from, until).
val generator  = Rand.uniform.map(_ * (until - from) + from)
var population = Population(Seq.fill(100)(generator.draw()))

// Evolve for 50 generations under the rules defined above.
(1 to 50).foreach { _ =>
  population = population.evolve(
    selector = new TournamentSelector(5),
    recombinator = (father, mother) => (father + mother) / 2.0,
    mutator = x => x + Rand.gaussian.draw(),
    evaluator = x => f(x)
  )
}

// The fittest surviving individual approximates the maximizer of f.
population.members.maxBy(f)

Future Work

The next challenge for Swara will be to build the swara-core library, which will combine the swara-music and swara-learn libraries to power exciting musical technologies like:

  • Algorithmic Composition: Generating original but representative scores of music, and creating a SoundCloud profile with entirely computer-generated content.
  • Fingerprinting: Generating a musical fingerprint, or identifier, that uniquely defines a piece of music. This fingerprint can then be used to perform musical identification and search (like Shazam).
  • Musical Translation: Extracting musical information from audio input sources; the musical analogue of speech-to-text.