# Ragabot - Art Encoded #

Indian classical (Hindustani) music has a well-defined structure. We will try to use that structure to generate Indian classical tunes with a Python program. Let's first go over some basics.

### Notations ###
There are seven _shuddha_ swars (pure notes). Together, these seven form a _saptak_. Each swar has a higher frequency than the one before it.

In [None]:
saptak = ["Sa", "Re", "Ga", "Ma", "Pa", "Dha", "Ni"]

In addition to the _shuddha_ swars, some swars have _komal_ (flat) counterparts: `Re`, `Ga`, `Dha` and `Ni` also appear as komal swars in some tunes. `Ma` instead has a _tivra_ (sharp) counterpart, while `Sa` and `Pa` have neither komal nor tivra versions. We will write a shuddha swar as plain letters, a komal swar with one trailing underscore (`Re_`), and a tivra swar with two trailing underscores (`Ma__`). That gives 12 swars in one saptak!


In [2]:
saptak = ["Sa", "Re_", "Re", "Ga_", "Ga", "Ma", "Ma__", "Pa", "Dha_", "Dha", "Ni_", "Ni"]
len(saptak)

12

We are going to use three saptaks: _mandra_ (lower), _madhya_ (middle) and _tar_ (upper). Swars in the _mandra_ saptak are written in lower case. Swars in the _madhya_ saptak are written in capitalized form (first letter upper case, rest lower case). Swars in the _tar_ saptak are written in all upper case.

In [7]:
mandra_saptak = [s.lower() for s in saptak]
mandra_saptak

['sa',
 're_',
 're',
 'ga_',
 'ga',
 'ma',
 'ma__',
 'pa',
 'dha_',
 'dha',
 'ni_',
 'ni']

In [5]:
tar_saptak = [s.upper() for s in saptak]
tar_saptak

['SA',
 'RE_',
 'RE',
 'GA_',
 'GA',
 'MA',
 'MA__',
 'PA',
 'DHA_',
 'DHA',
 'NI_',
 'NI']
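The case and underscore conventions above are regular enough to turn any swar name into a pitch position. Below is a minimal sketch of that idea. `swar_to_semitone` is a hypothetical helper and not part of the original notebook. It assumes each saptak has 12 equally spaced positions in the order of `saptak` above, with _madhya_ `Sa` at 0. Real Hindustani intonation is not strictly equal-tempered, so treat the numbers as positions rather than exact frequencies.

```python
# Hypothetical helper: position of a swar relative to madhya Sa, assuming
# 12 equally spaced positions per saptak in the order used by `saptak` above.
SAPTAK = ["Sa", "Re_", "Re", "Ga_", "Ga", "Ma", "Ma__", "Pa", "Dha_", "Dha", "Ni_", "Ni"]

def swar_to_semitone(swar):
    """Return the offset of `swar` from madhya Sa, e.g. 'dha' -> -3, 'RE' -> 14."""
    if swar.isupper():        # all upper case: tar saptak
        octave = 1
    elif swar.islower():      # all lower case: mandra saptak
        octave = -1
    else:                     # capitalized: madhya saptak
        octave = 0
    base = swar.capitalize()  # 'dha' / 'DHA' -> 'Dha', 'MA__' -> 'Ma__'
    return 12 * octave + SAPTAK.index(base)  # ValueError for an unknown swar

print([swar_to_semitone(s) for s in ["dha", "Sa", "Ma__", "SA"]])  # [-3, 0, 6, 12]
```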

### Raga ###

A raga uses a subset (not necessarily a proper subset) of these 12 swars of a saptak. When composing tunes in a raga, certain swars are allowed when the notes are rising in frequency (_aaroha_), and certain other swars when they are falling (_avroha_). Some ragas are more complicated. While the notes are rising, there are certain notes from which one must ‘descend’, i.e., use lower-frequency notes, before continuing the ‘ascent’. This last structure is also called the ‘chalan’ of the raga. Some notes, often called ‘nyas’ notes, are also given more time than others.
For example, here is a sample tune in raag bhoop.
```
SA,SA,Dha,Pa,Ga,Re,Sa,Re,Ga,Ga,Pa,Ga,Dha,Pa,Ga,Ga
Ga,Pa,Dha,SA,RE,SA,Dha,Pa,SA,Pa,Dha,Pa,Ga,Re,Sa,Sa
Ga,Ga,Pa,Dha,Pa,SA,SA,SA,Dha,Dha,SA,RE,GA,RE,SA,Dha
GA,GA,RE,SA,RE,RE,SA,Dha,SA,Pa,Dha,Pa,Ga,Re,Sa,Sa

Ga,Re,Ga,Ga,Sa,Re,Sa,Sa,Sa,Sa,Sa,dha,Sa,Re,Ga,Ga
Pa,Ga,Pa,Pa,Dha,Dha,Pa,Pa,Ga,Pa,Dha,SA,Dha,Pa,Ga,Sa
Pa,Ga,Ga,Re,Ga,Pa,SA,Dha,SA,SA,SA,SA,Dha,Re,SA,SA
Dha,Dha,Dha,Dha,SA,RE,GA,RE,SA,SA,Dha,Pa,Dha,SA,Dha,Pa
Ga,Re,Ga,Ga,Ga,Re,Pa,Ga,Dha,Pa,Dha,SA,Dha,Pa,Ga,Sa
```
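Raag bhoop is pentatonic. It uses only `Sa`, `Re`, `Ga`, `Pa` and `Dha` (all shuddha) and omits `Ma` and `Ni`. As a quick sanity check on the first two lines of the tune above, here is a sketch with a hypothetical helper, `outside_raga`. It lists every swar that is not allowed in the raga, ignoring the saptak. It assumes one swar per beat, so a compound beat such as `GaPa` would be reported as unknown.

```python
# Hypothetical helper: which swars in a tune fall outside a raga's allowed set?
# The saptak is ignored by comparing lower-cased names ('Dha', 'dha', 'DHA' -> 'dha').
BHOOP_SWARS = {"sa", "re", "ga", "pa", "dha"}  # Bhoop omits Ma and Ni

def outside_raga(lines, allowed):
    """Return the set of swars (as written) that are not in `allowed`."""
    return {s for line in lines for s in line.split(",") if s.lower() not in allowed}

tune = """SA,SA,Dha,Pa,Ga,Re,Sa,Re,Ga,Ga,Pa,Ga,Dha,Pa,Ga,Ga
Ga,Pa,Dha,SA,RE,SA,Dha,Pa,SA,Pa,Dha,Pa,Ga,Re,Sa,Sa""".splitlines()

print(outside_raga(tune, BHOOP_SWARS))             # set()
print(outside_raga(["Sa,Re,Ma,Pa"], BHOOP_SWARS))  # {'Ma'}
```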

### Can we emulate the patterns of raga? ###

We take some existing tunes and use them to assign a probability of transition from each swar to the next. The structures and patterns that define the raga are thus encoded in a set of probabilities. Attached to each note is an array of probabilities giving the chance of moving from that note to each other note. In technical terms, this is a (first-order) Markov chain, and its transition matrix is a stochastic matrix: every row is non-negative and sums to 1. For example, in the diagram below, `Sa` moves to `Re` with probability 0.3.

```
     ___0.1___ Pa        ___0.5__ Ga
    / __0.1___ Ga       /
   / / _0.3___ Re-------____0.4__ Sa
  / / /                 \
Sa _____0.2___ Sa        \__0.1__ dha
  \ \ \
   \ \ \_0.25_ dha 
    \ \__0.04_ pa
     \___0.01_ ga

```
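Written as data, the diagram is a dictionary of dictionaries. This is the same shape that `transition_probability` returns later on. The numbers below are the illustrative ones from the diagram, not estimates from real tunes, and only the `Sa` and `Re` rows are drawn there. A row-stochastic matrix has non-negative rows that each sum to 1. In a Markov chain, the probability of a path is the product of its transitions. The sketch below checks both (`is_stochastic` is a hypothetical name):

```python
import math

# Illustrative transition probabilities copied from the diagram above
# (only the rows for 'Sa' and 'Re' are drawn there).
diagram = {
    "Sa": {"Pa": 0.1, "Ga": 0.1, "Re": 0.3, "Sa": 0.2, "dha": 0.25, "pa": 0.04, "ga": 0.01},
    "Re": {"Ga": 0.5, "Sa": 0.4, "dha": 0.1},
}

def is_stochastic(transitions):
    """True if every swar's outgoing probabilities are non-negative and sum to 1."""
    return all(
        all(p >= 0 for p in row.values()) and math.isclose(sum(row.values()), 1.0)
        for row in transitions.values()
    )

print(is_stochastic(diagram))                     # True
# Probability of the path Sa -> Re -> Ga: multiply the two transitions.
print(diagram["Sa"]["Re"] * diagram["Re"]["Ga"])  # 0.15
```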

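### Probabilities ###
So let's take our sample data and build the transition probabilities for each swar! For that we will use two very simple Python functions. `histogram` counts which swar follows which, and `transition_probability` turns those counts into probabilities.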

In [1]:

def histogram(data):
    """
    data is a raga tune as a list of lines.
    Every line is a list of swars from the tune.
    Returns {swar: {next_swar: count}}.
    """
    hist = {}

    for line in data:  # count which item comes next, within each line
        for i, item in enumerate(line[:-1]):
            itemd = hist.get(item, {})
            itemd[line[i+1]] = itemd.get(line[i+1], 0) + 1
            hist[item] = itemd

    return hist

def transition_probability(hist):
    """Normalize each swar's counts so that its outgoing probabilities sum to 1."""
    prob = {}
    for k in hist:  # compute probabilities
        total = sum(hist[k].values())
        prob[k] = {j: v/total for j, v in hist[k].items()}
    return prob

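The same counts can be built more compactly with the standard library. This is an alternative sketch rather than the notebook's code, and the `_counter` function names are hypothetical. `zip(line, line[1:])` yields each consecutive (current, next) pair in a line, and `collections.Counter` does the counting:

```python
from collections import Counter, defaultdict

def histogram_counter(data):
    """Count, for every swar, how often each other swar follows it."""
    hist = defaultdict(Counter)
    for line in data:
        for current, nxt in zip(line, line[1:]):  # consecutive pairs within a line
            hist[current][nxt] += 1
    return {swar: dict(counts) for swar, counts in hist.items()}

def transition_probability_counter(hist):
    """Divide each swar's counts by its total so each row sums to 1."""
    probs = {}
    for swar, counts in hist.items():
        total = sum(counts.values())
        probs[swar] = {nxt: n / total for nxt, n in counts.items()}
    return probs

tiny = [["Sa", "Re", "Ga", "Re", "Sa"]]
hist = histogram_counter(tiny)
print(hist)                                        # {'Sa': {'Re': 1}, 'Re': {'Ga': 1, 'Sa': 1}, 'Ga': {'Re': 1}}
print(transition_probability_counter(hist)["Re"])  # {'Ga': 0.5, 'Sa': 0.5}
```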
### Sample data for raag bhoop ###
Here are a few sample tunes in raag bhoop. A comma marks the end of each beat. Wherever two swars appear without a comma between them (e.g. `GaPa`), they share a single beat and each is played for half a beat. The code in this notebook treats such a group as a single token.
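If you wanted to work with individual swars and their durations instead of whole beats, one possible approach is sketched below. It splits each beat by matching the known swar spellings from the notation section, including the `_`/`__` suffixes. Each swar in a beat then gets an equal share of that beat. The helpers `split_beat` and `with_durations` are hypothetical and not part of the original code.

```python
import re
from fractions import Fraction

BASE_SWARS = ["Sa", "Re", "Ga", "Ma", "Pa", "Dha", "Ni"]
# Every spelling of every base swar across the three saptaks: 'dha', 'Dha', 'DHA', ...
SPELLINGS = sorted({f(s) for s in BASE_SWARS for f in (str.lower, str.capitalize, str.upper)},
                   key=len, reverse=True)
# A swar is one spelling, optionally followed by '__' (tivra) or '_' (komal).
SWAR = re.compile("(?:" + "|".join(SPELLINGS) + ")(?:__|_)?")

def split_beat(beat):
    """Split one beat such as 'DhaSA' into ['Dha', 'SA']."""
    swars = SWAR.findall(beat)
    if "".join(swars) != beat:  # something in the beat was not a known swar
        raise ValueError(f"cannot parse beat {beat!r}")
    return swars

def with_durations(line):
    """Pair each swar in a comma-separated line with its duration in beats."""
    result = []
    for beat in line.split(","):
        swars = split_beat(beat)
        result.extend((s, Fraction(1, len(swars))) for s in swars)
    return result

print(split_beat("SARE"))            # ['SA', 'RE']
print(with_durations("Ga,GaPa,Ga"))  # [('Ga', Fraction(1, 1)), ('Ga', Fraction(1, 2)), ('Pa', Fraction(1, 2)), ('Ga', Fraction(1, 1))]
```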

In [22]:
%%file bhoop1.csv
SA,SA,Dha,Pa,Ga,Re,Sa,Re,Ga,Ga,Pa,Ga,Dha,Pa,Ga,Ga
Ga,Pa,Dha,SA,RE,SA,Dha,Pa,SA,Pa,Dha,Pa,Ga,Re,Sa,Sa
Ga,Ga,Pa,Dha,Pa,SA,SA,SA,Dha,Dha,SA,RE,GA,RE,SA,Dha
GA,GA,RE,SA,RE,RE,SA,Dha,SA,Pa,Dha,Pa,Ga,Re,Sa,Sa

Ga,Re,Ga,Ga,Sa,Re,Sa,Sa,Sa,Sa,Sa,dha,Sa,Re,Ga,Ga
Pa,Ga,Pa,Pa,Dha,Dha,Pa,Pa,Ga,Pa,Dha,SA,Dha,Pa,Ga,Sa
Pa,Ga,Ga,Re,Ga,Pa,SA,Dha,SA,SA,SA,SA,Dha,Re,SA,SA
Dha,Dha,Dha,Dha,SA,RE,GA,RE,SA,SA,Dha,Pa,Dha,SA,Dha,Pa
Ga,Re,Ga,Ga,Ga,Re,Pa,Ga,Dha,Pa,Dha,SA,Dha,Pa,Ga,Sa

Sa,Re,Ga,Pa,Ga,Re,Sa,Sa,Re,Pa,Pa,Pa,Re,Ga,Ga,Re
Ga,GaPa,Ga,Re,Ga,Pa,Dha,SA,SA,SA,SA,Dha,Dha,Pa,Ga,Pa
DhaRE,SA,SA,Dha,Dha,Pa,Ga,Re,GaPa,DhaSA,PaDha,SA,DhaSA,DhaPa,GaRe,Sa

Pa,Ga,Ga,Ga,Pa,Pa,SA,Dha,SA,SA,SA,SA,SARE,GARE,SA,SA
SA,Dha,Dha,SA,SA,SA,RE,RE,DhaSA,PaDha,SA,SA,Dha,Dha,Pa
Ga,GaPa,Ga,Re,Ga,Pa,Dha,SA,SARE,GARE,SA,DhaPa,DhaSA,DhaPa,GaRe,GaPa,GaRe,Sa

Sa,dha,dha,Sa
dha,Sa,Re
Sa,Re
dha,Sa
Sa,Re,Ga,Re,Ga,Sa,Re,dha,Sa
Sa,Re,Ga,Re,Ga,Pa,Ga,Re,Pa,Ga,dha,dha,Sa
Ga,Pa,Dha,Ga,Ga,Ga,Pa
Ga,Pa,Dha,Pa,Ga,Re,Sa
Ga,Pa,Dha,SA,SA,Dha,Pa,Ga,Re,Ga,Re,Pa,Ga,Re,Sa
Ga,Re,Sa,Re,Ga,Pa,Dha,SA,Pa,Dha,SA,RE,GA,RE,SA
Dha,SA,RE,SA,Dha,SA,Dha,Pa,Ga,Pa,Dha,Pa,Ga,Pa,Ga,Re,Sa,dha,dha,Sa


Overwriting bhoop1.csv


Let's read this data into a list of lines, each line being a list of swars.

In [23]:
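def readcsv(filename):
    """Read a tune file into a list of lines, each a list of swars (one entry per beat)."""
    with open(filename) as f:
        # skip the blank lines between tunes; they contribute no transitions anyway
        return [line.strip().split(",") for line in f if line.strip()]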

In [24]:
hist_  = histogram(readcsv("bhoop1.csv"))

Let's look at what this histogram means!

In [28]:
hist_['Dha']

{'Pa': 19, 'SA': 17, 'Dha': 9, 'Re': 1, 'Ga': 1}

In [42]:
def textplot(hist):
    for k in sorted(hist, key=lambda k: hist[k], reverse=True):
        print(k.rjust(9), str(hist[k]).rjust(3), hist[k]*"*")

In [43]:
textplot(hist_['Dha'])

       Pa  19 *******************
       SA  17 *****************
      Dha   9 *********
       Re   1 *
       Ga   1 *


This means that in our training data, `Pa` follows `Dha` more often than any other note. So the most probable note after `Dha` is `Pa`, followed by `SA`.

The transition probabilities for `Dha` are these counts divided by their total:

In [35]:
probs = transition_probability(hist_)
probs['Dha']

{'Pa': 0.40425531914893614,
 'SA': 0.3617021276595745,
 'Dha': 0.19148936170212766,
 'Re': 0.02127659574468085,
 'Ga': 0.02127659574468085}
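Each probability is a count divided by the total number of transitions out of `Dha`. That total is 19 + 17 + 9 + 1 + 1 = 47, so P(`Pa` after `Dha`) = 19/47 ≈ 0.404. A quick check, using the counts printed above:

```python
import math

dha_counts = {"Pa": 19, "SA": 17, "Dha": 9, "Re": 1, "Ga": 1}  # hist_['Dha'] from above
total = sum(dha_counts.values())                                 # transitions out of Dha
dha_probs = {swar: n / total for swar, n in dha_counts.items()}

print(total)                                        # 47
print(dha_probs["Pa"])                              # 0.40425531914893614
print(math.isclose(sum(dha_probs.values()), 1.0))   # True
```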

### Sampling, given the probabilities ###

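Our next problem: given these transition probabilities for `Dha`, how do we choose the next note so that it is random, yet follows those probabilities? This boils down to sampling an item from an array where each position has its own probability. The `sample` function below draws a uniform random number `r` in [0, 1) and subtracts the probabilities one by one. It returns the item at which `r` drops below zero, so each item is picked with its own probability.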


In [2]:
import random

def sample(items, probs):
    """Pick one of `items`, where items[i] has probability probs[i]."""
    r = random.random()  # uniform in [0, 1)
    index = 0
    # subtract probabilities until r drops below zero; the last one subtracted wins
    while r >= 0 and index < len(probs):
        r -= probs[index]
        index += 1
    return items[index - 1]


def test_sample():
    probs = {'Pa': 0.40425531914893614,
             'SA': 0.3617021276595745,
             'Dha': 0.19148936170212766,
             'Re': 0.02127659574468085,
             'Ga': 0.02127659574468085}
    keys = list(probs)
    pvalues = [probs[k] for k in keys]
    samples = [sample(keys, pvalues) for _ in range(10000)]
    # empirical frequencies should be close to the given probabilities
    for item in probs:
        print(item, samples.count(item)/10000)
    assert abs(probs['Pa'] - samples.count('Pa')/10000) < 0.01

test_sample()


Pa 0.4042
SA 0.3621
Dha 0.1924
Re 0.0215
Ga 0.0198
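The loop in `sample` is inverse-CDF sampling done by hand. The standard library can do the same job. `random.choices(population, weights=...)` picks items with the given relative weights. The cumulative-sum-plus-`bisect` version below makes the inverse-CDF idea explicit. This is an alternative sketch, not the notebook's code, and `sample_bisect` is a hypothetical name.

```python
import bisect
import itertools
import random

def sample_bisect(items, probs, rng=random):
    """Inverse-CDF sampling: find where a uniform r falls among cumulative probabilities."""
    cumulative = list(itertools.accumulate(probs))  # for Dha: about [0.40, 0.77, 0.96, 0.98, 1.0]
    r = rng.random() * cumulative[-1]               # scaled in case probs don't sum exactly to 1
    return items[bisect.bisect_right(cumulative, r)]

items = ["Pa", "SA", "Dha", "Re", "Ga"]
probs = [19 / 47, 17 / 47, 9 / 47, 1 / 47, 1 / 47]

rng = random.Random(42)  # seeded so the run is reproducible
draws = [sample_bisect(items, probs, rng) for _ in range(10000)]
print(draws.count("Pa") / len(draws))  # close to 0.404

# The one-line standard-library equivalent (output varies from run to run):
print(random.choices(items, weights=probs, k=5))
```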


### Ragabot encoded! ###
Let's create a random tune in raag bhoop!

In [73]:
bhoop_probs = transition_probability(histogram(readcsv("bhoop1.csv")))
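def aalap(initial, transition_prob):
    """Endlessly yield swars, starting at `initial` and following the transition probabilities."""
    current = initial

    while True:
        yield current
        probs = transition_prob[current]  # KeyError if `current` never had a successor
        items = list(probs)
        pvalues = [probs[item] for item in items]
        current = sample(items, pvalues)

def take(seq, n):
    """Return the next n items of the iterator `seq` as a list."""
    return [next(seq) for i in range(n)]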
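`aalap` raises a `KeyError` if it reaches a swar that never has a successor in the training data, i.e. one that only ever ends a line. Also, each run gives a different tune unless the global `random` module is seeded. A variant addressing both might look like the sketch below. `aalap_seeded` and its `fallback` parameter are hypothetical. At a dead end, it draws the next swar as if the current swar were `fallback`, which is just one possible choice and assumes `fallback` is in the table.

```python
import random
from itertools import islice

def aalap_seeded(initial, transition_prob, seed=None, fallback="Sa"):
    """Endless Markov-chain walk over swars, reproducible for a given seed.

    If the current swar has no recorded successors, the next swar is drawn
    from the row of `fallback` instead of raising KeyError.
    """
    rng = random.Random(seed)  # private RNG: same seed -> same tune
    current = initial
    while True:
        yield current
        probs = transition_prob.get(current) or transition_prob[fallback]
        items = list(probs)
        current = rng.choices(items, weights=[probs[i] for i in items])[0]

# A toy table in which 'Pa' is a dead end (it has no row of its own).
toy = {"Sa": {"Re": 0.5, "Ga": 0.5}, "Re": {"Ga": 1.0}, "Ga": {"Pa": 0.5, "Sa": 0.5}}
first = list(islice(aalap_seeded("Sa", toy, seed=1), 12))
again = list(islice(aalap_seeded("Sa", toy, seed=1), 12))
print(first == again)  # True
```

For an endless generator like this one, `list(islice(seq, n))` does the same job as `take(seq, n)`.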


In [76]:
bhoop = aalap("Sa",bhoop_probs)

In [77]:
take(bhoop, 32)

['Sa',
 'Sa',
 'Re',
 'GaPa',
 'GaRe',
 'GaPa',
 'Ga',
 'Sa',
 'Re',
 'Ga',
 'Re',
 'Pa',
 'Ga',
 'Ga',
 'Re',
 'Pa',
 'Ga',
 'Ga',
 'Pa',
 'Ga',
 'Re',
 'Sa',
 'Re',
 'Ga',
 'Pa',
 'Dha',
 'SA',
 'Dha',
 'SA',
 'DhaPa',
 'GaRe',
 'GaPa']

### How do we examine it!! ###
One way is to play it or sing it! But there is another way. When we listen to a tune, how do we recognize that it is in raag `bhoop`? By special sequences of notes that occur frequently in raag bhoop; we call these the chalan or pakad. So let our Python function generate a long tune, and count how many times the pakad phrases occur in it. This is just one indicator of how raga-like the output is.

In [78]:
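from collections import deque

def search(seq, subseq, end=100):
    """Yield once for every pakad match found while scanning the note iterator `seq`.

    subseq is a list of pakad phrases. A window of notes matches if any phrase,
    joined into one string, is a substring of the joined window (case is ignored,
    so the saptak does not matter). Only the number of matches is used below.
    """
    def compare(source, dest):
        return any("".join(item).lower() in "".join(source).lower() for item in dest)

    n = len(max(subseq, key=len))  # window size = longest pakad phrase
    window = deque(take(seq, n), n)
    for i in range(n, end):
        if compare(window, subseq):
            yield i-n
            window = deque(take(seq, n), n)  # start a fresh window after a match
        else:
            window.append(next(seq))  # slide the window by one note


def count(seq):
    return sum(1 for i in seq)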

In [81]:
bhoop = aalap("Sa", bhoop_probs)
pakad = [["dha","dha","sa"],["ga","re","pa","ga"],["dha","pa","ga","re"]]

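# average number of pakad matches per search(..., end=32) call,
# over 1000 consecutive calls on one long generated tune
sum([count(search(bhoop, pakad, 32)) for i in range(1000)])/1000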

1.099
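`search` joins the notes in its window into one string and looks for each pakad as a substring. A phrase can therefore also match across compound beats: the window `Dha`, `DhaSA` joins to `dhadhasa`, which contains the pakad `dha dha sa`. Also, after each match it draws `n` fresh notes while `i` advances by only one, so a call with `end=32` reads `n - 1` extra notes per match. Below is a stricter sketch that works on a finite list: it compares whole notes (still ignoring the saptak) and skips past each match. `count_pakad` is a hypothetical name, and its counts are not directly comparable with the 1.099 above.

```python
def count_pakad(notes, pakads):
    """Count non-overlapping occurrences of any pakad phrase in a list of notes.

    Notes are compared whole and case-insensitively, so the saptak is ignored,
    but a compound beat such as 'DhaSA' never matches a single pakad note.
    """
    pakads = [[p.lower() for p in phrase] for phrase in pakads]
    notes = [n.lower() for n in notes]
    count, i = 0, 0
    while i < len(notes):
        for phrase in pakads:
            if notes[i:i + len(phrase)] == phrase:
                count += 1
                i += len(phrase)  # skip past the match so occurrences don't overlap
                break
        else:                     # no phrase starts here: move on by one note
            i += 1
    return count

pakad = [["dha", "dha", "sa"], ["ga", "re", "pa", "ga"], ["dha", "pa", "ga", "re"]]
tune = "Sa,Re,Ga,Re,Pa,Ga,Dha,Pa,Ga,Re,Sa,dha,dha,Sa".split(",")
print(count_pakad(tune, pakad))  # 3
```

Applied to generated stretches, for instance `count_pakad(take(bhoop, 32), pakad)`, it gives a similar kind of indicator under these stricter matching rules.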