# Tune sources

Here we demonstrate the included utilities for loading tune data.

In [None]:
from pyabc2.sources import load_example, norbeck, the_session

A few examples are included in the package, accessible with {func}`pyabc2.sources.load_example` (returns {class}`~pyabc2.Tune`) and {func}`pyabc2.sources.load_example_abc` (returns ABC string).

In [None]:
load_example("For the Love of Music")

The tune source modules, demonstrated below, download tune data from the internet.

## Norbeck

{func}`norbeck.load() <pyabc2.sources.norbeck.load>` gives us a list of {class}`~pyabc2.Tune`s for one of [Norbeck's](https://www.norbeck.nu/abc/) tune type groups (e.g. 'jigs', 'reels', 'slip jigs').

In [None]:
tunes = norbeck.load("jigs")
print(len(tunes), "jigs loaded")

tunes[0]

In [None]:
tunes[-1]

## The Session

{func}`the_session.load() <pyabc2.sources.the_session.load>` gives us a list of {class}`~pyabc2.Tune`s loaded from a (frequently updated) archive of all of the tunes in [The Session](https://thesession.org/). This is a large dataset, so here we limit the processing with `n=500`.

In [None]:
tunes = the_session.load(n=500)

tunes[0]

In [None]:
tunes[-1]

In [None]:
tune = the_session.load_url("https://thesession.org/tunes/21799#setting43712")
tune

In [None]:
tune.print_measures()

### Data archive

The Session data archive (<https://github.com/adactio/TheSession-data>) provides several datasets, loadable with {func}`pyabc2.sources.the_session.load_meta`,
which we can use for more than just parsing ABC into {class}`~pyabc2.Tune`s.

For example, we can look for the most common ABC notes in the corpus.

In [None]:
%%time

df = the_session.load_meta("tunes", convert_dtypes=True)
df

In [None]:
df.info()

In [None]:
from pyabc2.note import _RE_NOTE as rx

rx

Note that this regular expression also matches letters in ordinary text, such as tune titles.

In [None]:
["".join(tup) for tup in rx.findall("the quick brown fox jumps over the lazy dog")]

But The Session stores the tune body separately (in the `abc` field) and encourages a bare-bones melody-focused approach, so we can expect to mostly be matching actual notes.
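To get a feel for why this matters, here is a minimal sketch with a deliberately simplified note pattern. It is a hypothetical stand-in, not pyabc2's `_RE_NOTE`: it only knows accidentals (`^`, `_`, `=`), a letter `A`-`G`/`a`-`g`, octave marks (`,` and `'`), and a simple duration suffix such as `2`, `/2`, or `3/2`.

```python
import re

# Hypothetical simplified ABC note pattern (NOT pyabc2's _RE_NOTE):
# optional accidental(s), a note letter, octave marks, and an optional duration.
NOTE_RE = re.compile(r"(\^{1,2}|_{1,2}|=)?([A-Ga-g])([,']*)(\d*/?\d*)")


def note_specs(text):
    """Return each full note match as a string, e.g. '^f2' or 'A/2'."""
    return [m.group() for m in NOTE_RE.finditer(text)]


body = "|:DEF A2d|fed B/2c/2d|"
title = "The Banshee"

print(note_specs(body))   # ['D', 'E', 'F', 'A2', 'd', 'f', 'e', 'd', 'B/2', 'c/2', 'd']
print(note_specs(title))  # ['e', 'B', 'a', 'e', 'e']
```

Even a short title yields several spurious "notes", so restricting the search to the separately stored tune body is what keeps the counts meaningful.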

In [None]:
from pprint import pprint

cool = df.query("tune_id == 1 and setting_id == 1")
display(cool.T)

abc = cool.abc.iloc[0]
print(abc, "\n")

pprint([m.group() for m in rx.finditer(abc)], compact=True)

In [None]:
%%time

note_counts = (
    df.abc
    .str.findall(rx)
    .explode()
    .str.join("")
    .value_counts()
)
note_counts
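For readers less familiar with pandas, the counting cell above (`findall`, then `explode`, then `str.join`, then `value_counts`) can be mirrored in plain Python with `collections.Counter`. This is a minimal sketch on two made-up tune bodies, again using a simplified stand-in pattern rather than `_RE_NOTE`.

```python
import re
from collections import Counter

# Hypothetical simplified note pattern (stand-in for pyabc2's _RE_NOTE)
NOTE_RE = re.compile(r"(\^{1,2}|_{1,2}|=)?([A-Ga-g])([,']*)(\d*/?\d*)")

# Made-up tune bodies standing in for the `abc` column
bodies = [
    "|:DEF A2d|fed B2A:|",
    "|:A2B cBA|dcB A3:|",
]

counts = Counter()
for body in bodies:
    # findall returns one tuple of groups per match (empty string for an
    # unmatched optional group); joining the groups rebuilds the note spec
    counts.update("".join(groups) for groups in NOTE_RE.findall(body))

print(counts.most_common(3))
```

`Counter` plays the role of `value_counts`, and iterating over each body's matches replaces `explode`.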

In [None]:
note_counts[:20]

👆 `A` (unit duration) is the most common note spec, being a prominent pitch in many of the common keys:
* scale degree 5 in Dmaj
* scale degree 2 in Gmaj
* scale degree 1 (the tonic) in Ador, Amin, Amix, Amaj
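These scale degrees can be checked with a little pitch-class arithmetic. The sketch below uses a hypothetical helper (not part of pyabc2) that walks each mode by its semitone steps and reports where a natural note falls; it assumes seven-note modes and ignores enharmonic spelling.

```python
# Semitone steps for the mode families used in the key list above
MODE_STEPS = {
    "maj": [2, 2, 1, 2, 2, 2, 1],  # Ionian
    "mix": [2, 2, 1, 2, 2, 1, 2],  # Mixolydian
    "dor": [2, 1, 2, 2, 2, 1, 2],  # Dorian
    "min": [2, 1, 2, 2, 1, 2, 2],  # Aeolian (natural minor)
}

# Pitch classes of the natural note letters (C = 0)
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def scale_degree(note, tonic, mode):
    """Return the 1-based degree of natural `note` in `tonic` + `mode`, or None."""
    pc = PITCH_CLASS[tonic]
    target = PITCH_CLASS[note]
    for degree, step in enumerate(MODE_STEPS[mode], start=1):
        if pc == target:
            return degree
        pc = (pc + step) % 12  # move up to the next scale tone
    return None  # the natural note is not in this scale


for key in ["Dmaj", "Gmaj", "Ador", "Amin", "Amix", "Amaj"]:
    print(key, scale_degree("A", key[0], key[1:]))
```

A natural note outside the scale, such as F in D major (which uses F♯), returns `None`.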

```{note}
`A` implies A₄: the A above middle C, the open A string on a violin, the A in the lower register of the flute, etc.
```

```{note}
In general we don't know the duration of `A` without context (the `L:` header field, or the `M:` field if `L:` is not set).
However, in this case we know that The Session presets the unit note length to `1/8`,
so `A` is an eighth note.
```
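The fallback rule mentioned in the note comes from the ABC standard (v2.1): when `L:` is absent, the meter is read as a decimal, and the default unit note length is 1/16 if that value is below 0.75, otherwise 1/8. Below is a minimal sketch of that rule plus the common duration suffixes, using `fractions.Fraction`. The helper names are hypothetical, and cases such as `M:C`, `M:none`, repeated slashes (`//`), or broken rhythm (`>`/`<`) are deliberately not handled.

```python
from fractions import Fraction


def default_unit_length(meter):
    """Default L: per the ABC 2.1 rule, for a numeric meter like '6/8'."""
    num, den = (int(x) for x in meter.split("/"))
    return Fraction(1, 16) if Fraction(num, den) < Fraction(3, 4) else Fraction(1, 8)


def note_length(duration, unit):
    """Length (as a fraction of a whole note) for a simple ABC duration suffix."""
    if not duration:
        return unit                  # bare note: one unit
    if "/" not in duration:
        return unit * int(duration)  # e.g. '2' -> twice the unit
    num, _, den = duration.partition("/")
    num = int(num) if num else 1
    den = int(den) if den else 2     # a lone '/' means halve
    return unit * Fraction(num, den)


unit = Fraction(1, 8)  # The Session's preset L:1/8
for dur in ["", "2", "/2", "/", "3/2"]:
    print(repr("A" + dur), note_length(dur, unit))
```

With `L:1/8`, `A2` is a quarter note and `A/2` a sixteenth; under `M:2/4` with no `L:`, the same `A` would instead be a sixteenth.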

In [None]:
from textwrap import wrap

print("\n".join(wrap("  ".join(note_counts[note_counts == 1].index))))

👆 A variety of ABC note specs appear only once. Many of these have unusual durations or accidentals.

What if we ignore everything except the natural note name?

In [None]:
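nat_cased_counts = (
    note_counts
    # name the index and values explicitly so this works with
    # both pandas < 2.0 and >= 2.0 `value_counts` naming
    .rename_axis("note")
    .rename("count")
    .reset_index()
    # keep only the (cased) note letter, dropping accidentals, octave marks, and durations
    .assign(nat=lambda d: d.note.str.extract(r"([a-gA-G])", expand=False))
    .groupby("nat")["count"]
    .sum()
    .sort_values(ascending=False)
)
nat_cased_counts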

👆 `A` is still the leader, but otherwise the ordering has shifted a bit.
`C`, which generally implies a pitch (middle C) below the range of most whistles and flutes,
has the lowest count.
`b` is within that range, but many tunes don't include one.
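The range remarks follow from the ABC octave convention: uppercase `C` is middle C (C₄), lowercase letters are one octave higher, and each `,` or `'` shifts the note down or up an octave. The sketch below maps a bare letter plus octave marks to a MIDI note number (middle C = 60). It illustrates that convention only and is not pyabc2's `Note.value`; it also assumes a D whistle or D flute whose lowest note is D₄ (MIDI 62).

```python
# Semitones above C for each natural note letter
LETTER_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def abc_letter_to_midi(spec):
    """MIDI number for an ABC letter with optional octave marks, e.g. 'C', 'b', 'A,', "c'"."""
    letter, marks = spec[0], spec[1:]
    midi = 60 + LETTER_SEMITONES[letter.upper()]  # uppercase octave starts at middle C
    if letter.islower():
        midi += 12                                # lowercase is one octave up
    midi += 12 * marks.count("'") - 12 * marks.count(",")
    return midi


LOWEST_D_INSTRUMENT = 62  # D4, assumed bottom note of a D whistle / D flute

for spec in ["C", "D", "A", "b"]:
    m = abc_letter_to_midi(spec)
    print(spec, m, "below D4" if m < LOWEST_D_INSTRUMENT else "D4 or above")
```

Only `C` falls below D₄ here, while `b` (B₅, MIDI 83) sits comfortably inside the whistle and flute range.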

In [None]:
from pyabc2 import Note

(
    nat_cased_counts
    .to_frame()
    .assign(value=lambda df: df.index.map(lambda x: Note.from_abc(x).value))
    .sort_values("value")["count"]
    .plot.bar(
        xlabel="ABC letters\n(accidentals, octave indicators, and context in key ignored)",
        rot=0,
        ylabel="Count",
        title="ABC prevalence in The Session",
    )
);