vignettes/functions_usage.Rmd

---
title: "Usage of pyramidi functions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Usage of pyramidi functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Introduction

I started to write this vignette a while ago, before I knew object-oriented programming (OOP) in R. 
So this might be interesting for you if you don't know OOP but want to learn more about all the internals of pyramidi. 
If you just want to see some use cases or if you know well R6, the other vignettes might be a better place to start. 


### Load libraries

First load some libraries:

```{r, include=FALSE, message=FALSE}
pyramidi::install_miditapyr(envname = "r-reticulate")
```


```{r setup, message=FALSE}
library(pyramidi)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(zeallot)
```

### Extract midi into dataframe

We'll extract the information of a midi file into dataframe. We'll use the package internal midi file:

```{r midi_df}
midi_file_str <- system.file("extdata", "test_midi_file.mid", package = "pyramidi")

midifile <- mido$MidiFile(midi_file_str)
ticks_per_beat <- midifile$ticks_per_beat
```

Now we can load the information of the `midifile` into a dataframe:

```{r midi_df2, message=FALSE}
dfc = miditapyr$frame_midi(midifile)
head(dfc, 20)
```

This dataframe contains the columns of the track index `i_track`, `meta` (whether the midi event is a note event), and `msg` containing named lists of further midi event information.

The `MidiFile()` function of `mido` also yields the [`ticks_per_beat`](https://mido.readthedocs.io/en/latest/midi_files.html#tempo-and-beat-resolution) of the file:

```{r ticksperbeat}
ticks_per_beat
```

The `miditapyr$unnest_midi()` function transforms the `msg` column of the dataframe to a wide format, where every new column name corresponds to the names in the lists in `msg` (like `tidyr::unnest_wider()`):

```{r}
df <- miditapyr$unnest_midi(dfc) %>% as_tibble()
head(df, 20)
```

Except the `name` column this seems to be the same as 
```{r}
dfc %>% unnest_wider(msg)
```

### Translate midi time information

In the midi format, time is treated as relative increments between events (measured in ticks). 
In order to derive the total time passed, you can use the function `tab_measures()`:


```{r}
dfm <- tab_measures(df, ticks_per_beat, c("m", "b")) %>%
  # create a variable `track` with the track name (in order to have it in the plot below)
  mutate(track = ifelse(purrr::map_chr(name, typeof) != "character", 
                        list(NA_character_), 
                        name)) %>%
  unnest(cols = track) %>% 
  fill(track)

dfm
```

This function adds further columns:

* `ticks`: specifying the total ticks passed,
* `t`: specifying the total time in seconds passed,
* `m`: specifying the total [measures](https://en.wikipedia.org/wiki/Bar_(music)) (bars) passed,
* `b`: specifying the total [beats](https://en.wikipedia.org/wiki/Beat_(music)) passed,
* `i_note`: unique ascending index for every track and midi note in the midi file.

### Further processing of the midi events

You can split the dataframe in two by whether the events are [meta](https://mido.readthedocs.io/en/latest/meta_message_types.html) or not:

```{r}
dfm %>% 
    miditapyr$split_df() %->% c(df_meta, df_notes)
```

```{r df_meta}
df_meta %>% as_tibble()
```


```{r df_notes}
df_notes %>% as_tibble()
```


### Pivot note dataframe to wide

Each note in the midi file is characterized by a `note_on` and a `note_off` event.
In order to generate a piano roll plot with ggplot2, we need to `tidyr::pivot_wider()` those events.
This can be done with the function `pivot_wide_notes()`:

```{r df_notes_wide}
df_not_notes <- 
  df_notes %>% 
  dplyr::filter(!stringr::str_detect(type, "^note_o[nf]f?$")) 

df_notes_wide <-
  df_notes %>% 
  dplyr::filter(stringr::str_detect(type, "^note_o[nf]f?$")) %>%
  # tab_measures(df_meta, df_notes, ticks_per_beat) %>%
  pivot_wide_notes() %>%
  left_join(pyramidi::midi_defs)
df_notes_wide
```
In the new format, the data has half the number of rows.
The columns `m`, `b`, `t`, `ticks`, `time` and `velocity` are each replaced by
two columns with the suffix `_note_on` and `_note_off`.


### Plot midi information in piano roll plot

Now we have the midi data in the right format for the piano roll plot:

```{r midi_piano_roll}
df_notes_wide %>%
  ggplot() +
  geom_segment(
    aes(
      x = m_note_on,
      y = note_name,
      xend = m_note_off,
      yend = note_name,
      color = velocity_note_on
    )
  ) +
  # each midi track is printed into its own facet:
  facet_wrap( ~ track,
              ncol = 1,
              scales = "free_y") + 
  guides(color=guide_colorbar(title="Note velocity")) +
  labs(
    title = "Piano roll of the note events in the midi file",
    subtitle = "Only notes played are shown."
  ) +
  xlab("Measures") +
  scale_x_continuous(breaks = seq(0, 16, 4),
                     minor_breaks = 0:16) +
  scale_colour_gradient() +
  theme_minimal()

```

### Manipulation of the midi data

The new format also allows to easily manipulate the midi data. For instance, let's put the volume (called `velocity` in midi) of the first beat in every bar to the maximum (127), and to half of its original value otherwise:


```{r}
df_notes_wide_mod <- df_notes_wide %>% 
  mutate(
    velocity_note_on = ifelse(
      # As it's a 4/4 beat, the first beat of each bar is a multiple of 4:
      b_note_on %% 4 == 0, 
      127, 
      velocity_note_on / 2
    )
)
```

Let's compare the modified value to the original one:

```{r}
df_notes_wide %>% 
  select(b_note_on, velocity_note_on) %>% 
  bind_cols(
    new = df_notes_wide_mod$velocity_note_on
  )
```

With an `ifelse()` statement, we modified the volume of the midi notes, depending on if they're the first beat in the measure or not. 

Other possible manipulations could be for instance:

* [Quantization](https://en.wikipedia.org/wiki/Quantization_(music)) by `round()`ing the `note_on`/`note_off` times,
* Chord generation, e.g. by applying a `group_by(floor(m_note_on))`-`summarize()` logic, or
* Arpeggiating chords by a `group_by(floor(m_note_on))` - `mutate()` logic.

### Pivot note data frame back to long format

We can transform the wide midi data back to the long format:

```{r pivot_long}
df_notes_long <- pivot_long_notes(df_notes_wide)
```


### Join non note events

We can now add the non note events:

```{r join_non_note_events}
df_midi_out <- merge_midi_frames(df_meta, df_notes_long, df_not_notes)

df_midi_out
```

The `time` value in midi format is given by the number of `ticks` passed between events.

### Write midi dataframe back to a midi file

Now we can transform the data back to a dataframe of the same format as the one we got with `miditapyr$frame_midi()`:
```{r dfc2}
dfc2 <-
  df_midi_out %>%
  # When reticulate converts R dataframes to pandas, there are complications
  # with character columns containing missing values.
  # repair_reticulate_conversion = TRUE, repairs that in the miditapyr python
  # code:
  miditapyr$nest_midi(repair_reticulate_conversion = TRUE)
as_tibble(dfc2)
```

And we can save it back to a midi file:

```{r write_midi, eval=FALSE}
miditapyr$write_midi(dfc2, ticks_per_beat, "test.mid")
```