# [Run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding)

* RLE (Run-Length Encoding) is a technique that removes the
  redundancy produced by runs of repeated symbols. Example:
  ```
  aaaaa = 5a
  ```
* Depending on the size of the source alphabet and the maximum run length,
  different versions of RLE codecs have been proposed.

## 1. $N$-ary run length encoding

RLE for $N$-ary alphabets (alphabets of size $N$), where typically, $N=256$.

### Encoder

1. While there are symbols to encode:
    1. Let $s$ be the next symbol.
    2. Count the $n$ consecutive symbols equal to $s$ (including $s$ itself).
    3. Write the pair $ns$.

### Example

Runs:
```
aaaabbbbbaaaaaabbbbbbbcccccc
```
are encoded as:
```
4a5b6a7b6c
```

### Decoder

1. While there are $ns$ pairs to decode:
    1. Write the symbol $s$ $n$ times.

### Lab

```python
# https://rosettacode.org/wiki/Run-length_encoding#Python
# https://docs.python.org/3/library/itertools.html#itertools.groupby

from itertools import groupby

def N_RLE_encode(input_string):
    # groupby yields one (symbol, run) group per run of equal symbols
    return [(len(list(g)), k) for k, g in groupby(input_string)]

def N_RLE_decode(lst):
    # Repeat each symbol c exactly n times
    return ''.join(c * n for n, c in lst)

x = N_RLE_encode('aaaabbbbbaaaaaabbbbbbbcccccc')
print(x)
y = N_RLE_decode(x)
print(y)
```

```
[(4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (6, 'c')]
aaaabbbbbaaaaaabbbbbbbcccccc
```
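
The remark that codecs differ in the maximal run length can be illustrated with a byte-oriented variant for $N=256$. The sketch below is only one possible design, not a standard format: it assumes that each pair is stored as two bytes (count, then symbol), so a run is capped at 255 and longer runs are split. The names `MAX_RUN`, `byte_rle_encode` and `byte_rle_decode` are hypothetical.

```python
MAX_RUN = 255  # assumption: the run length must fit in a single byte

def byte_rle_encode(data):
    """Encode bytes as (count, symbol) byte pairs, splitting runs longer than MAX_RUN."""
    out = bytearray()
    i = 0
    while i < len(data):
        s = data[i]  # next symbol
        n = 1
        # Count the consecutive symbols equal to s, up to the cap
        while i + n < len(data) and data[i + n] == s and n < MAX_RUN:
            n += 1
        out += bytes((n, s))  # write the pair (n, s)
        i += n
    return bytes(out)

def byte_rle_decode(code):
    """Expand each (count, symbol) byte pair written by byte_rle_encode."""
    out = bytearray()
    for j in range(0, len(code), 2):
        n, s = code[j], code[j + 1]
        out += bytes([s]) * n
    return bytes(out)

message = b'a' * 300 + b'bb'
code = byte_rle_encode(message)
print(list(code))                        # [255, 97, 45, 97, 2, 98]
print(byte_rle_decode(code) == message)  # True
```

With this layout, an input that contains no repeated symbols doubles in size, which is why practical codecs usually escape or flag runs instead of encoding every symbol as a pair.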


## 2. Binary run length encoding

* In binary RLE it is not necessary to indicate the symbol of each run
  (only its length), because when a run ends the next run can only consist
  of the other symbol.

### Encoder

1. Let $s\leftarrow$ `0`.
2. While there are bits to encode:
    1. Read the next $n$ consecutive bits equal to $s$.
    2. Write $n$.
    3. $s\leftarrow (s+1) \bmod 2$.
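
The encoder steps can be sketched in Python as follows. This is a minimal illustration, assuming the bits arrive as a string of `'0'` and `'1'` characters; the name `binary_rle_encode` is hypothetical. Because the first run is always taken to be a run of zeros, an input that starts with `1` produces a leading run of length 0.

```python
def binary_rle_encode(bits):
    """Return the list of run lengths of a '0'/'1' string, starting with a run of '0's."""
    if set(bits) - {'0', '1'}:
        raise ValueError("input must contain only '0' and '1'")
    runs = []
    s = '0'  # symbol of the current run; the first run is assumed to be zeros
    i = 0
    while i < len(bits):
        n = 0
        # Read the next n consecutive bits equal to s
        while i < len(bits) and bits[i] == s:
            n += 1
            i += 1
        runs.append(n)                # write n
        s = '1' if s == '0' else '0'  # switch to the other symbol
    return runs

print(binary_rle_encode('0000111110000001111111000000'))  # [4, 5, 6, 7, 6]
print(binary_rle_encode('1100'))                          # [0, 2, 2]
```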

### Example

Runs:
```
0000111110000001111111000000
```
are encoded as:
```
4 5 6 7 6
```

### Decoder

1. Let $s\leftarrow$ `0`.
2. While there are items $n$ to decode:
    1. Write $n$ bits equal to $s$.
    2. $s\leftarrow (s+1) \bmod 2$.
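
A matching sketch of the decoder steps, assuming the run lengths are given as a list of integers and the output is a string of `'0'` and `'1'` characters (the name `binary_rle_decode` is hypothetical):

```python
def binary_rle_decode(runs):
    """Rebuild the '0'/'1' string from a list of run lengths; the first run is zeros."""
    bits = []
    s = 0  # symbol of the current run
    for n in runs:
        bits.append(str(s) * n)  # write n bits equal to s
        s = (s + 1) % 2          # switch to the other symbol
    return ''.join(bits)

print(binary_rle_decode([4, 5, 6, 7, 6]))  # 0000111110000001111111000000
print(binary_rle_decode([0, 2, 2]))        # 1100
```

A leading run of length 0 simply writes nothing and flips the symbol, so streams that start with `1` decode correctly.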

## 3. [MNP-5 run length encoding](https://en.wikipedia.org/wiki/Microcom_Networking_Protocol#MNP_5)

* Created by [Microcom Inc.](https://en.wikipedia.org/wiki/Microcom_Networking_Protocol)
for the [MNP (Microcom Networking Protocol) 5](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=held+data+compression+techniques+applications&btnG=).

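### Codec

Runs shorter than three symbols are copied verbatim. Three equal symbols in a
row act as an escape (`<ESC>`) and are always followed by the number of
additional repetitions, so a run of length $n\geq 3$ becomes the symbol
written three times followed by $n-3$.
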
```
Input     Output
--------- ---------
ab        ab
aab       aab
aaab      aaa0b /* 3-symbols length run == <ESC> code-word */
aaaab     aaa1b
aaaaab    aaa2b
:         :
a^nb      aaa(n-3)b
```

### Example

Runs:
```
aaaabbbbbaaaaaabbbbbbbccccccb
```
are encoded as:
```
aaa1bbb2aaa3bbb4ccc3b
```

### Lab

```python
from itertools import groupby

def MNP5_encoding(input_string):
    """Encode a string: a run of length n >= 3 becomes 'sss' followed by n - 3."""
    code_stream = ''
    for k, g in groupby(input_string):  # split the input into runs of equal symbols
        run_length = len(list(g))
        if run_length >= 3:
            # Three copies of the symbol act as the escape; the extra count follows
            code_stream += k * 3 + str(run_length - 3)
        else:
            # Runs shorter than 3 are copied verbatim
            code_stream += k * run_length
    return code_stream

def MNP5_decoding(code_stream):
    """Decode a stream written by MNP5_encoding (source symbols must not be digits)."""
    original = ''
    i = 0
    while i < len(code_stream):
        if code_stream[i].isdigit():
            # Read the whole count (it may have several digits)
            # and repeat the previous symbol that many times
            j = i
            while j < len(code_stream) and code_stream[j].isdigit():
                j += 1
            original += code_stream[i - 1] * int(code_stream[i:j])
            i = j
        else:
            original += code_stream[i]  # plain symbol: copy it
            i += 1
    return original

message_encoded = MNP5_encoding('aaaabbbbbaaaaaabbbbbbbccccccaaab')
print(message_encoded)
message_decoded = MNP5_decoding(message_encoded)
print(message_decoded)
print('aaaabbbbbaaaaaabbbbbbbccccccaaab' == message_decoded)
```

```
aaa1bbb2aaa3bbb4ccc3aaa0b
aaaabbbbbaaaaaabbbbbbbccccccaaab
True
```

## Contents

1. [Burrows-Wheeler Transform](Burrows-Wheeler_Transform/Burrows-Wheeler_Transform.ipynb).