# Test 1: Minutes spent buffering for streaming videos in parallel

In [1]:
import requests
import sys
import time
import numpy as np
import ipywidgets as widgets
from multiprocessing import Process, Queue
from queue import Empty

## Analysis

We'll look at the [Big Buck Bunny](https://peach.blender.org/) film, which is about 10m 30s long, with a resolution of 1920x1080 at 30 frames per second. I chose to host it at BYU because it's close to me:

In [2]:
test_url = "https://students.cs.byu.edu/~th443/bbb.mp4"

Download the video once:

In [3]:
# !wget 'https://students.cs.byu.edu/~th443/bbb.mp4'

We'll determine how many individual frames are in this video.

Nice one-liner from https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg:

In [4]:
# !ffprobe -v error -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 -ignore_editlist 1 bbb.mp4

19036


In [5]:
frame_count = 19036

In [6]:
fps = 30

In [7]:
minutes = frame_count / fps / 60
minutes

10.575555555555555

We'll find out how many bytes are in our version of Big Buck Bunny:

In [8]:
!wc -c bbb.mp4

276134947 bbb.mp4


`mb_count` represents the size of our video in megabytes (MB). A megabyte contains 1000000 (1E+6) bytes:

In [9]:
mb_count = 276134947 / (1e+6)

`frame_size` represents the average size of one frame of our video in MB:

In [10]:
frame_size = mb_count / frame_count
frame_size

0.01450593333683547

`second_size` represents the average size in MB of one second of video at 30 FPS. This is the minimum download speed (in MB/s) required for smooth playback:

In [11]:
second_size = frame_size * fps
second_size

0.4351780001050641

Internet speed is usually measured in _megabits_ per second (Mb/s). Note that megabyte is shortened to 'MB,' but megabit is written as 'Mb'. A megabyte is 8 megabits:

In [12]:
second_megabit_size = second_size * 8
second_megabit_size

3.4814240008405126
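
The cells above can be folded into one helper. This is a minimal sketch rather than part of the original notebook; the function name `video_bitrate_mbps` is made up for illustration, and it assumes a constant frame rate so that duration is simply frames divided by FPS.

```python
def video_bitrate_mbps(size_bytes, frame_count, fps):
    """Average bitrate in megabits per second (Mb/s).

    Hypothetical helper: assumes a constant frame rate, so the
    duration in seconds is frame_count / fps.
    """
    duration_s = frame_count / fps
    megabits = size_bytes * 8 / 1e6  # 1 byte = 8 bits, 1 Mb = 1e6 bits
    return megabits / duration_s


# Values taken from the wc and ffprobe outputs above
bitrate = video_bitrate_mbps(276134947, 19036, 30)
print(round(bitrate, 2))  # 3.48
```

Working in bits from the start avoids mixing up MB/s and Mb/s later on.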

**Internet speed is like a pipe.**

In an ideal world, a file that is 10 megabytes--80 megabits--would take 8 seconds to load on a 10 Mb/s network. So if every second of a video is 5 megabits, it should be able to "fit" through a 5 Mb/s network connection every second without buffering.
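
The arithmetic behind the pipe analogy can be sketched as follows. Both function names are invented for this example, and the model is the idealized one from the paragraph above: no latency, no protocol overhead, and a link that always delivers its rated speed.

```python
def ideal_transfer_seconds(size_megabytes, link_mbps):
    """Seconds to move a file over an ideal link rated in Mb/s."""
    size_megabits = size_megabytes * 8
    return size_megabits / link_mbps


def plays_without_buffering(video_mbps, link_mbps):
    """True if each second of video fits through the link in one second."""
    return video_mbps <= link_mbps


print(ideal_transfer_seconds(10, 10))   # 8.0
print(plays_without_buffering(5, 5))    # True
print(plays_without_buffering(5, 3))    # False
```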

In [13]:
process_count = 3

**What happens when we try to shove three videos down one network pipe?**

Our test video is around 3.5 Mb/s, and our test network speed _should_ be 5 Mb/s. Streaming three videos at the same time is just like streaming one big video with all of their sizes combined:

In [14]:
combined_second_megabit_size = round(second_megabit_size * process_count, 1)
combined_second_megabit_size

10.4

With a combined size of around 10.4 Mb/s and a network "pipe size" of 5 Mb/s, **each of our videos should take around twice as long to download**.

What does this mean for streaming? If our roughly 10.6-minute video takes about 22 minutes to download, **it will spend around 11.5 minutes buffering**. Let's put this to the test.
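
As a rough check on that prediction, here is a small sketch of the calculation. The function `expected_buffering_minutes` is hypothetical, and it assumes the link is shared evenly between streams and actually delivers the 5 Mb/s we expect, neither of which is guaranteed.

```python
def expected_buffering_minutes(video_minutes, video_mbps, stream_count, link_mbps):
    """Predicted buffering time per stream on an evenly shared link.

    Assumption: the download takes (combined bitrate / link speed) times
    the playback length, and never less than the playback length itself.
    """
    slowdown = max(1.0, video_mbps * stream_count / link_mbps)
    return video_minutes * (slowdown - 1)


predicted = expected_buffering_minutes(
    video_minutes=19036 / 30 / 60,   # about 10.58 minutes
    video_mbps=3.4814240008405126,   # second_megabit_size from above
    stream_count=3,
    link_mbps=5,
)
print(round(predicted, 1))  # 11.5
```

A single stream on the same link gives a slowdown below 1, which the `max` clamps to 1, so the predicted buffering is zero.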

## Testing

In [15]:
mbps_queue = Queue()
mbps_percent_queue = Queue()

def download_measure(i):
    """Download test_url in process i, reporting progress and average speed."""
    response = requests.get(test_url, stream=True)
    # Assumes the server sends a Content-Length header
    total_length = int(response.headers['content-length'])

    start = time.time()
    dl = 0  # bytes downloaded so far
    last_print_time = 0
    for data in response.iter_content(chunk_size=1024):
        dl += len(data)
        # Report the completed fraction at most once every 0.2 seconds
        if time.time() - last_print_time > 0.2:
            last_print_time = time.time()
            mbps_percent_queue.put((i, dl/total_length))

    elapsed = time.time() - start
    # Average speed in megabytes per second (MB/s), despite the "mbps" name
    mbps_queue.put((total_length/1e+6)/elapsed)

In [16]:
processes = [Process(target=download_measure, args=(i,)) for i in range(process_count)]

In [17]:
progress_bars = [widgets.FloatProgress(
                        value=0,
                        min=0,
                        max=1,
                        step=0.1,
                        description=f'{i} (0%):',
                        bar_style='info',
                        orientation='horizontal'
                     )
                    for i in range(process_count)
                ]

In [18]:
def print_progress():
    """Read up to one progress update per process and refresh its bar."""
    for _ in processes:
        try:
            i, fraction = mbps_percent_queue.get(timeout=.2)
        except Empty:
            return
        # Apply each update as it arrives so none are lost on a timeout
        progress_bars[i].value = fraction
        progress_bars[i].description = f"{i} ({fraction:.2%})"

In [19]:
for p in processes:
    p.start()

for b in progress_bars:
    display(b)

while any(p.is_alive() for p in processes):
    print_progress()

FloatProgress(value=0.0, bar_style='info', description='0 (0%):', max=1.0)

FloatProgress(value=0.0, bar_style='info', description='1 (0%):', max=1.0)

FloatProgress(value=0.0, bar_style='info', description='2 (0%):', max=1.0)

In [20]:
# Each value is an average download speed in megabytes per second (MB/s)
mbps_averages = [mbps_queue.get() for p in processes]
mbps_averages

[0.13598948444100742, 0.13490805119851562, 0.1334723657972683]

In [21]:
mbps_average = np.average(mbps_averages)
mbps_average

0.1347899671455971

In [22]:
relative_length_ratio = second_size / mbps_average
relative_length_ratio

3.228563737499799

In [23]:
extra_length_proportion = relative_length_ratio - 1
extra_length_proportion

2.228563737499799

In [24]:
minutes_spent_buffering = round(extra_length_proportion * minutes, 1)
minutes_spent_buffering

23.6
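
The last three cells amount to one small formula, sketched here as a function. The name `buffering_minutes` is not from the original notebook, and the sketch assumes both speeds are in the same unit (MB/s here) and that playback stalls for exactly the extra time the download needs.

```python
def buffering_minutes(required_mb_per_s, measured_mb_per_s, video_minutes):
    """Estimated minutes spent buffering for one stream.

    Assumption: download time scales with required / measured speed,
    and any time beyond the video's own length is spent buffering.
    """
    relative_length = required_mb_per_s / measured_mb_per_s
    return max(0.0, relative_length - 1) * video_minutes


estimate = buffering_minutes(
    required_mb_per_s=0.4351780001050641,   # second_size
    measured_mb_per_s=0.1347899671455971,   # mbps_average
    video_minutes=10.575555555555555,       # minutes
)
print(round(estimate, 1))  # 23.6
```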

## Conclusion

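We got ~23.6 minutes of buffering for a video that is about 10.6 minutes long. _If this is accurate,_ a viewer of any one of the three videos would spend, on average, a bit more than twice the length of the original video just waiting for it to load. That is about double what the analysis predicted: the three downloads together averaged only about 3.2 Mb/s (3 x 0.135 MB/s x 8), well below the 5 Mb/s assumed there.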

**Consider paying for internet.**

### Potential improvements:

- Find multiple different video sources with different latencies, throughputs, and bitrates
- Figure out how to do this test continuously