# Bob Ross Paintings

## Questions to answer

- How often a cabin actually appears in a painting
- Color prevalence/distribution by various metrics
- Cabin detection


TODO:  
- Set up youtube-dl
- Batch download episodes in lowest quality, with transcript files (.srt)
- Rename files? 01_01, etc.
- Generate a YouTube ID list of each episode
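
The download steps in the TODO list could be sketched roughly as follows. This is only an outline, not a tested workflow: `episode_filename` and `build_download_command` are hypothetical helper names, the `../videos` output directory is an assumption, and the flags are standard youtube-dl options (`-f worst`, `--write-auto-sub`, `--sub-lang`, `--convert-subs`, `-o`). The command is built but not executed here.

```python
from pathlib import Path


def episode_filename(season, episode):
    """Return a zero-padded name such as '01_01' for season 1, episode 1."""
    return f"{season:02d}_{episode:02d}"


def build_download_command(video_id, season, episode, out_dir="../videos"):
    """Build (but do not run) a youtube-dl command for one episode.

    out_dir is an assumed location; adjust it to the project layout.
    """
    output_template = str(Path(out_dir) / f"{episode_filename(season, episode)}.%(ext)s")
    return [
        "youtube-dl",
        "-f", "worst",            # lowest available quality
        "--write-auto-sub",       # also fetch auto-generated captions
        "--sub-lang", "en",
        "--convert-subs", "srt",  # store captions as .srt
        "-o", output_template,
        f"https://www.youtube.com/watch?v={video_id}",
    ]


cmd = build_download_command("oh5p5f5_-7A", season=1, episode=1)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the actual download; converting subtitles additionally requires ffmpeg to be installed.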

## Data Structuring

In [3]:
import numpy as np
import pandas as pd

from youtube_transcript_api import YouTubeTranscriptApi

pd.options.display.max_columns = None
pd.options.display.max_rows = None

In [4]:
# Read in data from external source
data = pd.read_csv('../data/bob_ross_paintings.csv')
df_0 = pd.DataFrame(data)
print(f'Shape: {df_0.shape}')
df_0.head()

Shape: (403, 28)


Unnamed: 0.1,Unnamed: 0,painting_index,img_src,painting_title,season,episode,num_colors,youtube_src,colors,color_hex,Black_Gesso,Bright_Red,Burnt_Umber,Cadmium_Yellow,Dark_Sienna,Indian_Red,Indian_Yellow,Liquid_Black,Liquid_Clear,Midnight_Black,Phthalo_Blue,Phthalo_Green,Prussian_Blue,Sap_Green,Titanium_White,Van_Dyke_Brown,Yellow_Ochre,Alizarin_Crimson
0,1,282,https://www.twoinchbrush.com/images/painting28...,A Walk in the Woods,1,1,8,https://www.youtube.com/embed/oh5p5f5_-7A,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1
1,2,283,https://www.twoinchbrush.com/images/painting28...,Mt. McKinley,1,2,8,https://www.youtube.com/embed/RInDWhYceLU,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1
2,3,284,https://www.twoinchbrush.com/images/painting28...,Ebony Sunset,1,3,9,https://www.youtube.com/embed/UOziR7PoVco,"['Alizarin Crimson', 'Black Gesso', 'Bright Re...","['#4E1500', '#000000', '#DB0000', '#FFEC00', '...",1,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1
3,4,285,https://www.twoinchbrush.com/images/painting28...,Winter Mist,1,4,3,https://www.youtube.com/embed/0pwoixRikn4,"['Prussian Blue', 'Titanium White', 'Van Dyke ...","['#021E44', '#FFFFFF', '#221B15']",0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0
4,5,286,https://www.twoinchbrush.com/images/painting28...,Quiet Stream,1,5,8,https://www.youtube.com/embed/DFSIQNjKRfk,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1
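
The 0/1 color columns in the preview above lend themselves to a simple prevalence measure: the mean of each column is the share of paintings that use that color. A minimal sketch on a toy frame follows. The name `toy` is a stand-in for `df_0`, and the three rows copy the `Bright_Red`, `Prussian_Blue` and `Black_Gesso` flags of three paintings from the preview.

```python
import pandas as pd

# Toy stand-in for df_0 with three of the 0/1 color columns.
toy = pd.DataFrame({
    "painting_title": ["A Walk in the Woods", "Winter Mist", "Ebony Sunset"],
    "Bright_Red": [1, 0, 1],
    "Prussian_Blue": [1, 1, 1],
    "Black_Gesso": [0, 0, 1],
})

color_columns = ["Bright_Red", "Prussian_Blue", "Black_Gesso"]

# The mean of a 0/1 column is the share of paintings that use that color.
prevalence = toy[color_columns].mean().sort_values(ascending=False)
print(prevalence)
```

On the full dataset the same call would take all 18 color columns in place of the three listed here.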


Episode air dates were not included in the original dataset. I created a .csv file of air dates sourced from Wikipedia, matched to the episodes by index.  
The new data will be joined onto the original dataframe, `df_0`.  
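
Because the join relies purely on row position, a quick alignment check may be worthwhile. The sketch below uses toy stand-ins for the two dataframes (`paintings` and `episodes` are hypothetical names) and compares the quoted title in `ep_info` with `painting_title`. A mismatch does not necessarily mean the rows are misaligned; it can also be a spelling difference between the two sources.

```python
import pandas as pd

# Toy stand-ins for df_0 and episode_information_df (first two rows only).
paintings = pd.DataFrame({"painting_title": ["A Walk in the Woods", "Mt. McKinley"]})
episodes = pd.DataFrame({"ep_info": ['"A Walk in the Woods" (January 11, 1983)',
                                     '"Mount McKinley" (January 11, 1983)']})

# Index-based left join, as in the notebook.
joined = paintings.join(episodes, how="left")

# Pull the quoted title out of ep_info and compare it with painting_title.
joined["wiki_title"] = joined["ep_info"].str.extract(r'"(.*?)"', expand=False)
mismatches = joined[joined["painting_title"] != joined["wiki_title"]]
print(mismatches[["painting_title", "wiki_title"]])
```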


In [5]:
episode_information = pd.read_csv('../data/bob_ross_air_dates.csv')
episode_information_df = pd.DataFrame(episode_information)
print(f'Shape: {episode_information_df.shape}')
episode_information_df.head()

Shape: (403, 1)


Unnamed: 0,ep_info
0,"""A Walk in the Woods"" (January 11, 1983)"
1,"""Mount McKinley"" (January 11, 1983)"
2,"""Ebony Sunset"" (January 18, 1983)"
3,"""Winter Mist"" (January 25, 1983)"
4,"""Quiet Stream"" (February 1, 1983)"


In [6]:
# Join episode_information_df onto the original df, df_0, via index
df_1 = df_0.join(episode_information_df, how='left')

## Data Cleaning & Engineering

In [7]:
def start_pipeline(dataf):
    '''Makes a copy of input dataframe to transform throughout pipeline'''
    return dataf.copy()

def get_videoID(dataf):
    '''Creates a column in the dataframe that is the episode's videoID on YouTube'''
    # 'https://www.youtube.com/embed/' is 30 characters long; keep what follows it
    dataf['videoID'] = dataf['youtube_src'].str.slice(30)
    return dataf

def extract_date(dataf):
    '''Extracts and returns only airdate from episode info'''
    # expand=False returns a Series (the text inside the first parentheses)
    dataf['air_date'] = dataf['ep_info'].str.extract(r'\((.*?)\)', expand=False)
    return dataf

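def encode_date(dataf):
    '''Converts air_date to datetime type'''
    # Assumes dates look like "January 11, 1983"; anything else becomes NaT
    dataf['air_date'] = pd.to_datetime(dataf['air_date'], format='%B %d, %Y', errors='coerce')
    return dataf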

def abs_episode_count(dataf):
    '''Creates a column that is absolute episode count'''
    dataf = dataf.rename(columns={'Unnamed: 0':'abs_episode'})
    return dataf


Execute pipeline

In [8]:
df = (df_1
     .pipe(start_pipeline)
     .pipe(get_videoID)
     .pipe(extract_date)
     .pipe(abs_episode_count)
     )

In [9]:
df.head()

Unnamed: 0,abs_episode,painting_index,img_src,painting_title,season,episode,num_colors,youtube_src,colors,color_hex,Black_Gesso,Bright_Red,Burnt_Umber,Cadmium_Yellow,Dark_Sienna,Indian_Red,Indian_Yellow,Liquid_Black,Liquid_Clear,Midnight_Black,Phthalo_Blue,Phthalo_Green,Prussian_Blue,Sap_Green,Titanium_White,Van_Dyke_Brown,Yellow_Ochre,Alizarin_Crimson,ep_info,videoID,air_date
0,1,282,https://www.twoinchbrush.com/images/painting28...,A Walk in the Woods,1,1,8,https://www.youtube.com/embed/oh5p5f5_-7A,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,"""A Walk in the Woods"" (January 11, 1983)",oh5p5f5_-7A,"January 11, 1983"
1,2,283,https://www.twoinchbrush.com/images/painting28...,Mt. McKinley,1,2,8,https://www.youtube.com/embed/RInDWhYceLU,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,"""Mount McKinley"" (January 11, 1983)",RInDWhYceLU,"January 11, 1983"
2,3,284,https://www.twoinchbrush.com/images/painting28...,Ebony Sunset,1,3,9,https://www.youtube.com/embed/UOziR7PoVco,"['Alizarin Crimson', 'Black Gesso', 'Bright Re...","['#4E1500', '#000000', '#DB0000', '#FFEC00', '...",1,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,"""Ebony Sunset"" (January 18, 1983)",UOziR7PoVco,"January 18, 1983"
3,4,285,https://www.twoinchbrush.com/images/painting28...,Winter Mist,1,4,3,https://www.youtube.com/embed/0pwoixRikn4,"['Prussian Blue', 'Titanium White', 'Van Dyke ...","['#021E44', '#FFFFFF', '#221B15']",0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,"""Winter Mist"" (January 25, 1983)",0pwoixRikn4,"January 25, 1983"
4,5,286,https://www.twoinchbrush.com/images/painting28...,Quiet Stream,1,5,8,https://www.youtube.com/embed/DFSIQNjKRfk,"['Alizarin Crimson', 'Bright Red', 'Cadmium Ye...","['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '...",0,1,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,1,"""Quiet Stream"" (February 1, 1983)",DFSIQNjKRfk,"February 1, 1983"
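
The `air_date` values in the preview above are still plain strings. A minimal sketch of parsing them into datetimes on toy data, assuming every entry follows the "Month day, year" pattern seen above (`toy` is a hypothetical stand-in for `df`):

```python
import pandas as pd

toy = pd.DataFrame({"ep_info": ['"A Walk in the Woods" (January 11, 1983)',
                                '"Quiet Stream" (February 1, 1983)']})

# Same regex as extract_date; expand=False returns a Series.
toy["air_date"] = toy["ep_info"].str.extract(r"\((.*?)\)", expand=False)

# Parse "Month day, year"; values that do not match the format become NaT.
toy["air_date"] = pd.to_datetime(toy["air_date"], format="%B %d, %Y", errors="coerce")
print(toy["air_date"].dt.year.tolist())
```

With `errors="coerce"`, rows that fail to parse can be found afterwards with `toy["air_date"].isna()` instead of stopping the pipeline.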


## Obtain Transcripts of Episodes

In [10]:
def downloadTranscript(videoID):
    '''Takes in a YT VideoID and returns the transcript object'''
    transcript_object = YouTubeTranscriptApi.get_transcript(videoID)
    return transcript_object


def generateTranscriptText(transcript_object):
    '''Takes in a transcript object and returns a string of the concatenated text'''
    # Each entry is a dict with a 'text' key; join the pieces with single spaces
    return ' '.join(line['text'] for line in transcript_object)


def transcript_to_file(videoID, transcript_text):
    '''Takes in a transcript text string and writes it to a file with name videoID.txt'''
    with open(f'../transcripts/{videoID}.txt', 'w', encoding='utf-8') as f:
        f.write(transcript_text)
        

def video_to_file(videoID):
    '''Runs suite of transcript management functions'''
    print('----------')
    # Get transcript object for a video
    transcript_object = downloadTranscript(videoID)
    # Generate a text string for transcript
    transcript_text = generateTranscriptText(transcript_object)
    # Write transcript text to file with name <videoID>.txt
    transcript_to_file(videoID, transcript_text)
    print(f'Video ID: {videoID} has been written to file {videoID}.txt')


# Get transcripts for all videos
def multipleTranscripts(videoIDs):
    '''Takes in a list of YT video IDs and writes each transcript to a file'''
    for videoID in videoIDs:
        try:
            video_to_file(videoID)
        except Exception:
            # Any failure (no transcript, network error, ...) gets a placeholder file
            print(f'No transcript found for videoID - {videoID}')
            with open(f'../transcripts/ERROR - {videoID}.txt', 'w') as f:
                f.write(f'''No transcript found for {videoID}.
                        See video at:
                        https://www.youtube.com/watch?v={videoID}''')

In [11]:
# Create a list of videoIDs.
def get_videoIDs(dataf):
    '''Takes in a dataframe and returns a list of videoIDs'''
    videoIDs = dataf['videoID'].tolist()
    return videoIDs
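
For reference, the transcript object handled above is assumed to be a list of dictionaries with `text`, `start` and `duration` keys. The entries below are made up for illustration and are not taken from any episode. The keyword check at the end is only one conceivable way the flattened text might later feed the cabin questions; the notebook does not state that approach.

```python
# Hypothetical sample in the assumed transcript format (values are illustrative).
sample_transcript = [
    {"text": "hello", "start": 0.0, "duration": 1.5},
    {"text": "let's build a little Cabin", "start": 1.5, "duration": 2.0},
]

# Concatenate the text fields, separated by single spaces.
flat_text = " ".join(entry["text"] for entry in sample_transcript)

# Case-insensitive keyword check on the flattened text.
mentions_cabin = "cabin" in flat_text.lower()
print(flat_text, mentions_cabin)
```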

### Transcripts Script  

Use the code below to replicate the transcript download.  
It has been converted to a markdown cell and placed in a code block so that it is not run again by accident.  
**TODO**: Move this to a `standalone.py` file.
```python  
videoIDs = get_videoIDs(df)  
multipleTranscripts(videoIDs)
```
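
A complementary safeguard against accidental re-runs would be to skip videos that already have a file on disk. The sketch below assumes the file naming used by the functions above (`<videoID>.txt` and `ERROR - <videoID>.txt`). `pending_video_ids` is a hypothetical helper, demonstrated against a temporary directory rather than the real `../transcripts` folder.

```python
import tempfile
from pathlib import Path


def pending_video_ids(video_ids, transcript_dir):
    """Return the IDs that have neither a transcript nor an error placeholder yet."""
    folder = Path(transcript_dir)
    return [
        vid for vid in video_ids
        if not (folder / f"{vid}.txt").exists()
        and not (folder / f"ERROR - {vid}.txt").exists()
    ]


# Demo: one transcript already exists, so only the second ID is still pending.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "oh5p5f5_-7A.txt").write_text("transcript text")
    todo = pending_video_ids(["oh5p5f5_-7A", "RInDWhYceLU"], tmp)

print(todo)
```

In the notebook this would amount to calling `multipleTranscripts(pending_video_ids(videoIDs, '../transcripts'))`.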

### Transcript Errors  

Six videoIDs produced errors when the script tried to obtain transcripts.  
These are the IDs:  
```
WGUcw_kFvzU
U2_SKgM3f4A
Leiw-FtADZc
gOGJYHWjXgE
bgI3_1quJ18
1jRPshs27H8
```
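
Rather than copying the IDs by hand, the list could be rebuilt from the `ERROR - <videoID>.txt` placeholder files that `multipleTranscripts` writes. A minimal sketch, where `collect_error_ids` is a hypothetical helper and the demo runs against a temporary directory instead of the real `../transcripts` folder:

```python
import tempfile
from pathlib import Path


def collect_error_ids(transcript_dir):
    """Return videoIDs that have an 'ERROR - <videoID>.txt' placeholder file."""
    prefix = "ERROR - "
    return sorted(
        path.stem[len(prefix):]
        for path in Path(transcript_dir).glob(f"{prefix}*.txt")
    )


# Demo: one placeholder file and one regular transcript file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ERROR - WGUcw_kFvzU.txt").write_text("No transcript found")
    (Path(tmp) / "oh5p5f5_-7A.txt").write_text("transcript text")
    error_ids = collect_error_ids(tmp)

print(error_ids)
```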

In [15]:
# List of videos with no transcript
transcript_error_ID_list = ['WGUcw_kFvzU',
                            'U2_SKgM3f4A',
                            'Leiw-FtADZc',
                            'gOGJYHWjXgE',
                            'bgI3_1quJ18',
                            '1jRPshs27H8']

In [31]:
def transcriptErrorReport(error_list):
    '''Takes in a list of videoIDs that errored during the transcript process
    and prints information about them (looked up in the global df)'''
    error_dicts = []
    for videoID in error_list:
        row = df.loc[df['videoID'] == videoID]
        error_dicts.append(row.to_dict('records')[0])
    
    for entry in error_dicts:
        print(f'''Episode Title: {entry['painting_title']}
        Season: {entry['season']}
        Episode: {entry['episode']}
        Youtube Link: {entry['youtube_src']}
        -----------------------------------''')

In [32]:
transcriptErrorReport(transcript_error_ID_list)

Episode Title: Secluded Beach
        Season: 9
        Episode: 6
        Youtube Link: https://www.youtube.com/embed/WGUcw_kFvzU
        -----------------------------------
Episode Title: Bubbling Mountain Brook
        Season: 8
        Episode: 6
        Youtube Link: https://www.youtube.com/embed/U2_SKgM3f4A
        -----------------------------------
Episode Title: Quiet Mountains River
        Season: 12
        Episode: 7
        Youtube Link: https://www.youtube.com/embed/Leiw-FtADZc
        -----------------------------------
Episode Title: Soft Mountain Glow
        Season: 12
        Episode: 11
        Youtube Link: https://www.youtube.com/embed/gOGJYHWjXgE
        -----------------------------------
Episode Title: Mountain Exhibition
        Season: 13
        Episode: 8
        Youtube Link: https://www.youtube.com/embed/bgI3_1quJ18
        -----------------------------------
Episode Title: Mountain in an Oval
        Season: 12
        Episode: 12
        Youtube Link: 