# Sci-Fi IRL #2: Technology Terminology Velocity

### A Data Storytelling Project by Tobias Reaper

### ---- Datalogue 007 ----

---
---

## Description

The goal for this project is to tell a story about the velocity of technology terminology. I came up with a list of tech terms that are either buzzwords or have been in the recent past, and I will use those as my keywords for querying the PushShift API. I will aggregate the number of comments (and submissions?) that contain the keywords, and explore the data across time and across different online communities (subreddits).

My hypothesis is that the rate at which new terminology (and, by extension, the technology behind the terms) spreads differs between subreddits. More specifically, I hope to explore the time lag between when a new technology is introduced and when it begins to "soak into popular discourse", so to speak.

> For example, how long does it take for a new technology to enter multimedia content such as science fiction, movies, and TV?
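
The "time lag" in the hypothesis needs an operational definition before it can be measured. One simple option, assumed here rather than taken from the project, is to call the first month in which a term's comment count reaches a fixed threshold the month the term "entered" a subreddit, then count the months between two subreddits' entry dates. The threshold of 10, the function names, and the toy counts are all hypothetical.

```python
from datetime import date


def first_adoption_month(monthly_counts, threshold=10):
    """Return the earliest month whose count reaches `threshold`, or None.

    `monthly_counts` maps a first-of-month datetime.date to a comment count.
    """
    for month in sorted(monthly_counts):
        if monthly_counts[month] >= threshold:
            return month
    return None


def months_between(start, end):
    """Whole-month difference between two first-of-month dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def adoption_lag(leader_counts, follower_counts, threshold=10):
    """Months between when the leader and the follower first reach `threshold`.

    Returns None if either series never reaches the threshold.
    """
    lead = first_adoption_month(leader_counts, threshold)
    follow = first_adoption_month(follower_counts, threshold)
    if lead is None or follow is None:
        return None
    return months_between(lead, follow)


# Toy numbers for illustration only (not real PushShift data)
tech = {date(2015, 1, 1): 3, date(2015, 2, 1): 12, date(2015, 3, 1): 20}
movies = {date(2015, 1, 1): 0, date(2015, 2, 1): 2, date(2015, 6, 1): 15}
print(adoption_lag(tech, movies))  # → 4
```

A positive result means the follower picked the term up later; the threshold trades sensitivity to stray early mentions against how late "adoption" is detected.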


---

### 1. List of Key(buzz)words

- algorithms
- artificial intelligence
- augmented reality
- genetic engineering
- universal basic income
- quantum computing
- cryptocurrency
- facial recognition

### 2. List of Subreddits + # Subscribers

- Science / Technology
    - Futurology, 14.2m
    - technology, 8.2m
    - science, 22.4m
    - askscience, 18.1m
- Entertainment
    - books, 17.1m
    - scifi, 1.2m
    - movies, 21.5m
    - gaming, 23.7m
- Media / General
    - worldnews, 22.2m
    - news, 19m
    - politics, 5.4m
    - AskReddit, 24.6m

---
---

### Imports and Configuration

In [4]:
# Three Musketeers
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

In [5]:
# For using the API
import requests

In [7]:
# Set pandas display options to allow for more columns and rows
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 500)

---

### Functions

In [8]:
def subreddit_agg(query, subreddit, frequency="month", aggs="created_utc"):
    """
    Returns the JSON response of a PushShift API aggregate comment search as a Python dictionary.
    
    Note: if you're reading this note, that means that this function is still only written
    with the intention of automating a specific set of actions for a specific project.
    
    ---- Arguments ----
    query: (str) keyword to search.
    subreddit: (str) subreddit name
    frequency: (str) set the size of the time buckets.
    aggs: (str) aggregate function name. Default is "created_utc".
    (For more information, read the PushShift API Documentation.)
    -------------------
    """
    
    # Endpoint and query parameters; letting requests build the query string
    # ensures multi-word keywords (e.g. "artificial intelligence") are URL-encoded
    url = "https://api.pushshift.io/reddit/search/comment/"
    params = {"q": query, "subreddit": subreddit, "aggs": aggs, "frequency": frequency}
    
    # Send the request and save the response into the response object
    response = requests.get(url, params=params)
    
    # Check the response; raise requests.HTTPError on a 4xx/5xx status
    response.raise_for_status()
    
    # Parse the JSON into a Python dictionary
    # and return it for further processing
    return response.json()
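
Several keywords contain spaces, so they have to be escaped before they can appear in a query string. The standard-library sketch below shows what an explicitly encoded aggregate-search URL looks like; `build_agg_url` and `BASE_URL` are hypothetical names, and passing the same dictionary as `params=` to `requests.get` has requests perform this kind of encoding for you.

```python
from urllib.parse import urlencode

# Hypothetical constant: the comment-search endpoint used in subreddit_agg()
BASE_URL = "https://api.pushshift.io/reddit/search/comment/"


def build_agg_url(query, subreddit, frequency="month", aggs="created_utc"):
    """Build an aggregate comment-search URL with an escaped query string."""
    params = {"q": query, "subreddit": subreddit, "aggs": aggs, "frequency": frequency}
    # urlencode() uses quote_plus by default, so spaces become "+"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_agg_url("artificial intelligence", "scifi"))
# → https://api.pushshift.io/reddit/search/comment/?q=artificial+intelligence&subreddit=scifi&aggs=created_utc&frequency=month
```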

In [9]:
def time_agg_df(data, keyword, frequency="month"):
    """
    Returns cleaned Pandas DataFrame of keyword frequency over time, given correctly-formatted Python dictionary.
    Renames "key" to the frequency (e.g. "month") and "doc_count" to the keyword; converts the bucket start from Unix epoch seconds to datetime.
    
    Note: if you're reading this note, that means that this function is still only written
    with the intention of automating a specific set of actions for a specific project.
    
    ---- Arguments ----
    data: (dict) Python dictionary converted from JSON API response.
    keyword: (str) the keyword that was queried.
    frequency: (str) size of time buckets, which is also the name of the resulting datetime column. Defaults to "month".
    -------------------
    """
    
    # Convert the python object into a pandas dataframe
    df = pd.DataFrame(data["aggs"]["created_utc"])

    # Convert "key" into a datetime column
    df["key"] = pd.to_datetime(df["key"], unit="s", origin="unix")

    # Rename "key" to the time-bucket name (it marks the start of each bucket) and "doc_count" to the keyword
    df = df.rename(mapper={"key": frequency, "doc_count": keyword}, axis="columns")
    
    # Return the DataFrame
    return df
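
`time_agg_df()` relies on each bucket's `key` being the start of the bucket in Unix epoch seconds (hence `unit="s"`). The standard-library sketch below performs the same conversion and renaming on a made-up payload so the mapping is easy to check by hand; `sample` and `buckets_to_rows` are hypothetical, and the payload shape is inferred from the function above rather than from the PushShift documentation.

```python
from datetime import datetime, timezone

# Toy payload shaped like the "aggs" section that time_agg_df() reads
# (structure inferred from that function; the counts are made up)
sample = {"aggs": {"created_utc": [
    {"key": 1167609600, "doc_count": 1},  # 2007-01-01 00:00 UTC
    {"key": 1170288000, "doc_count": 0},  # 2007-02-01 00:00 UTC
]}}


def buckets_to_rows(data, keyword, frequency="month"):
    """Stdlib analogue of time_agg_df(): epoch-second keys -> UTC dates, renamed fields."""
    rows = []
    for bucket in data["aggs"]["created_utc"]:
        start = datetime.fromtimestamp(bucket["key"], tz=timezone.utc).date()
        rows.append({frequency: start, keyword: bucket["doc_count"]})
    return rows


for row in buckets_to_rows(sample, "science_quantumcomputing"):
    print(row)
```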

In [10]:
def data_df(data):
    """
    Returns Reddit comments in Pandas DataFrame, given the correctly-formatted Python dictionary.
    
    Note: if you're reading this note, that means that this function is still only written
    with the intention of automating a specific set of actions for a specific project.
    
    ---- Arguments ----
    data: (dict) Python dictionary converted from JSON API response.
    -------------------
    """
    
    # Convert the comments into a pandas dataframe
    df = pd.DataFrame(data["data"])

    # Return the DataFrame
    return df

In [11]:
def df_to_csv(data, filename):
    """
    Basically just a wrapper around the Pandas `.to_csv()` method,
    created to standardize the inputs and outputs.
    
    ---- Arguments ----
    data: (pd.DataFrame) Pandas DataFrame to be saved as a csv.
    filename: (str) name or path of the file to be saved.
    -------------------
    """
    
    # Saves the DataFrame to csv
    data.to_csv(path_or_buf=filename)
    
    # And that's it, folks!

---

In [21]:
def reddit_data_setter(keywords, subreddits, csv=False, frequency="month", aggs="created_utc"):
    """
    Creates two DataFrames that hold combined data of all combinations of keywords / subreddits.
    
    Note: if you're reading this note, that means that this function is still only written
    with the intention of automating a specific set of actions for a specific project.
    
    ---- Arguments ----
    keywords: (list) keyword(s) to search.
    subreddits: (list) name of subreddit(s) to include.
    csv: (bool) if True, save the resulting dataframes as csv file.
    frequency: (str) set the size of the time buckets.
    aggs: (str) aggregate function name. Default is "created_utc".
    (For more information, read the PushShift API Documentation.)
    -------------------
    """
    from time import sleep

    comment_df_list = []  # Empty list to hold comment dataframes
    word_df_list = []  # Empty list to hold monthly word count dataframes
    
    # Run query for individual keywords on each subreddit.
    # Subreddit (outer) -> keyword (inner) = all keywords in one subreddit at a time
    for subreddit in subreddits:
        for word in keywords:
            # Create unique column name for each subreddit / word combo
            # (spaces are dropped, e.g. "science_artificialintelligence")
            col_name = f"{subreddit}_{word.replace(' ', '')}"
            
            # Indicates current subreddit / keyword
            print(f"Starting {col_name}")
            sleep(0.5)  # Add sleep time to reduce API load 
            print("...")

            # Make request and convert response to dictionary
            dictionary = subreddit_agg(word, subreddit, frequency=frequency, aggs=aggs)

            # Append aggs word count df to word_df_list
            word_df_list.append(time_agg_df(dictionary, col_name, frequency=frequency))

            # Append comments df to comment_df_list
            comment_df_list.append(data_df(dictionary))
            
            print(f"Finished {col_name}")
            sleep(0.5)  # More sleep to reduce API load
            print()
            sleep(0.5)
    
    # Set the time-bucket column as the index so the dataframes align when concatenated
    df_main = pd.concat([df.set_index(frequency) for df in word_df_list],
                        axis=1, join="outer").reset_index()
    
    # Concatenate comment_df_list dataframes
    df_comm = pd.concat(comment_df_list, axis=0, sort=False,
                        join="outer", ignore_index=True)
        
    # If csv parameter is set to True, save datasets to filesystem as csv
    if csv:
        df_to_csv(df_main, f"monthly-{subreddits[0]}-{keywords[0].replace(' ', '')}.csv")
        df_to_csv(df_comm, f"comments-{subreddits[0]}-{keywords[0].replace(' ', '')}.csv")
    
    # Return df_main, df_comm, respectively
    return df_main, df_comm
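
The outer concat on `month` is what produces the empty cells in the output further down: a subreddit/keyword pair only contributes rows for the months its aggregation returned, so every other month is NaN until the later `fillna(0)`. The plain-Python sketch below mimics that alignment with hypothetical toy counts and a hypothetical `outer_join` helper, using `None` as a stand-in for NaN.

```python
# Two hypothetical per-column {month: count} mappings; each only contains
# the months in which its aggregation returned a bucket
science = {"2006-11": 1, "2006-12": 0, "2007-01": 1}
futurology = {"2007-01": 3}


def outer_join(columns):
    """Align several {month: count} dicts on the union of their months.

    Months missing from a column become None, the stdlib stand-in for NaN.
    """
    months = sorted(set().union(*columns.values()))
    return [
        {"month": m, **{name: col.get(m) for name, col in columns.items()}}
        for m in months
    ]


table = outer_join({"science_geneticengineering": science,
                    "Futurology_geneticengineering": futurology})
for row in table:
    print(row)
```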

---

In [22]:
# Define keywords and subreddits as python lists
words = [
    "algorithms",
    "artificial intelligence",
    "augmented reality",
    "genetic engineering",
    "universal basic income",
    "quantum computing",
    "cryptocurrency",
    "facial recognition",
]

subs = [
    "Futurology",
    "technology",
    "science",
    "askscience",
    "books",
    "scifi",
    "movies",
    "gaming",
    "worldnews",
    "news",
    "politics",
    "AskReddit",
]

In [None]:
# Run the function to create and save the dataset
df_main, df_comm = reddit_data_setter(words, subs, csv=True)

In [49]:
# Take a look to be sure it worked as expected
print(df_main.shape)
df_main.head()

(155, 97)


Unnamed: 0,month,Futurology_algorithms,Futurology_artificialintelligence,Futurology_augmentedreality,Futurology_geneticengineering,Futurology_universalbasicincome,Futurology_quantumcomputing,Futurology_cryptocurrency,Futurology_facialrecognition,technology_algorithms,technology_artificialintelligence,technology_augmentedreality,technology_geneticengineering,technology_universalbasicincome,technology_quantumcomputing,technology_cryptocurrency,technology_facialrecognition,science_algorithms,science_artificialintelligence,science_augmentedreality,science_geneticengineering,science_universalbasicincome,science_quantumcomputing,science_cryptocurrency,science_facialrecognition,askscience_algorithms,askscience_artificialintelligence,askscience_augmentedreality,askscience_geneticengineering,askscience_universalbasicincome,askscience_quantumcomputing,askscience_cryptocurrency,askscience_facialrecognition,books_algorithms,books_artificialintelligence,books_augmentedreality,books_geneticengineering,books_universalbasicincome,books_quantumcomputing,books_cryptocurrency,books_facialrecognition,scifi_algorithms,scifi_artificialintelligence,scifi_augmentedreality,scifi_geneticengineering,scifi_universalbasicincome,scifi_quantumcomputing,scifi_cryptocurrency,scifi_facialrecognition,movies_algorithms,movies_artificialintelligence,movies_augmentedreality,movies_geneticengineering,movies_universalbasicincome,movies_quantumcomputing,movies_cryptocurrency,movies_facialrecognition,gaming_algorithms,gaming_artificialintelligence,gaming_augmentedreality,gaming_geneticengineering,gaming_universalbasicincome,gaming_quantumcomputing,gaming_cryptocurrency,gaming_facialrecognition,worldnews_algorithms,worldnews_artificialintelligence,worldnews_augmentedreality,worldnews_geneticengineering,worldnews_universalbasicincome,worldnews_quantumcomputing,worldnews_cryptocurrency,worldnews_facialrecognition,news_algorithms,news_artificialintelligence,news_augmentedreality,news_geneticengineering,news_universalbasicincome,news_quantumcomputing,news_cryptocurrency,news_facialrecognition,politics_algorithms,politics_artificialintelligence,politics_augmentedreality,politics_geneticengineering,politics_universalbasicincome,politics_quantumcomputing,politics_cryptocurrency,politics_facialrecognition,AskReddit_algorithms,AskReddit_artificialintelligence,AskReddit_augmentedreality,AskReddit_geneticengineering,AskReddit_universalbasicincome,AskReddit_quantumcomputing,AskReddit_cryptocurrency,AskReddit_facialrecognition
0,2006-11-01,,,,,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2006-12-01,,,,,,,,,,,,,,,,,,,,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,2007-01-01,,,,,,,,,,,,,,,,,,1.0,,0,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,2007-02-01,,,,,,,,,,,,,,,,,2.0,0.0,,2,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,2007-03-01,,,,,,,,,,,,,,,,,5.0,4.0,,2,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


---

## Visualizations

In [132]:
# More advanced visualizations with Bokeh
from bokeh.plotting import figure, output_file, output_notebook, show
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, FixedTicker, DatetimeTickFormatter
from bokeh.models.glyphs import Patches

In [133]:
# More imports
import colorcet as cc

---

### Term Velocity: Algorithms

I want to get each word on a separate graph. That means I'll have to filter based on the column name.
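
Because every column follows the `<subreddit>_<keyword without spaces>` pattern, the per-term column lists could also be derived from `df_main.columns` instead of typed out by hand. The helper below is a hypothetical sketch of that filter, shown on a short stand-in list of column names; with the real data, `df_main[columns_for_term(df_main.columns, "algorithms")]` would select the same columns as the hand-written list.

```python
def columns_for_term(columns, term):
    """Return "month" plus every column whose keyword suffix matches `term`.

    Assumes the "<subreddit>_<keyword-without-spaces>" naming used in df_main.
    """
    suffix = "_" + term.replace(" ", "")
    return ["month"] + [c for c in columns if c.endswith(suffix)]


# Stand-in for df_main.columns
cols = ["month", "Futurology_algorithms", "Futurology_artificialintelligence",
        "science_algorithms", "AskReddit_algorithms"]
print(columns_for_term(cols, "algorithms"))
# → ['month', 'Futurology_algorithms', 'science_algorithms', 'AskReddit_algorithms']
```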

I'll start with algorithms...

In [87]:
# List of column names to include in new dataframe
cols_algo = [
    "month",
    "Futurology_algorithms",
    "technology_algorithms",
    "science_algorithms",
    "askscience_algorithms",
    "books_algorithms",
    "scifi_algorithms",
    "movies_algorithms",
    "gaming_algorithms",
    "worldnews_algorithms",
    "news_algorithms",
    "politics_algorithms",
    "AskReddit_algorithms",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_algorithms = df_main[cols_algo].fillna(value=0)
# df_algorithms = df_main[cols_algo].fillna(value=0).set_index("month")

df_algorithms.head()

Unnamed: 0,month,Futurology_algorithms,technology_algorithms,science_algorithms,askscience_algorithms,books_algorithms,scifi_algorithms,movies_algorithms,gaming_algorithms,worldnews_algorithms,news_algorithms,politics_algorithms,AskReddit_algorithms
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [173]:
# Define palette
palette = [cc.bkr[i*15] for i in range(17)]
palette

['#1881fa',
 '#2774dd',
 '#2e67c0',
 '#315aa4',
 '#314e89',
 '#2f426f',
 '#2b3656',
 '#262b3e',
 '#212128',
 '#28201e',
 '#3d2622',
 '#542d26',
 '#6b332b',
 '#82392f',
 '#9a3f34',
 '#b34538',
 '#cc4a3c']
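
As written, the palette takes 17 colors from `cc.bkr` at a stride of 15 (indices 0 to 240), but `subs_colors` below only uses the first 12, i.e. indices 0 to 165, so most of the red end of the map goes unused. One alternative is to pick exactly as many colors as there are subreddits, spread evenly from end to end. The helper below is a hypothetical sketch tested on a stand-in list of 256 integers; `evenly_spaced(cc.bkr, 12)` would apply it to the real map, assuming `cc.bkr` is a 256-entry list as the existing indexing suggests. Since bkr is a diverging blue-black-red map, entries near its middle will still be dark.

```python
def evenly_spaced(colormap, n):
    """Pick n entries spread evenly from the first to the last element of `colormap`."""
    if n == 1:
        return [colormap[0]]
    step = (len(colormap) - 1) / (n - 1)
    return [colormap[round(i * step)] for i in range(n)]


# Stand-in for a 256-entry colormap such as colorcet's bkr
fake_cmap = list(range(256))
print(evenly_spaced(fake_cmap, 12))
```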

In [174]:
# Colors
subs_colors = {
    "Futurology": palette[0],
    "technology": palette[1],
    "science": palette[2],
    "askscience": palette[3],
    "books": palette[4],
    "scifi": palette[5],
    "movies": palette[6],
    "gaming": palette[7],
    "worldnews": palette[8],
    "news": palette[9],
    "politics": palette[10],
    "AskReddit": palette[11],
}

In [179]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'algorithms' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_algorithms["month"].iloc[50], df_algorithms["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_algorithms["month"], df_algorithms[f"{sub}_algorithms"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))
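
The loop above plots raw monthly counts. If "velocity" is read literally as a rate of change, one simple measure (an assumption, not the method used in this notebook) is the month-over-month difference of each series. The sketch uses the first five `science_algorithms` values from the `df_algorithms.head()` output above; `month_over_month` is a hypothetical helper, and in pandas the equivalent for a column is `Series.diff()`, which puts NaN where this returns `None`.

```python
def month_over_month(counts):
    """First difference of a monthly series: counts[i] - counts[i - 1].

    The first month has no predecessor, so it is reported as None.
    """
    if not counts:
        return []
    return [None] + [curr - prev for prev, curr in zip(counts, counts[1:])]


# First five science_algorithms values from df_algorithms.head()
science_algorithms = [0.0, 0.0, 0.0, 2.0, 5.0]
print(month_over_month(science_algorithms))  # → [None, 0.0, 0.0, 2.0, 3.0]
```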

---

### Term Velocity: Artificial Intelligence

In [194]:
# List of column names to include in new dataframe
cols_ai = [
    "month",
    "Futurology_artificialintelligence",
    "technology_artificialintelligence",
    "science_artificialintelligence",
    "askscience_artificialintelligence",
    "books_artificialintelligence",
    "scifi_artificialintelligence",
    "movies_artificialintelligence",
    "gaming_artificialintelligence",
    "worldnews_artificialintelligence",
    "news_artificialintelligence",
    "politics_artificialintelligence",
    "AskReddit_artificialintelligence",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_ai = df_main[cols_ai].fillna(value=0)

df_ai.head()

Unnamed: 0,month,Futurology_artificialintelligence,technology_artificialintelligence,science_artificialintelligence,askscience_artificialintelligence,books_artificialintelligence,scifi_artificialintelligence,movies_artificialintelligence,gaming_artificialintelligence,worldnews_artificialintelligence,news_artificialintelligence,politics_artificialintelligence,AskReddit_artificialintelligence
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [206]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'artificial intelligence' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_ai["month"].iloc[50], df_ai["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_ai["month"], df_ai[f"{sub}_artificialintelligence"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))

---

### Term Velocity: Augmented Reality

In [191]:
# List of column names to include in new dataframe
cols_ar = [
    "month",
    "Futurology_augmentedreality",
    "technology_augmentedreality",
    "science_augmentedreality",
    "askscience_augmentedreality",
    "books_augmentedreality",
    "scifi_augmentedreality",
    "movies_augmentedreality",
    "gaming_augmentedreality",
    "worldnews_augmentedreality",
    "news_augmentedreality",
    "politics_augmentedreality",
    "AskReddit_augmentedreality",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_ar = df_main[cols_ar].fillna(value=0)

df_ar.head()

Unnamed: 0,month,Futurology_augmentedreality,technology_augmentedreality,science_augmentedreality,askscience_augmentedreality,books_augmentedreality,scifi_augmentedreality,movies_augmentedreality,gaming_augmentedreality,worldnews_augmentedreality,news_augmentedreality,politics_augmentedreality,AskReddit_augmentedreality
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [192]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'augmented reality' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_ar["month"].iloc[50], df_ar["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_ar["month"], df_ar[f"{sub}_augmentedreality"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))

---

### Term Velocity: Genetic Engineering

In [196]:
# List of column names to include in new dataframe
cols_ge = [
    "month",
    "Futurology_geneticengineering",
    "technology_geneticengineering",
    "science_geneticengineering",
    "askscience_geneticengineering",
    "books_geneticengineering",
    "scifi_geneticengineering",
    "movies_geneticengineering",
    "gaming_geneticengineering",
    "worldnews_geneticengineering",
    "news_geneticengineering",
    "politics_geneticengineering",
    "AskReddit_geneticengineering",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_ge = df_main[cols_ge].fillna(value=0)

df_ge.head()

Unnamed: 0,month,Futurology_geneticengineering,technology_geneticengineering,science_geneticengineering,askscience_geneticengineering,books_geneticengineering,scifi_geneticengineering,movies_geneticengineering,gaming_geneticengineering,worldnews_geneticengineering,news_geneticengineering,politics_geneticengineering,AskReddit_geneticengineering
0,2006-11-01,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [197]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'genetic engineering' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_ge["month"].iloc[50], df_ge["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_ge["month"], df_ge[f"{sub}_geneticengineering"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))

---

### Term Velocity: Universal Basic Income

In [200]:
# List of column names to include in new dataframe
cols_ubi = [
    "month",
    "Futurology_universalbasicincome",
    "technology_universalbasicincome",
    "science_universalbasicincome",
    "askscience_universalbasicincome",
    "books_universalbasicincome",
    "scifi_universalbasicincome",
    "movies_universalbasicincome",
    "gaming_universalbasicincome",
    "worldnews_universalbasicincome",
    "news_universalbasicincome",
    "politics_universalbasicincome",
    "AskReddit_universalbasicincome",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_ubi = df_main[cols_ubi].fillna(value=0)

df_ubi.head()

Unnamed: 0,month,Futurology_universalbasicincome,technology_universalbasicincome,science_universalbasicincome,askscience_universalbasicincome,books_universalbasicincome,scifi_universalbasicincome,movies_universalbasicincome,gaming_universalbasicincome,worldnews_universalbasicincome,news_universalbasicincome,politics_universalbasicincome,AskReddit_universalbasicincome
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [201]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'universal basic income' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_ubi["month"].iloc[50], df_ubi["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_ubi["month"], df_ubi[f"{sub}_universalbasicincome"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))

---

### Term Velocity: Quantum Computing

In [199]:
# List of column names to include in new dataframe
cols_qc = [
    "month",
    "Futurology_quantumcomputing",
    "technology_quantumcomputing",
    "science_quantumcomputing",
    "askscience_quantumcomputing",
    "books_quantumcomputing",
    "scifi_quantumcomputing",
    "movies_quantumcomputing",
    "gaming_quantumcomputing",
    "worldnews_quantumcomputing",
    "news_quantumcomputing",
    "politics_quantumcomputing",
    "AskReddit_quantumcomputing",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_qc = df_main[cols_qc].fillna(value=0)

df_qc.head()

Unnamed: 0,month,Futurology_quantumcomputing,technology_quantumcomputing,science_quantumcomputing,askscience_quantumcomputing,books_quantumcomputing,scifi_quantumcomputing,movies_quantumcomputing,gaming_quantumcomputing,worldnews_quantumcomputing,news_quantumcomputing,politics_quantumcomputing,AskReddit_quantumcomputing
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [203]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'quantum computing' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_qc["month"].iloc[50], df_qc["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_qc["month"], df_qc[f"{sub}_quantumcomputing"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))

---

### Term Velocity: Facial Recognition

In [204]:
# List of column names to include in new dataframe
cols_fr = [
    "month",
    "Futurology_facialrecognition",
    "technology_facialrecognition",
    "science_facialrecognition",
    "askscience_facialrecognition",
    "books_facialrecognition",
    "scifi_facialrecognition",
    "movies_facialrecognition",
    "gaming_facialrecognition",
    "worldnews_facialrecognition",
    "news_facialrecognition",
    "politics_facialrecognition",
    "AskReddit_facialrecognition",
]

# Create a new dataframe with only these columns, with NaN values filled in with 0
df_fr = df_main[cols_fr].fillna(value=0)

df_fr.head()

Unnamed: 0,month,Futurology_facialrecognition,technology_facialrecognition,science_facialrecognition,askscience_facialrecognition,books_facialrecognition,scifi_facialrecognition,movies_facialrecognition,gaming_facialrecognition,worldnews_facialrecognition,news_facialrecognition,politics_facialrecognition,AskReddit_facialrecognition
0,2006-11-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2006-12-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2007-01-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2007-02-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2007-03-01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [205]:
# Output to current notebook
output_notebook()

p = {}  # dict to hold plots
p_names = []  # list for plot names

for sub in subs_colors:
    # width/height replace plot_width/plot_height, which Bokeh 3 no longer accepts
    p[sub] = figure(title=f"Comments that mention 'facial recognition' in r/{sub}",
                    width=1000, height=200,
                    x_axis_type="datetime",
                    x_range=(df_fr["month"].iloc[50], df_fr["month"].iloc[-9]))
    # Plot the zero-filled counts so months without data don't leave gaps in the line
    p[sub].line(df_fr["month"], df_fr[f"{sub}_facialrecognition"],
                line_width=2, line_color=subs_colors[sub])
    p_names.append(p[sub])

# Show the results
show(column(p_names))