# A4 Final Project

## Abstract

This project matches Twitch streamer data with Google Trends search data to understand which factors contribute to how often a Twitch channel name, or a streamer's "alias," is searched. Using the [Top Streamers on Twitch](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) and [Google Trends](https://trends.google.com/trends/?geo=US) datasets, the project first examines what data is provided and how to access it. In the Google Trends dataset, an interest score is a value between 0 and 100 representing how frequently a keyword is searched within a week relative to that keyword's peak: 100 marks the week of peak popularity, and 0 indicates there was not enough data for the term. The weekly interest scores of ten streamer aliases are summed independently to represent how frequently each alias was searched throughout the year 2020. This summed interest score is then compared to streamer statistics including follower count, viewership, total watch time by viewers, and total stream time. The data show a weak negative correlation between a streamer's popularity and their interest score: the more followers or viewers a streamer has, the less often their alias is searched. However, higher interest scores do not correspond to more growth, as no correlation was found between interest scores and follower or view gains. In conclusion, interest scores do not increase with any of the streamer factors examined, nor do they appear to influence a streamer's growth, but a streamer's existing popularity is associated with a lower interest score.

## Motivation and problem statement

I am planning on analyzing Twitch data and Google Trends to understand how users discover content and remain on a platform. I am interested in examining a synchronous platform such as Twitch and how individuals find themselves engaged, specifically by continuing to view or returning to a streamer. From a human-centered perspective, it will be interesting to learn how our patterns vary in searches and watch time, if determinable, though this may be difficult. From a scientific perspective, it may be useful to compare effort (in stream time) with results (in watch time) and gather insights around this. I'm hoping to learn more about how we discover synchronous platforms (exact search terms versus general ones), in this case a specific streamer's channel, and how engaged we stay on these platforms.

Originally I planned to examine only one dataset, the Top Streamers on Twitch, but found that my goals and end result would be rather limited. My idea was not well scoped, as I had simply considered a potential dataset to explore. In speaking to a peer, I was pushed to consider the reasoning behind this data and how I might use it to represent something, such as the significance of viewership or watch time, rather than simply comparing the Twitch data to itself. I later decided to tie it to a second dataset, which became the Google Trends data.

## Data selected for analysis

With the [Top Streamers on Twitch](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) and [Google Trends](https://trends.google.com/trends/?geo=US) data, I believe I have a strong set of large data to understand search patterns with watch times and viewer count to discern possible patterns. While there are many external factors in maintaining viewership, I believe there may be some correlation with search patterns and discovering, rediscovering, or simply routing to certain platforms. 

[Top Streamers on Twitch](https://www.kaggle.com/datasets/aayushmishra1512/twitchdata) contains a Twitch channel name, watch time, stream time, peak viewers, average viewers, followers, followers gained, views gained, whether a channel is partnered with Twitch, whether a channel is marked as mature (content restricted to viewers 18+), and the channel's language. This data is scraped and represents this information over one year. This dataset is under the CC0: Public Domain license. The data can be used to draw connections with search queries to try to identify any patterns or correlations with values such as watch time, follower gain, or viewership. Usage of this dataset presents no ethical concerns in the context of this project, but the data may be limited or skewed by other factors such as viewbotting.

[Google Trends](https://trends.google.com/trends/?geo=US) allows one to look up a search query such as Twitch or a channel name and returns information such as interest over time on a scale from 0 to 100, interest by subregion and location, related topics, and related search queries. The time range can be customized to match the year in which the top streamers data was scraped and collected. When using Trends data, one must attribute the information to Google with a citation. For example, add "Data source: Google Trends (https://www.google.com/trends)." The data can be used to draw connections with Twitch stream data to understand if search queries have an effect on stream data. No ethical issues are foreseen in the context of this project.

## Unknowns and dependencies

With these datasets, I acknowledge the results of this project do not fully detail patterns, as there are other uncontrolled variables such as type of stream, content, connectivity, or scheduling (for both viewers and streamers). In this regard, the results will likely be limited. While a minimum viable product detailing discoveries can be shared by the end of the quarter, it may not be fully fleshed out due to the time constraint of four weeks. Additionally, these datasets are in CSV format only, which means the data may not be as complex but is easy to read and process. Finally, the Twitch dataset relies heavily on Kaggle, which may contain accurate or inaccurate information; as a result, its reliability may be shaky and lead to hesitation about the results. With this in mind, it is important to treat the findings as potential patterns or correlations and understand that they may not be entirely accurate.

## Research questions

#### How might search results impact watch times and viewer count of a streamer?

This question aims to investigate whether search results affect a streamer's channel. It begins by analyzing search results and identifying whether there is a pattern between the results and viewership.

#### What is the relationship between the popularity of a streamer and frequency of searches for their alias?

While similar to the previous question, instead this question begins analyzing streamer popularity and attempts to map their popularity to search queries.

#### How might the work of a streamer (time spent streaming) impact interest for that streamer?

This question aims to understand if a streamer's effort is related to how frequently they are searched.

## Background and Related Work

Using the Twitch API, [Tomaz Bratanic has investigated streamers and their content](https://towardsdatascience.com/twitchverse-a-network-analysis-of-twitch-universe-using-neo4j-graph-data-science-d7218b4453ff). While not directly related to my analysis, there is helpful information around Twitch and streamers, who are a focal point of my analysis. This previous study helps set further foundation and context as I move forward.

Using the same streamer dataset I will be examining, [Jagannath Pal has investigated the relationship between streamer data such as between streaming time and followers gained](https://www.kaggle.com/code/jagannathpal/twitch-streamer-analysis-eda-prediction#Exploratory-Data-Analysis) which are values I too am exploring. [Lander Iturregi](https://www.kaggle.com/code/landeriturregi/twitch-streamers-2020-data-analysis) has also performed some analysis around Twitch to identify top streamers and general population data about streamers. These explorations will allow me to narrow my scope into select streamers, demographics, or types of content that I wish to pursue within my explorations.

While there is no public analysis directly related to my chosen investigation, these previous explorations help me decide which streamers to pursue. For example, Lander's analysis shows that Tfue, shroud, and Myth were the three streamers with the most followers in 2020 and may be worth analyzing. Tomaz's analysis similarly shows Tfue and shroud, but instead of Myth his data shows rubius as the third most followed streamer in 2021. Tomaz's exploration also breaks down various Twitch categories and shows the five most popular being Just Chatting, Resident Evil Village, Grand Theft Auto V, League of Legends, and Fortnite at the time of research. Understanding various streamers, their specialties, and the games they choose to broadcast is important in understanding whether search queries and viewership are significantly correlated.

Previous research primarily informs my exploration by helping me select data to analyze, as there are clearly too many streamers to examine the Google search trends for all of them. As a result, I will narrow my scope to some of the larger, better-known streamers whose Twitch channel name, or alias, is well known enough that search queries for it show relevant results (for instance, "Myth" is an ordinary word, but the streamer is popular enough that streamer results also appear).

## Methodology

To begin investigating this data, I will narrow down to identify and select streamers and their aliases, or channel names, to work with. From this, I will then use the Google Trends dataset to view patterns of their alias being searched. I will also examine streamer data such as follower count and viewership for each alias.

I plan to begin with an ordinary least squares regression to understand if there is a relationship between search queries and streamer popularity, where popularity may be defined as their follower count or viewership. I intend to examine both. This method allows me to identify if there is or is not a potential relationship between my variables before I dive deeper into what that relationship might be.

As I analyze my data, I will refer back to my research questions to understand if there are any clear answers and if there is further investigation needed. Depending on those findings, I may choose to gather more samples (more streamers), narrow my scope, or broaden my scope.

Moving onwards, I will organize my findings in a table for visibility, but push towards data visualizations in the form of graphs or charts to showcase relationships, or the lack of a relationship, in my data. This will allow viewers to easily scan and comprehend my research findings. A simple table is helpful for viewing multiple datasets together, for instance multiple streamers with their search query trends, follower counts, and viewership. I could also showcase all of my streamer data in one graph, but this would be cluttered, noisy, and overall difficult to comprehend. As a result, I aim to separate each streamer I analyze into a unique graph that can showcase a potential relationship or lack of one.

# Processing Data

Begin by importing the csv module for ease of working with spreadsheet data.

In [2]:
import csv

Afterwards, prepare a specified dataset by loading a comma-separated (CSV) file and converting it into dictionaries for ease of use.

In [3]:
def prepare_datasets(file_path):
    """ 
    Accepts: path to a comma-separated (CSV) plaintext file
    Returns: a list containing a dictionary for every row in the file, 
        with the file column headers as keys
    """
    
    with open(file_path) as infile:
        # the files are comma-separated, so the delimiter is a comma
        reader = csv.DictReader(infile, delimiter=',')
        list_of_dicts = [dict(r) for r in reader]
        
    return list_of_dicts

Now we may load our spreadsheet data and create a list of dictionaries before printing out the first few data values to check what data is provided.

In [4]:
# load data from the Top Streamers on Twitch csv into a list of dictionaries
top_twitchers = prepare_datasets("twitchdata-update.csv")
print(top_twitchers[0])
print(top_twitchers[1])
print(top_twitchers[2])
print(len(top_twitchers))

{'Channel': 'xQcOW', 'Watch time(Minutes)': '6196161750', 'Stream time(minutes)': '215250', 'Peak viewers': '222720', 'Average viewers': '27716', 'Followers': '3246298', 'Followers gained': '1734810', 'Views gained': '93036735', 'Partnered': 'True', 'Mature': 'False', 'Language': 'English'}
{'Channel': 'summit1g', 'Watch time(Minutes)': '6091677300', 'Stream time(minutes)': '211845', 'Peak viewers': '310998', 'Average viewers': '25610', 'Followers': '5310163', 'Followers gained': '1370184', 'Views gained': '89705964', 'Partnered': 'True', 'Mature': 'False', 'Language': 'English'}
{'Channel': 'Gaules', 'Watch time(Minutes)': '5644590915', 'Stream time(minutes)': '515280', 'Peak viewers': '387315', 'Average viewers': '10976', 'Followers': '1767635', 'Followers gained': '1023779', 'Views gained': '102611607', 'Partnered': 'True', 'Mature': 'True', 'Language': 'Portuguese'}
1000


Great! It seems that each dictionary contains a channel name, watch time, stream time, peak viewers, average viewers, followers, followers gained, views gained, partnered, mature, and language.

We also discovered this dataset counts the top 1000 streamers on Twitch as there are 1000 separate dictionaries.
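
Every value in these dictionaries is a string (for example `'6196161750'`), so numeric comparisons need a conversion step first. The following is a minimal sketch of that step, assuming the column names shown in the output above; `to_typed_row` is a hypothetical helper rather than part of the original analysis, and the sample row is copied from the first printed dictionary.

```python
# Hypothetical helper: convert the string values produced by csv.DictReader
# into integers and booleans so they can be compared and plotted.
NUMERIC_COLUMNS = [
    "Watch time(Minutes)", "Stream time(minutes)", "Peak viewers",
    "Average viewers", "Followers", "Followers gained", "Views gained",
]
BOOLEAN_COLUMNS = ["Partnered", "Mature"]


def to_typed_row(row):
    typed = dict(row)  # copy so the original row is left untouched
    for column in NUMERIC_COLUMNS:
        typed[column] = int(row[column])
    for column in BOOLEAN_COLUMNS:
        typed[column] = row[column] == "True"
    return typed


# Sample row copied from the first dictionary printed above
sample_row = {
    "Channel": "xQcOW", "Watch time(Minutes)": "6196161750",
    "Stream time(minutes)": "215250", "Peak viewers": "222720",
    "Average viewers": "27716", "Followers": "3246298",
    "Followers gained": "1734810", "Views gained": "93036735",
    "Partnered": "True", "Mature": "False", "Language": "English",
}

typed_row = to_typed_row(sample_row)
print(typed_row["Followers"], typed_row["Partnered"], typed_row["Mature"])
```

Applying the same helper to every row, for example with `[to_typed_row(r) for r in top_twitchers]`, would give a list that can be sorted or plotted directly.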

This Twitch data is from the year 2020, so I will adjust my Google Trends query in the same way to examine the full year of 2020. By typing the alias of the top streamer, xQcOW, and viewing trends throughout the full year of 2020, I am able to see a week by week breakdown in csv form.

Here, it is important to note that Google Trends provides an "interest score" rather than the total number of search queries for the specified keyword. Google defines this interest score to "represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term."
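
To make the scale concrete, the sketch below rescales a list of weekly search counts so that the busiest week becomes 100. The counts are made up and `to_interest_scores` is a hypothetical helper; Google does not publish raw counts or its exact procedure, so this only illustrates the "relative to the highest point" idea in the quoted definition.

```python
def to_interest_scores(weekly_counts):
    """Rescale counts so the largest value maps to 100 (illustration only)."""
    peak = max(weekly_counts)
    if peak == 0:
        # no searches at all: every week scores 0
        return [0 for _ in weekly_counts]
    return [round(100 * count / peak) for count in weekly_counts]


# Hypothetical weekly search counts, not taken from Google Trends
hypothetical_counts = [120, 300, 600, 150]
scores = to_interest_scores(hypothetical_counts)
print(scores)  # [20, 50, 100, 25]
```

Under this reading, each alias is scaled against its own peak, so a score of 100 for one streamer does not necessarily reflect the same number of searches as a score of 100 for another.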

Let's examine the data for the top streamer in a similar fashion as above to understand what data we are provided.

## Comparing Twitch Data

For starters, we will begin analyzing simply the Twitch Streamer data to see if there is a correlation between stream time and streamer popularity.

In [5]:
channel = top_twitchers[0]["Channel"]
streamTime = top_twitchers[0]["Stream time(minutes)"]
followGain = top_twitchers[0]["Followers gained"]
viewGain = top_twitchers[0]["Views gained"]
print(channel + " streamed for " + streamTime + " minutes in the year 2020 and gained " + followGain + " followers and " + viewGain + " views.")

xQcOW streamed for 215250 minutes in the year 2020 and gained 1734810 followers and 93036735 views.


The above was simple enough: it reads three keys and their values. Stream time indicates how much time the streamer was actively streaming, followers gained represents the number of individuals who followed the streamer within the year 2020, and views gained represents the number of views the streamer gained in 2020.

By converting the stream time from minutes into other time lengths, we can better understand the frequency of streams.

In [6]:
hours = int(streamTime) / 60
days = hours / 24
weeks = days / 7

print(channel + "'s total stream time was " + str(hours) + " hours which is also " + str(days) + " days or " + str(weeks) + " weeks.")

xQcOW's total stream time was 3587.5 hours which is also 149.47916666666666 days or 21.354166666666664 weeks.


## Understanding Google Trends

In [22]:
# Streamer 1
# load data from the Google Trends search query into a list of dictionaries
xqcow = prepare_datasets("xqcow2020.csv")
print(xqcow[0])
print(xqcow[1])
print(xqcow[2])

{'Category: All categories': 'Week', None: ['xqcow: (United States)']}
{'Category: All categories': '2020-01-05', None: ['18']}
{'Category: All categories': '2020-01-12', None: ['35']}


Let's try to understand this data, as it looks finicky and unorganized. 'Category: All categories' is the first line of the exported file, so the reader treats it as the only column header and uses it as the key for the first column. It records the search category selected in Google Trends, for example cars or animals when searching for the term "jaguar." For this project, the category is not relevant as our streamer aliases are searched under all categories. Next, 'Week' is actually a value stored under the 'Category: All categories' key in the first row; in the rows that follow, this key holds the start date of each week, beginning on a Sunday and ending on the Saturday, spanning 7 days. Finally, the strange `None` is not a value but a key: `csv.DictReader` stores any fields beyond the known headers in a list under the key `None`, and here that list holds the interest score that Google Trends has provided.
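
The `None` key can be reproduced without the real file. The sketch below is an illustration under assumptions: `sample_export` is an in-memory stand-in reconstructed from the dictionaries printed above, and the blank second line is a guess about the export layout. It shows the default `csv.DictReader` behavior and one possible alternative that skips the two preamble lines so that 'Week' becomes a proper header.

```python
import csv
import io

# In-memory stand-in for the first lines of a Google Trends export,
# reconstructed from the printed dictionaries (the blank line is assumed)
sample_export = (
    "Category: All categories\n"
    "\n"
    "Week,xqcow: (United States)\n"
    "2020-01-05,18\n"
    "2020-01-12,35\n"
)

# Default behavior: the first line is the only header, so the second field
# of every later row is collected in a list under the key None
default_rows = list(csv.DictReader(io.StringIO(sample_export)))
print(default_rows[1])

# Alternative: drop the two preamble lines so "Week" becomes the header row
data_lines = sample_export.splitlines()[2:]
named_rows = list(csv.DictReader(data_lines))
print(named_rows[0])
```

With the second approach each row has the keys 'Week' and 'xqcow: (United States)', which avoids both the `None` key and the need to skip the first dictionary. The rest of this notebook keeps the default behavior.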

To start, let us retrieve the week(s) corresponding to the highest interest score to determine which weeks Twitch data may be worthwhile to view. In order to do this, we must figure out how to access certain data so I will begin by understanding the dictionary keys and values.

In [8]:
print("Uncomment code to view responses from data.")
# for week in xqcow:
    # print(week) # retrieve each week's data
    ## sample data: {'Category: All categories': '2020-01-05', None: ['18']}
    
    # print(week.get('Category: All categories')) # retrieve values of the key 'Category: All categories'
    ## sample data: 2020-01-05
    
    # print(week.get('None')) # attempt to retrieve values of the string key 'None', which does not exist
    ## sample data: None 
    
    # print(week.keys()) # retrieve all the keys in each week
    ## sample data: dict_keys(['Category: All categories', None])

    # print(week.values()) # retrieve all the values in each week
    ## sample data: dict_values(['2020-01-05', ['18']])

    # print(week.get(None)) # retrieve the values of the key None, which are the interest scores
    ## sample data: ['18']

Uncomment code to view responses from data.


Great! We can access both the week and interest score by retrieving the values in each week. We can also access each value individually by getting the value of the key 'Category: All categories' or the key None. It is important to know that the key to retrieve a date is a string and needs quote marks, whereas the key to retrieve interest scores is Python's None object and must be written without them.

It also seems we must move past the first list value as its format is different from the others. We can do that by starting our index at 1 rather than the default 0.

## Determining highest interest score with the corresponding week for that score

In [9]:
# counter for the position of the current week within xqcow[1:]
weekCount = 0
# highest interest score seen so far and the week position corresponding to that score
interestScore = 0
interestScoreWeek = 0

for week in xqcow[1:]: # start index at 1 rather than 0 to avoid the dissimilar data
    # each interest score is a list containing one string, so take its first element and convert it to an integer
    weeklyScore = int(week.get(None)[0])
    if weeklyScore >= interestScore:
        # first check for duplicates and print them out when they happen for awareness
        if interestScore == weeklyScore:
            print("There was an interest score of " + str(interestScore) + " during weeks " + str(interestScoreWeek) + " and " + str(weekCount) + ".")
        interestScore = weeklyScore
        interestScoreWeek = weekCount
    weekCount += 1
    
print("There was an interest score of " + str(interestScore) + " during week " + str(interestScoreWeek) + ".")

There was an interest score of 100 during week 45.


Amazing, now we have found the highest interest score and the week corresponding to that score. We must then convert this week to the right date to look at. This can be simply done by taking the index of the week plus one, where the plus one accounts for the dissimilar data in row one, and retrieving the value of the week key (represented by 'Category: All categories').

In [10]:
print(xqcow[46].get('Category: All categories'))

2020-11-15
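
The counter loop and the index conversion can also be collapsed into a single step. The sketch below is one possible alternative rather than the approach used in this notebook: `peak_week` is a hypothetical helper, and `sample_rows` mimics the shape of the `xqcow` list, with the first two scores taken from the output above and the third made up.

```python
# Rows in the same shape as the xqcow list; the first entry is the
# header-like row, and the last score is made up for illustration
sample_rows = [
    {"Category: All categories": "Week", None: ["xqcow: (United States)"]},
    {"Category: All categories": "2020-01-05", None: ["18"]},
    {"Category: All categories": "2020-01-12", None: ["35"]},
    {"Category: All categories": "2020-01-19", None: ["27"]},
]


def peak_week(rows):
    """Return (week start date, score) for the highest interest score."""
    data_rows = rows[1:]  # skip the header-like first row
    best = max(data_rows, key=lambda row: int(row[None][0]))
    return best["Category: All categories"], int(best[None][0])


peak_date, peak_score = peak_week(sample_rows)
print(peak_date, peak_score)  # 2020-01-12 35
```

One difference worth noting: `max` returns the first row when several weeks tie for the highest score, while the loop above keeps the last one and prints a message about the tie.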


Now that we have determined a date to examine, we may keep this date in mind when assessing the Twitch streamer data. However, after further review I realized that the Twitch dataset is limited, as it only reports cumulative totals for the year rather than weekly values. The above work is not irrelevant, however, as I was able to learn more about my datasets, how to interact with them, and what their limitations are.

## Summing interest scores throughout the year

Having realized that the Twitch streamer data is not divided by week but is instead a cumulative total for the year, I need to change my approach. I will now redirect my focus to summing the interest scores of a streamer's alias over the entire year and comparing this sum to their stats to understand whether a relationship exists between interest score and streamer popularity. A streamer's cumulative interest score reflects how often their alias was searched throughout the year, which will be useful for comparisons and finding patterns.

In [12]:
totalScore = 0
for week in xqcow[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("xQcOW had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

xQcOW had an interest score sum of 2170 throughout the year 2020.


In [157]:
print(top_twitchers[0].keys())
print(top_twitchers[0]["Channel"])

dict_keys(['Channel', 'Watch time(Minutes)', 'Stream time(minutes)', 'Peak viewers', 'Average viewers', 'Followers', 'Followers gained', 'Views gained', 'Partnered', 'Mature', 'Language'])
xQcOW


## Mimic above for other streamers

While the above could be automated for efficiency, I want to view the top 10 streamers, one at a time to see if there are discrepancies that persuade me to dive deeper.
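
For reference, the automated version mentioned here could look roughly like the sketch below. `total_interest_score` and `make_rows` are hypothetical helpers, and the aliases and weekly scores are placeholders; in the notebook the rows would come from `prepare_datasets` instead.

```python
def total_interest_score(rows):
    """Sum the weekly interest scores, skipping the header-like first row."""
    return sum(int(week[None][0]) for week in rows[1:])


def make_rows(scores):
    """Build rows in the shape returned for a Google Trends export (stand-in data)."""
    header = {"Category: All categories": "Week", None: ["alias: (United States)"]}
    weeks = [
        {"Category: All categories": "week " + str(i), None: [str(score)]}
        for i, score in enumerate(scores)
    ]
    return [header] + weeks


# Placeholder aliases and scores, not real Google Trends values
trends_by_alias = {
    "streamer_a": make_rows([18, 35, 27]),
    "streamer_b": make_rows([5, 0, 100]),
}

totals = {
    alias: total_interest_score(rows) for alias, rows in trends_by_alias.items()
}
for alias, total in totals.items():
    print(alias + " had an interest score sum of " + str(total) + ".")
```

Keeping the totals in one dictionary keyed by alias also makes it easier to line them up with the Twitch data later.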

In [16]:
# Streamer 2
summit1g = prepare_datasets("summit1g2020.csv")

totalScore = 0
for week in summit1g[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("summit1g had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

summit1g had an interest score sum of 1496 throughout the year 2020.


In [18]:
# Streamer 3
gaules = prepare_datasets("gaules2020.csv")

totalScore = 0
for week in gaules[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("Gaules had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

Gaules had an interest score sum of 1248 throughout the year 2020.


In [19]:
# Streamer 4
esl_csgo = prepare_datasets("esl_csgo2020.csv")

totalScore = 0
for week in esl_csgo[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("ESL_CSGO had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

ESL_CSGO had an interest score sum of 430 throughout the year 2020.


In [20]:
# Streamer 5
tfue = prepare_datasets("tfue2020.csv")

totalScore = 0
for week in tfue[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("tfue had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

tfue had an interest score sum of 1683 throughout the year 2020.


In [23]:
# Streamer 6
asmongold = prepare_datasets("asmongold2020.csv")

totalScore = 0
for week in asmongold[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("Asmongold had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

Asmongold had an interest score sum of 2700 throughout the year 2020.


In [26]:
# Streamer 7
nickmercs = prepare_datasets("nickmercs2020.csv")

totalScore = 0
for week in nickmercs[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("NICKMERCS had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

NICKMERCS had an interest score sum of 2467 throughout the year 2020.


In [27]:
# Streamer 8
fextralife = prepare_datasets("fextralife2020.csv")

totalScore = 0
for week in fextralife[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("Fextralife had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

Fextralife had an interest score sum of 3487 throughout the year 2020.


In [21]:
# Streamer 9
loltyler1 = prepare_datasets("loltyler12020.csv")

totalScore = 0
for week in loltyler1[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("loltyler1 had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

loltyler1 had an interest score sum of 2014 throughout the year 2020.


In [28]:
# Streamer 10
anomaly = prepare_datasets("anomaly2020.csv")

totalScore = 0
for week in anomaly[1:]:
    weeklyScore = int(week.get(None)[0])
    totalScore += weeklyScore
    
print("Anomaly had an interest score sum of " + str(totalScore) + " throughout the year 2020.")

Anomaly had an interest score sum of 3807 throughout the year 2020.


Before visualizing this data, I can already see that the cumulative interest scores vary widely and do not simply decrease as we move down the list of top streamers.
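
Before plotting, the yearly sums can be joined with the Twitch rows by channel name. The sketch below is an illustration under assumptions: the interest score sums are copied from the cell outputs above, `twitch_rows` is a trimmed stand-in for `top_twitchers` containing only the three rows printed earlier, `join_scores` is a hypothetical helper, and the exact capitalization of the seven channel names not shown in that output is assumed.

```python
# Yearly interest score sums copied from the cell outputs above
# (capitalization of names beyond the first three is an assumption)
interest_totals = {
    "xQcOW": 2170, "summit1g": 1496, "Gaules": 1248, "ESL_CSGO": 430,
    "Tfue": 1683, "Asmongold": 2700, "NICKMERCS": 2467,
    "Fextralife": 3487, "loltyler1": 2014, "Anomaly": 3807,
}

# Stand-in for top_twitchers: the three rows printed earlier, two columns only
twitch_rows = [
    {"Channel": "xQcOW", "Followers": "3246298", "Followers gained": "1734810"},
    {"Channel": "summit1g", "Followers": "5310163", "Followers gained": "1370184"},
    {"Channel": "Gaules", "Followers": "1767635", "Followers gained": "1023779"},
]


def join_scores(rows, totals):
    """Attach the interest score sum to each Twitch row that has one."""
    joined = []
    for row in rows:
        channel = row["Channel"]
        if channel in totals:
            joined.append({
                "Channel": channel,
                "Followers": int(row["Followers"]),
                "Followers gained": int(row["Followers gained"]),
                "Interest score": totals[channel],
            })
    return joined


combined = join_scores(twitch_rows, interest_totals)
for entry in sorted(combined, key=lambda e: e["Interest score"], reverse=True):
    print(entry)
```

Passing the full `top_twitchers` list instead of the stand-in would produce one combined entry per sampled streamer, provided the channel names match exactly.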

## Does interest score influence how increasingly popular a streamer grows throughout the year?

#### How might search results impact watch times and viewer count of a streamer?
Let's visualize how interest scores relate to the growth of a streamer, using their gains in followers and views.

![Followers gained vs. Interest Score.png](attachment:cec62315-d971-4774-8ad6-cd95773ea002.png)

![Views gained vs. Interest Score.png](attachment:66e65aac-ecff-46a3-be83-3a55247bfcb1.png)

As cumulative interest scores increase, there is no real pattern in views or followers gained. As a result, using our sample data, we can tentatively conclude that the number of search queries for a Twitch alias has no influence on the streamer's growth. While followers gained varied across all ten sample streamers, a majority of the follower gains were in the middle of the interest score range and exhibited no real pattern. With the exception of the streamer with the second-highest interest score, all of the streamers had similar view gains.

Examining the R² value, we can further conclude that there is little to no correlation between interest score and popularity growth.
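
For readers who want to see where an R² value comes from, the sketch below fits an ordinary least squares line and computes R² by hand. It is an illustration only: `linear_fit` is a hypothetical helper, the input values are made up rather than taken from the datasets, and the charts in this notebook may have been produced with a different tool.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of ys on xs; returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # for a single predictor, R^2 equals the squared Pearson correlation
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values, not taken from the Twitch or Google Trends data
interest_scores = [1, 2, 3, 4, 5]
followers_gained = [5, 3, 4, 2, 1]

slope, intercept, r_squared = linear_fit(interest_scores, followers_gained)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

R² on its own is never negative, so the direction of a relationship has to be read from the sign of the slope (or the trend line on the chart), while R² indicates how closely the points follow that line.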

## Does a streamer's existing popularity influence their interest score?

#### What is the relationship between the popularity of a streamer and frequency of searches for their alias?
While interest score did not increase the growth rate of streamers in terms of their viewership and following, the frequency of searches for their alias may be influenced by their existing popularity. To better understand this, we will plot their pre-existing data including followers and viewers and compare it to their interest score.

![Interest Score vs. Followers.png](attachment:2cb5e1ff-acf8-4c57-8a52-2afd21876694.png)

As seen in the above graph, the streamers with more existing followers did not necessarily have higher interest scores throughout the year. In fact, streamers with fewer existing followers tended to have higher interest scores. The R² value of around 0.185 suggests the relationship is weak, and its direction is negative: as existing follower count increased, interest score decreased.

One potential cause for this pattern may be that the streamers with fewer followers were discussed more often, which led to more frequent searches. These streamers may have been growing or had certain moments that piqued viewer curiosity, leading to searches around their alias. Obviously, we cannot establish this cause and effect, but it may explain some results. In the same way, it is likely that popular streamers with large follower bases did not need to be looked up as often.

![Interest Score vs. Peak viewers.png](attachment:5b17fd23-19be-41fe-88b5-3b31f6df3954.png)

The above graph displays a moderate negative correlation between peak viewers and interest score, as represented by the R² value above 0.5. As a streamer's peak viewership increased, their interest score seemingly decreased. Peak viewers is the maximum number of viewers a streamer had at one moment during the year.

While this data does not conclusively establish any patterns or findings, it is reasonable to infer that the streamers with higher viewership, as determined by their peak viewers, were well known enough that their alias was not searched as frequently. In addition, they may regularly draw numbers similar to their peak. Conversely, those with lower peak viewers may be less known and may have had a particular peak that intrigued users to search about their background, as they do not necessarily draw these same numbers often.

![Interest Score vs. Watch time(Minutes).png](attachment:c38eab8a-276f-47e7-b89f-e891bda5c76d.png)

Lastly, we examine the total watch time of a streamer's streams. This is the total amount of time all viewers spent watching the streamer. In a similar fashion to the above data, we can see there is a negative correlation between watch time and interest score. The correlation is weak, but exists to a small extent as seen by the R² value above 0.2.

In a similar fashion to above, we can attempt to rationalize these findings. As a streamer had more watch time, their interest score decreased because they were well known enough and did not have as many search queries. In contrast, streamers with less watch time were searched more frequently. We may reason that this was to learn more about the streamer or access their stream, but cannot definitively conclude this.

## Does a streamer's effort, represented by time streaming, affect their interest score?

#### How might the work of a streamer (time spent streaming) impact interest for that streamer?
Below we will examine a streamer's stream time, defined by the amount of time they spent streaming on Twitch for the year, and attempt to understand if their work hours influenced their interest score and how frequently they were searched.

![Interest Score vs. Stream time(minutes).png](attachment:35a67533-6aef-42a8-bb9d-82f17df8d55f.png)

Above is a chart with an R² value greater than 0.5, indicating a moderate negative correlation between stream time and interest score. Similar to a streamer's existing popularity, we may tentatively conclude that the more time a streamer spent working (streaming), the less they were searched for because they were either better known or more accessible on Twitch. Being more accessible on Twitch may mean their channels could be easily discovered and did not require a search to locate. This rationale makes sense because a streamer who streams more often will likely be seen more often and, as a result, be better known. Similarly, they may be streaming more often because they have existing interest in the form of followers and viewers. Again, however, we cannot accurately deduce all of this and we must consider this conclusion limited.

## Conclusion

After examining the data, there are weak conclusions to draw. Not all of the data informed me about the relationship between a streamer's interest score and popularity, but there were elements that worked to support one another.

Interest scores had little, if any, influence on streamer growth. Searches for streamer aliases did not appear to contribute much to a streamer's view or follower gains, and this is reflected in the section **"Does interest score influence how increasingly popular a streamer grows throughout the year?"**

However, a streamer's existing popularity was related to their interest score. As seen in the section **"Does a streamer's existing popularity influence their interest score?"** the follower count, peak viewership, and watch time of a streamer each showed a weak to moderate negative correlation with interest scores throughout the year. In other words, streamers with more followers, higher peak viewership, and more watch time tended to have lower interest scores.

Lastly, the amount of effort a streamer exhibits, as determined by the amount of time they spend streaming, also influences their interest score. In section **"Does a streamer's effort, represented by time streaming, affect their interest score?"** we see that the more time streamed, the lower the interest score.

My initial hypothesis was that the top streamers would have the highest interest scores. I expected a positive correlation, as these streamers would be well known and searched for often. Instead, I found the opposite. The more popular streamers did not have as many searches, and while there may be many reasons, the most natural explanation I can offer is that these streamers are so well known already that they are searched for less often.