Collaborators: Jai Agrawal, Daniil Abbruzzese, Todd Gavin, Tania Dawood
Effects of the Pandemic on Media Usage and Consumption
We compiled a large dataset that combines data provided by Pixstory with additional data sources of various MIME types, in order to analyze how the COVID pandemic has affected people's use of social media.
Each additional dataset has a detailed notebook describing its use; running a dataset's notebook file should produce the required outputs. The dataset directories are named in the format Dataset [num]_[title]. The additional columns that needed to be added are also documented in detailed notebooks, inside directories named in the format [letter]_[col name]. The "1_Apache_Tika_Analysis" directory contains detailed instructions for running the Tika similarity analysis the way we did. The "Report_Questions" directory contains separate .py files that were used to make the visualizations illustrating certain findings in the report.
Sports Datasets:
Film Festivals:
- https://www.film-fest-report.com/home/film-festivals-2022
- https://www.screendaily.com/news/2021-film-festivals-and-markets-latest-dates-postponements-and-cancellations/5155284.article
- https://www.filmfestivaldatabase.com
Hate Speech Dataset:
Sarcasm Dataset:
MIME Type: text/csv
Usage: To compare Snapchat's DAU to the number of daily posts on Pixstory as part of exploring our research question. This dataset includes the date, Snapchat's Daily Active Users and Revenue, and Snapchat stock data (Open, High, Low, Close, Adjusted Close and Volume).
To create this dataset, we obtained Snapchat's quarterly revenue and daily active users from Statista and its daily stock value from Yahoo Finance. Our data covers 2020 to 2022, since that is the range available for the Pixstory dataset. For the Snapchat dataset, the features we will be using for analysis are:
Feature 1: Daily Active Users
Feature 2: Stock Price
Feature 3: Revenue
Source:
- https://www.statista.com/statistics/552694/snapchat-quarterly-revenue/
- https://finance.yahoo.com/quote/SNAP/
- https://www.statista.com/statistics/545967/snapchat-app-dau/
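Because the Statista figures are quarterly and the stock data is daily, the two have to be put on the same date grain before they can share a date column. The sketch below shows one possible way to do that, by copying each quarter's value onto every daily row in that quarter. It is an illustration only and not necessarily what the notebooks do: the function names, the field names (`date`, `close`, `dau`), the calendar-quarter assumption and the toy values are all hypothetical.

```python
from datetime import date


def quarter_of(day):
    """Return (year, quarter) for a date, with calendar quarters numbered 1-4."""
    return (day.year, (day.month - 1) // 3 + 1)


def attach_quarterly(daily_rows, quarterly_values, column):
    """Copy each quarter's value onto every daily row that falls in that quarter.

    Rows whose quarter has no value get None in `column`.
    """
    out = []
    for row in daily_rows:
        value = quarterly_values.get(quarter_of(row["date"]))
        out.append({**row, column: value})
    return out


# Toy values for illustration only, not real Snapchat figures.
daily = [
    {"date": date(2021, 3, 31), "close": 10.0},
    {"date": date(2021, 4, 1), "close": 11.0},
]
dau_by_quarter = {(2021, 1): 100, (2021, 2): 120}

rows = attach_quarterly(daily, dau_by_quarter, "dau")
```

With this approach a quarterly value stays constant across all days of its quarter, so any day-level comparison against it only reflects quarter-to-quarter changes.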
MIME Type: application/json
Usage: To keep track of the number of daily COVID cases, deaths and vaccinations and see how these correlate with the number of Pixstory posts, Snapchat's DAU and the number of likes and views on the daily trending YouTube videos. This dataset includes the date, number of deaths, number of cases and number of vaccinations.
Feature 1: New daily deaths due to COVID in India
Feature 2: New daily COVID cases in India
Feature 3: New vaccinations against COVID in India
Note: the data in this dataset begins on 1/15/2021, a date on which it lists India as having 0 vaccinations against COVID. Because the Pixstory dataset starts on 1/12/2020, we decided to add the missing dates and enter 0 for all vaccinations on those dates. This is justified because, according to this dataset, India had no vaccinations until 1/16/2021.
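The backfill described in the note can be sketched as follows. This is a minimal illustration in plain Python, not the code from the notebooks: the function name, the field names (`date`, `new_vaccinations`) and the toy dates and counts are assumptions made for the example.

```python
from datetime import date, timedelta


def backfill_zero_vaccinations(rows, start):
    """Prepend one zero-vaccination row per day from `start` up to the first recorded date.

    `rows` is a list of dicts sorted by date, each holding a `date`
    (datetime.date) and a `new_vaccinations` count.
    """
    if not rows:
        return []
    first_recorded = rows[0]["date"]
    filled = []
    day = start
    while day < first_recorded:
        filled.append({"date": day, "new_vaccinations": 0})
        day += timedelta(days=1)
    return filled + rows


# Toy input for illustration only; the dates and counts are made up.
recorded = [
    {"date": date(2021, 1, 15), "new_vaccinations": 0},
    {"date": date(2021, 1, 16), "new_vaccinations": 100},
]
filled = backfill_zero_vaccinations(recorded, start=date(2021, 1, 12))
```

Here `start` would be the first date of the dataset being matched, so every date in that dataset has a vaccination value to join against.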
Source:
MIME Type: video/mp4
Usage: As with the Snapchat DAU dataset, we wanted to see whether the number of likes and views on trending YouTube videos correlates with Pixstory posts and COVID cases. This dataset includes data on video ID, title, published at, channel ID, channel title, category ID, trending date, tags, view count, likes, dislikes and comment count.
This is a dataset of the top trending videos on YouTube on any particular day. The MIME type of this dataset is video/mp4. The data covers 2020 to 2022, and its features include:
Feature 1: highest trending video name
Feature 2: highest trending video channel
Feature 3: highest trending video category
Feature 4: highest trending video views
Feature 5: highest trending video likes
Source:
Note: all datasets were combined using the date column.
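As a rough illustration of combining on the date column, the sketch below joins several tables on a shared `date` key in plain Python. It assumes an inner join (only dates present in every table are kept), which the note above does not specify; the function name, column names and toy values are hypothetical.

```python
def merge_on_date(*tables):
    """Inner-join tables (lists of dicts) on their shared `date` key.

    Dates are assumed to be ISO strings (YYYY-MM-DD) so they sort chronologically.
    If two tables share a non-date column name, the later table's value wins.
    """
    if not tables:
        return []
    indexed = [{row["date"]: row for row in table} for table in tables]
    common_dates = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for day in sorted(common_dates):
        combined = {}
        for index in indexed:
            combined.update(index[day])
        merged.append(combined)
    return merged


# Toy tables for illustration only; the values are made up.
pixstory = [
    {"date": "2021-01-15", "pixstory_posts": 10},
    {"date": "2021-01-16", "pixstory_posts": 12},
]
covid = [
    {"date": "2021-01-16", "new_cases": 5},
    {"date": "2021-01-17", "new_cases": 7},
]

merged = merge_on_date(pixstory, covid)
```

An inner join drops any date missing from one of the sources, which is one reason to backfill gaps before merging.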