You Described, We Archived: A Rich Audio Description Dataset

The You Described, We Archived (YuWA) dataset is a collaboration between San Francisco State University and The Smith-Kettlewell Eye Research Institute. It includes audio description (AD) data collected worldwide from 2013 to 2022 through YouDescribe, an accessibility tool for adding audio descriptions to YouTube videos. YouDescribe, a web-based audio description tool with a companion iOS viewing app, has a community averaging more than 12,000 visitors per year and more than 3,000 volunteer describers, and it has produced more than 5,500 audio-described YouTube videos.

Blind and visually impaired (BVI) viewers request YouTube videos, which are saved to a wishlist. Volunteer audio describers then select a video, write a script, record audio clips, and edit clip placement to create an audio description. The audio description tracks are stored separately from the YouTube video and played back together with it, and the described video is then posted for public viewing on YouDescribe.

The YuWA dataset covers a wide range of videos in 15 YouTube video categories: Film & Animation, Music, Autos & Vehicles, Travel & Events, Pets & Animals, Sports, People & Blogs, Gaming, Comedy, Entertainment, How-To & Style, News & Politics, Nonprofits & Activism, Education, and Science & Technology. A video can have multiple audio descriptions, and each audio description can have multiple audio clips recorded by volunteer describers. Audio clips recorded before May 2020 were transcribed using Listen By Code, and clips recorded since then have been transcribed using the Google Cloud Speech-to-Text API. Viewers can rate audio descriptions on a scale of 1 to 5 (1 being poor, 5 being excellent) and can give describers feedback by selecting suggested improvements from a list.
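The video, audio description, and audio clip hierarchy, together with the 1-5 rating scale, can be sketched as Python dataclasses. This is a minimal illustration only: the class and field names (Video, AudioDescription, AudioClip, start_time, playback_type) are hypothetical and do not reflect the actual YuWA schema. The playback_type values mirror the inline and extended clip types shown in the analysis figures.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class AudioClip:
    # Hypothetical fields; the actual YuWA records may be structured differently.
    clip_id: str
    start_time: float               # placement in the YouTube video, in seconds
    playback_type: str = "inline"   # "inline" or "extended"
    transcript: str = ""            # text produced by a speech-to-text service


@dataclass
class AudioDescription:
    description_id: str
    clips: List[AudioClip] = field(default_factory=list)
    ratings: List[int] = field(default_factory=list)  # 1 (poor) to 5 (excellent)

    def add_rating(self, score: int) -> None:
        """Record a viewer rating, rejecting values outside the 1-5 scale."""
        if not 1 <= score <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(score)

    def average_rating(self) -> Optional[float]:
        """Mean rating, or None if no viewer has rated this description."""
        return mean(self.ratings) if self.ratings else None


@dataclass
class Video:
    youtube_id: str
    category: str  # one of the 15 YouTube video categories
    descriptions: List[AudioDescription] = field(default_factory=list)

    def clip_count(self) -> int:
        """Total clips across all audio descriptions of this video."""
        return sum(len(d.clips) for d in self.descriptions)


if __name__ == "__main__":
    ad = AudioDescription("ad-1", clips=[AudioClip("c1", 3.5), AudioClip("c2", 12.0, "extended")])
    ad.add_rating(5)
    ad.add_rating(4)
    video = Video("abc123", "Education", descriptions=[ad, AudioDescription("ad-2")])
    print(video.clip_count(), ad.average_rating())
```

Keeping clips nested under descriptions (rather than directly under videos) matches the text: the same video can carry several independent descriptions, each with its own clips and ratings.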

The YuWA data repository includes all YouDescribe audio descriptions from 2013 to 2022 and can be filtered to include or exclude important YouDescribe milestones. Our analysis focuses on data collected by YouDescribe since March 17, 2017, and on Google Analytics data, which has tracked traffic since July 30, 2020. This scalable dataset will be updated regularly as new videos, audio descriptions, and audio clips are uploaded.
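Filtering the archive around the milestone dates mentioned above can be sketched as a simple inclusive date-range filter. The (id, created_on) record layout and the constant names are assumptions for illustration, not the YuWA export format; only the two dates come from this paragraph.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple

# Milestone dates from the README; the record layout below is hypothetical.
YOUDESCRIBE_FOCUS_START = date(2017, 3, 17)
GOOGLE_ANALYTICS_START = date(2020, 7, 30)

Record = Tuple[str, date]  # (record_id, created_on)


def filter_by_date(records: Iterable[Record],
                   start: Optional[date] = None,
                   end: Optional[date] = None) -> List[Record]:
    """Keep records dated within [start, end]; None leaves that side unbounded."""
    return [
        (rid, created) for rid, created in records
        if (start is None or created >= start) and (end is None or created <= end)
    ]


if __name__ == "__main__":
    records = [
        ("ad-2015", date(2015, 6, 1)),
        ("ad-2018", date(2018, 1, 20)),
        ("ad-2021", date(2021, 11, 5)),
    ]
    print(filter_by_date(records, start=YOUDESCRIBE_FOCUS_START))
    print(filter_by_date(records, start=GOOGLE_ANALYTICS_START))
```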

Run Instructions

The download_yd_data.py script was tested with Python 3.9. Make sure you run it with Python 3; if the python command on your system points to Python 2, use python3 instead.
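As a rough illustration of the version requirement, a pre-flight check might look like the following. It is a hypothetical helper (the README does not say download_yd_data.py contains such a check), and it deliberately avoids Python 3-only syntax so that a Python 2 interpreter can still run it and print the error.

```python
import sys

# Hypothetical pre-flight check; download_yd_data.py itself may not include it.
# Written without Python 3-only syntax so Python 2 can run it and report the problem.
MIN_VERSION = (3, 0)
TESTED_VERSION = (3, 9)


def check_python_version(version=None):
    """Return True if version (default: the running interpreter) is Python 3 or later."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= MIN_VERSION


if __name__ == "__main__":
    if not check_python_version():
        sys.exit("Python 3 is required; try running the script with python3.")
    if tuple(sys.version_info[:2]) != TESTED_VERSION:
        print("Note: download_yd_data.py was tested with Python 3.9.")
```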

  1. Install the requests module:

```shell
pip install requests

# If using python3
pip3 install requests
```

  2. Run the Python script:

```shell
# By default, audio clips are stored in the current directory, organized by
# YouTube video ID and audio description ID.
# --audioDescDir: specifies the output directory where the audio clips
#                 will be stored.

python download_yd_data.py

# If specifying python3
python3 download_yd_data.py

# Specify an output directory
python download_yd_data.py --audioDescDir=<PATH_TO_OUTPUT_DIR>
```

  3. Follow the on-screen instructions to register with the YuWA system and receive an API key for accessing the audio clips.

  4. Once you receive an API key, which should be stored in a file named yuwa.json, run the script again and follow the on-screen instructions.
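After a download finishes, the clips should be organized by YouTube video ID and audio description ID. The sketch below assumes a nested AUDIO_DESC_DIR/VIDEO_ID/DESCRIPTION_ID/ layout (an assumption; the README only says the clips are separated by these IDs) and counts the files in each description folder. It reuses the --audioDescDir option name for consistency, but summarize_clips and the script itself are hypothetical and not part of the YuWA tooling.

```python
import argparse
import os
from typing import Dict, List, Optional, Tuple


def summarize_clips(root: str) -> Dict[Tuple[str, str], int]:
    """Count clip files under root/<video_id>/<description_id>/ (assumed layout)."""
    counts: Dict[Tuple[str, str], int] = {}
    for video_id in sorted(os.listdir(root)):
        video_dir = os.path.join(root, video_id)
        # Skip files such as yuwa.json and hidden folders such as .git.
        if video_id.startswith(".") or not os.path.isdir(video_dir):
            continue
        for desc_id in sorted(os.listdir(video_dir)):
            desc_dir = os.path.join(video_dir, desc_id)
            if os.path.isdir(desc_dir):
                counts[(video_id, desc_id)] = sum(
                    1 for name in os.listdir(desc_dir)
                    if os.path.isfile(os.path.join(desc_dir, name))
                )
    return counts


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Summarize downloaded YuWA audio clips.")
    # Same option name as download_yd_data.py; the default of "." matches its
    # documented behavior of writing to the current directory.
    parser.add_argument("--audioDescDir", default=".")
    args = parser.parse_args(argv)
    for (video_id, desc_id), n in sorted(summarize_clips(args.audioDescDir).items()):
        print(f"{video_id}/{desc_id}: {n} clip(s)")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as summarize_clips.py, it could be run as python summarize_clips.py --audioDescDir=PATH_TO_OUTPUT_DIR. Any other non-hidden subdirectories in that folder would also be counted, so point it at a directory that holds only the download.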

Data Analysis

Audio Descriptions Yearly

March 17, 2017 - December 31, 2021

Audio Descriptions Grouped by Year and Month

March 17, 2017 - September 21, 2022
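The two figures above count audio descriptions per year and per year-month. Assuming you have extracted one creation date per audio description (the input format here is an assumption, not a documented YuWA field), both counts reduce to a Counter over date components:

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def count_by_year(created: Iterable[date]) -> Dict[int, int]:
    """Number of audio descriptions created in each calendar year."""
    return dict(sorted(Counter(d.year for d in created).items()))


def count_by_year_month(created: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Number of audio descriptions per (year, month), in chronological order."""
    return dict(sorted(Counter((d.year, d.month) for d in created).items()))


if __name__ == "__main__":
    dates = [date(2017, 3, 17), date(2017, 3, 30), date(2017, 4, 2), date(2022, 9, 21)]
    print(count_by_year(dates))
    print(count_by_year_month(dates))
```

Sorting the (year, month) tuples gives chronological order directly, which is convenient for plotting a monthly time series.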

Premium dataset - Number of New and Existing Descriptions by year

September 1, 2017 - August 31, 2022

Premium dataset - Number of New and Existing Describers by year

September 1, 2017 - August 31, 2022
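The premium dataset figures split describers into new and existing per year. The README does not define these terms, so the sketch below uses one plausible definition, labeled as an assumption: a describer is "new" in the first yearly period in which they publish and "existing" in any later period in which they publish again. Because both premium date ranges run from September 1 to August 31, the sketch also assumes September-to-August periods, labeled by their starting year.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def period_year(d: date, start_month: int = 9) -> int:
    """Label a date by the year in which its September-August period starts."""
    return d.year if d.month >= start_month else d.year - 1


def new_vs_existing(contributions: Iterable[Tuple[str, date]],
                    start_month: int = 9) -> Dict[int, Dict[str, int]]:
    """Per period, count describers publishing for the first time vs. returning."""
    active: Dict[int, Set[str]] = defaultdict(set)
    for describer_id, published in contributions:
        active[period_year(published, start_month)].add(describer_id)

    seen: Set[str] = set()
    result: Dict[int, Dict[str, int]] = {}
    for period in sorted(active):
        ids = active[period]
        new = ids - seen
        result[period] = {"new": len(new), "existing": len(ids) - len(new)}
        seen |= ids
    return result


if __name__ == "__main__":
    contributions = [
        ("alice", date(2017, 9, 5)),
        ("bob", date(2018, 2, 1)),
        ("alice", date(2018, 10, 1)),
        ("carol", date(2018, 11, 11)),
        ("bob", date(2019, 8, 31)),
    ]
    print(new_vs_existing(contributions))
```

Passing start_month=1 switches to calendar years if that turns out to be the intended bucketing.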

Percentage of Rated / Unrated Audio Descriptions

March 17, 2017 - September 21, 2022

Audio Description Ratings (excellent to poor)

March 17, 2017 - September 21, 2022
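The rated/unrated split and the rating breakdown can both be computed from the ratings attached to each description. This sketch assumes one list of 1-5 ratings per audio description (a hypothetical input shape) and counts every individual rating; the README does not say whether the published figures count individual ratings or per-description averages, so treat this as illustrative.

```python
from typing import Dict, List, Sequence


def rated_share(ratings_per_description: Sequence[List[int]]) -> Dict[str, float]:
    """Percentage of audio descriptions with at least one rating vs. none."""
    total = len(ratings_per_description)
    if total == 0:
        return {"rated": 0.0, "unrated": 0.0}
    rated = sum(1 for ratings in ratings_per_description if ratings)
    rated_pct = 100.0 * rated / total
    return {"rated": rated_pct, "unrated": 100.0 - rated_pct}


def rating_distribution(ratings_per_description: Sequence[List[int]]) -> Dict[int, int]:
    """Count individual ratings from 5 (excellent) down to 1 (poor)."""
    dist = {score: 0 for score in range(5, 0, -1)}
    for ratings in ratings_per_description:
        for score in ratings:
            dist[score] += 1  # a KeyError signals a value outside the 1-5 scale
    return dist


if __name__ == "__main__":
    data = [[5, 4], [], [3], []]  # one inner list of ratings per description
    print(rated_share(data))
    print(rating_distribution(data))
```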

Statistics for Videos, Audio Descriptions and Audio Clips

March 17, 2017 - September 21, 2022

Audio Clips by Playback Type

March 17, 2017 - September 21, 2022

Described Videos in each YouTube video category

March 17, 2017 - September 21, 2022

Audio Descriptions with Extended Audio Clips grouped by Video Category

March 17, 2017 - September 21, 2022

Audio Descriptions with Inline Audio Clips grouped by Video Category

March 17, 2017 - September 21, 2022

Top 10 most requested videos

March 17, 2017 - September 21, 2022

Wishlist videos for each YouTube video category

March 17, 2017 - September 21, 2022
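The top-10 and per-category wishlist figures are frequency counts over viewer requests. Assuming the wishlist can be exported as (YouTube video ID, category) request pairs, a hypothetical layout, both reduce to collections.Counter:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Request = Tuple[str, str]  # (youtube_video_id, category); hypothetical layout


def top_requested(requests: Iterable[Request], n: int = 10) -> List[Tuple[str, int]]:
    """The n most frequently requested videos with their request counts."""
    return Counter(video_id for video_id, _ in requests).most_common(n)


def wishlist_by_category(requests: Iterable[Request]) -> Dict[str, int]:
    """Number of distinct wishlist videos in each YouTube video category."""
    category_of: Dict[str, str] = {}
    for video_id, category in requests:
        category_of.setdefault(video_id, category)  # keep the first category seen
    return dict(Counter(category_of.values()))


if __name__ == "__main__":
    reqs = [("v1", "Music"), ("v2", "Education"), ("v1", "Music"),
            ("v3", "Music"), ("v1", "Music"), ("v2", "Education")]
    print(top_requested(reqs, n=2))
    print(wishlist_by_category(reqs))
```

Counter.most_common orders elements with equal counts by first occurrence, so ties are broken deterministically; a different tie-break (for example by video ID) would need an explicit sort.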

Wishlist videos requested and described per year

March 17, 2017 - December 31, 2021

Contributions

The Smith-Kettlewell Eye Research Institute

San Francisco State University

Ability Central

National Institute on Disability, Independent Living, and Rehabilitation Research

Meet the Team Members

| Team Member Name | Years | Role |
| --- | --- | --- |
| Dr. Joshua Miele | 2013 - Present | YouDescribe Creator |
| Charity Pitcher-Cooper | 2017 - Present | Product Manager |
| Rodrigo Leme de Mello | 2017 - 2020 | Principal Software Engineer |
| Dr. Ilmi Yoon | 2018 - Present | Principal Investigator |
| Rupal Khilari | 2016 - 2017 | Software Developer |
| Andrew Taylor Scott | 2019 - Present | Machine Learning Lead Engineer |
| Dr. Yue-Ting Siu | 2013 - Present | Describer Trainer, Interface Design Researcher |
| Dr. Shasta Ihorn | 2020 - Present | User Study Researcher |
| Dr. Abhishek Das | 2018 - 2020 | Machine Learning Engineer |
| Yash Kant | 2018 - 2020 | Machine Learning Engineer |
| Umang Mathur | 2018 - 2020 | Software Developer |
| Dr. Beste Yuksel | 2018 - 2020 | HCI Researcher |
| Jianfei Zhao | 2018 - 2020 | Software Developer |
| Poorva Rathi | 2018 - 2020 | Software Developer |
| Vaishali Bisht | 2018 - 2020 | Software Developer |
| Raya Farshad | 2018 - 2020 | |
| Jose Castanon | 2018 - Present | Software Developer |
| Aditya Bodi | 2018 - 2020 | Software Developer |
| Brenna Tirumalashetty | 2018 - 2020 | |
| Manish Patil | 2018 - 2020 | Software Developer |
| Varun Sura | 2020 - Present | iOS App Developer |
| Lothar Narins | 2019 - Present | Machine Learning Engineer |
| Bhavani Gorganthu | 2021 - 2022 | YouDescribeX Web Interface |
| Benjamin Kao | 2022 - Present | Team Leader/Software Engineer |
| Hirva Patel | 2022 - Present | Software Developer |
| Kishan Patel | 2022 - Present | Mobile Developer |
| Manali Seth | 2022 - Present | Data Engineer |
| Sanket Naik | 2022 - Present | Software Developer |
| Vishal Sharma | 2022 - Present | Software Developer |
| Ishank Aggarwal | 2022 - Present | Machine Learning Engineer |
| Caelen Wang | 2022 - Present | Machine Learning Engineer |
| Kimon Monokandilos | 2022 - Present | Software Engineer |

Licensing

You Described, We Archived © 2022 by The Smith-Kettlewell Eye Research Institute and San Francisco State University is licensed under CC BY-NC-ND 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/