Code for summarizing YouTube videos using TF-IDF Vectorization and BART Model.

umangkalavadiya/YTSummarizer

YTSummarizer

This repository summarizes YouTube video subtitles using two methods: extractive summarization with TF-IDF vectorization and abstractive summarization with the BART model.

Libraries Used:

youtube_transcript_api, nltk, re, sklearn, transformers

Summarization Using TF-IDF Vectorization

In this method, the subtitles of a YouTube video are first retrieved with the YouTubeTranscriptApi class from the youtube_transcript_api library. The subtitles are then preprocessed by removing line breaks and split into sentences with the sent_tokenize function from the nltk library.

Once the text is split into sentences, the TfidfVectorizer class from the sklearn library converts each sentence into a TF-IDF vector. Each sentence is scored from its TF-IDF weights, and the N highest-scoring sentences are selected.

Finally, the selected sentences are put back in the order in which they appear in the original subtitles and joined to form a summary of the video.
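The steps above can be sketched with the standard library alone, which makes the scoring visible. This is an illustration, not the repository's code: a regex splitter stands in for nltk's sent_tokenize, and a hand-written TF-IDF stands in for sklearn's TfidfVectorizer. It uses the same smoothed IDF formula as sklearn's default but skips sklearn's row normalization. Scoring a sentence by the sum of its TF-IDF weights is an assumption, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Remove line breaks, then split after sentence-ending punctuation.
    # Naive stand-in for nltk.sent_tokenize (assumption for this sketch).
    text = text.replace("\n", " ")
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tfidf_scores(sentences):
    # Treat each sentence as one document and score it by the sum of the
    # TF-IDF weights of its terms (assumed scoring rule).
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per sentence
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        score = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


def summarize(text, top_n=3):
    sentences = split_sentences(text)
    scores = tfidf_scores(sentences)
    # Pick the top_n highest-scoring sentences...
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # ...then restore their original order before joining.
    chosen = sorted(ranked[:top_n])
    return " ".join(sentences[i] for i in chosen)
```

Because the weights are summed without normalization, longer sentences tend to score higher in this sketch. Dividing each score by the sentence length would be one way to reduce that bias.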

Summarization Using BART Model

In this method, the BartTokenizer and BartForConditionalGeneration classes from the transformers library are used to perform summarization.

The subtitles of the YouTube video are first retrieved and encoded using the BartTokenizer. The encoded text is then passed to the BartForConditionalGeneration model to generate a summary of the video.

The output of the BartForConditionalGeneration model is a tensor of token IDs, which the BartTokenizer then decodes back into text.
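A minimal sketch of the encode, generate, and decode flow described above. Several details are assumptions, not taken from the repository: the facebook/bart-large-cnn checkpoint, the generation settings, the helper names, and the transcript being a list of dicts with a "text" key.

```python
def join_segments(segments):
    # Assumes each transcript segment is a dict with a "text" key
    # (an assumption about the transcript format).
    return " ".join(seg["text"].replace("\n", " ").strip() for seg in segments)


def summarize_with_bart(text, model_name="facebook/bart-large-cnn", max_summary_tokens=150):
    # Imported here so that join_segments can be used without transformers installed.
    from transformers import BartForConditionalGeneration, BartTokenizer

    # The checkpoint name is an assumption; the README does not say which one is used.
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    # Encode the transcript; BART accepts at most 1024 input tokens,
    # so longer transcripts are truncated here.
    inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

    # Generate summary token IDs (returned as a tensor).
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=4,
        max_length=max_summary_tokens,
        early_stopping=True,
    )

    # Decode the first (and only) generated sequence back into text.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Because the input is truncated at 1024 tokens, long transcripts lose everything past that point. One option is to summarize a long transcript in chunks and join the partial summaries.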

Conclusion

This code generates summaries of YouTube videos using two different methods. The TF-IDF method is simple and fast, and it builds the summary from sentences taken verbatim from the subtitles. The BART model generates new text and can produce more fluent summaries, at a higher computational cost.
