Lines from South Park

This is a just-for-fun data analysis project, done in Python 3.5. The goal is to see what can be learned about the show from its lines alone. The dataset comes from Kaggle and contains the lines from the first 18 seasons of the show, each annotated with season, episode, and speaker.

A quick line/word count shows that the dataset contains over 70,000 lines and over 800,000 words. Not surprisingly, the lines are relatively short (a median of 8 words and a 75th percentile of 14 words). Line and word counts are compared across seasons to show how they have evolved. The lines are also used to understand the show's characters: finding the top speakers, their most frequent words, and the words with the highest term frequency-inverse document frequency (tf-idf) scores.
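The steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the notebook's actual code. The `rows` list is an invented toy stand-in for the Kaggle data, the speaker names are placeholders, and `tokenize`, `percentile_nearest_rank`, and `tf_idf` are hypothetical helpers. It assumes each speaker's combined lines form one "document" for tf-idf and uses the plain `tf * log(N / df)` weighting, which may differ from what the analysis used.

```python
import math
from collections import Counter, defaultdict
from statistics import median

# Toy stand-in for the dataset: (season, speaker, line).
# These rows are invented for illustration; they are not from the Kaggle file.
rows = [
    (1, "Speaker A", "hello there friend"),
    (1, "Speaker B", "hello world"),
    (2, "Speaker A", "friend of the world"),
    (2, "Speaker A", "hello"),
]

def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; other definitions give slightly different values."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Line and word counts, plus the length distribution of the lines.
word_counts = [len(tokenize(line)) for _, _, line in rows]
total_lines = len(rows)
total_words = sum(word_counts)
median_words = median(word_counts)
p75_words = percentile_nearest_rank(word_counts, 75)

# Lines per season (to compare seasons) and per speaker (to find top speakers).
lines_per_season = Counter(season for season, _, _ in rows)
lines_per_speaker = Counter(speaker for _, speaker, _ in rows)

# Word frequencies per speaker: all of a speaker's lines form one document.
speaker_words = defaultdict(Counter)
for _, speaker, line in rows:
    speaker_words[speaker].update(tokenize(line))

# Document frequency: in how many speakers' vocabularies each word appears.
n_docs = len(speaker_words)
doc_freq = Counter()
for counts in speaker_words.values():
    doc_freq.update(counts.keys())

def tf_idf(speaker):
    """Return {word: tf * log(N / df)} for one speaker's combined lines."""
    counts = speaker_words[speaker]
    total = sum(counts.values())
    return {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }

scores = tf_idf("Speaker A")
top_word = max(scores, key=scores.get)
print(total_lines, total_words, median_words, p75_words)
print(lines_per_speaker.most_common(1), top_word)
```

Words that every speaker uses (here, "hello" and "world") get a tf-idf score of zero, which is why tf-idf surfaces words that are characteristic of one speaker rather than simply frequent.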

yanfei-wu/tv_lines

Folders and files

Latest commit

History

Repository files navigation

Lines from South Park

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages