# Cleaning Frasier Transcript Data
Goals:
- Import data
- Select only relevant features (character, lines)
- Clean transcripts (convert all text to lowercase and remove punctuation)
- Export to CSV
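
The goals above can be sketched end to end on a tiny in-memory sample. This is only an illustration under stated assumptions: the sample rows, the helper name `clean_transcripts`, and writing the CSV to a string rather than a file are all hypothetical and are not taken from the notebook's actual cells.

```python
import io
import string

import pandas as pd

# Hypothetical stand-in for transcripts.csv: two usable rows and one with a missing line.
raw_csv = io.StringIO(
    "character,lines,season\n"
    "Frasier,\"Hello, Russell. I'm listening.\",1\n"
    "Roz,,1\n"
    "Niles,\"Oh, dear!\",1\n"
)


def clean_transcripts(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep character/lines, drop missing lines, lowercase, strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    # Select only the relevant features and discard rows without a line.
    out = frame.loc[frame["lines"].notna(), ["character", "lines"]].copy()
    # Lowercase first, then delete punctuation characters.
    out["lines"] = out["lines"].str.lower().str.translate(table)
    return out.reset_index(drop=True)


cleaned = clean_transcripts(pd.read_csv(raw_csv))
# With no path argument, to_csv returns the CSV text; pass a filename to write a file instead.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Working on an explicit `.copy()` of the filtered rows avoids assigning into a view of the original frame, which is the situation pandas usually warns about.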

## Imports

In [69]:
import pandas as pd
import duckdb
import string
import warnings
warnings.filterwarnings("ignore")

## Data Cleaning

In [70]:
# Import data
data = pd.read_csv("transcripts.csv")

# Remove rows where lines is missing (NaN)
data = data.dropna(subset=["lines"])
data.head(3)

Unnamed: 0,character,lines,season,episode,gender,title,directedBy,writtenBy,originalAirDate,viewershipInMillions,imdbVotes,imdbRatings,characterName,actorName,characterType,episodeCount
0,Frasier,"Listen to yourself, Bob You follow her to work...",1,1,male,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Frasier Crane,Kelsey Grammer,main,1.0
1,Roz,"Yes, Dr Crane. On line four, we have Russell f...",1,1,female,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Roz Doyle,Peri Gilpin,main,1.0
2,Frasier,"Hello, Russell. This is Dr Frasier Crane; I'm ...",1,1,male,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Frasier Crane,Kelsey Grammer,main,1.0
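
One caveat about the missing-value filter: `isna()` only flags true missing values, so a line made up of spaces alone would survive it. Whether the Frasier transcripts contain such rows is not known here; the sample frame below is hypothetical and only shows how the two checks differ.

```python
import pandas as pd

# Hypothetical sample: one missing value, one whitespace-only line, one real line.
sample = pd.DataFrame({
    "character": ["Frasier", "Roz", "Niles"],
    "lines": [None, "   ", "Oh, dear!"],
})

# notna() keeps everything that is not a true missing value (None/NaN).
not_missing = sample[sample["lines"].notna()]

# Additionally drop lines that are empty once surrounding whitespace is stripped.
not_blank = not_missing[not_missing["lines"].str.strip() != ""]

print(len(not_missing), len(not_blank))
```

If the whitespace-only case matters for the analysis, the second filter can be applied right after the first.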


In [83]:
# Create a translation table that deletes every ASCII punctuation character
punctuationTable = str.maketrans("", "", string.punctuation)
# Convert all lines to lowercase and then remove punctuation using the punctuation table
cleanedLines = data["lines"].str.lower().str.translate(punctuationTable)
# Replace the original lines with the cleaned lines
data["lines"] = cleanedLines

data.head(3)

Unnamed: 0,character,lines,season,episode,gender,title,directedBy,writtenBy,originalAirDate,viewershipInMillions,imdbVotes,imdbRatings,characterName,actorName,characterType,episodeCount
0,Frasier,listen to yourself bob you follow her to work ...,1,1,male,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Frasier Crane,Kelsey Grammer,main,1.0
1,Roz,yes dr crane on line four we have russell from...,1,1,female,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Roz Doyle,Peri Gilpin,main,1.0
2,Frasier,hello russell this is dr frasier crane im list...,1,1,male,The Good Son,James Burrows,David Angell & Peter Casey & David Lee,1993-09-16,28.0,528,8.8,Frasier Crane,Kelsey Grammer,main,1.0
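
As the output shows, stripping punctuation merges contractions ("I'm" becomes "im"). A related detail is that `string.punctuation` covers only ASCII punctuation, so typographic characters such as curly apostrophes or ellipses pass through unchanged. The sample strings below are hypothetical; whether the transcripts contain such characters has not been checked here.

```python
import string

# Same kind of table as in the cleaning step: delete ASCII punctuation only.
table = str.maketrans("", "", string.punctuation)

ascii_line = "Hello, Russell. I'm listening!"
# Right single quotation mark (U+2019) and ellipsis (U+2026) are not in string.punctuation.
curly_line = "I\u2019m listening\u2026"

ascii_clean = ascii_line.lower().translate(table)
curly_clean = curly_line.lower().translate(table)

print(ascii_clean)
print(repr(curly_clean))
```

If non-ASCII punctuation turns out to be present, the extra characters could be added to the table explicitly.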
