## Star Trek Dialogue Analysis Proposal

### Data Sources: 
#### https://www.kaggle.com/datasets/birkoruzicka/startrekdialoguetranscripts/data
#### https://github.com/BirkoRuzicka/Star-Trek-Dialogue-Analysis/tree/main/additional_data

### Context & Goals: 
#### Analyzing the dialogue of every Star Trek series offers a way to trace how societal and technological themes evolved over several decades. By examining which characters speak, how often they speak, and what language they use, this project aims to surface insights into cultural shifts, the franchise's anticipation of future technologies, and its influence on perspectives about diversity, ethics, and the human condition.

### Key Data Concepts: 
#### Series and Episodes: The data is organized by different Star Trek series (e.g., TOS, TAS, TNG) and further divided into episodes. Each episode contains information about characters and their dialogues.

#### Characters: Dialogues are associated with specific characters from the Star Trek universe. A main cast is defined for each series so that dialogue spoken by characters outside the main cast can be filtered out; that filtering step is currently commented out in `create_series_df`, so all characters are retained.

#### Dialogues: Dialogues are the spoken lines of characters in the Star Trek series. Each row holds the list of lines one character speaks in one episode, and 'Dialogue Length' stores the number of lines in that list.

#### Additional Data: Additional data sources (e.g., 'tos_data.csv', 'tos_gender.csv') are used to enrich the dataset. These files provide the season, year, and title of each episode and the gender of each character, and are merged into the main dataset.
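
The nested layout described above (episode, then character, then a list of lines) can be flattened into one row per character per episode. The following is a minimal sketch on made-up toy data, assuming the same `pd.concat` pattern used later in `create_series_df`; `toy_series` and `toy_df` are hypothetical names that do not appear in the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the nested layout: episode -> character -> list of lines
toy_series = {
    'tos_000': {'SPOCK': ['Check the circuit.', 'Yes, sir.'], 'PIKE': ['Engage.']},
    'tos_001': {'KIRK': ['Report.']},
}

# One Series per episode (index = character), stacked into a two-level index
toy_df = pd.concat({ep: pd.Series(chars) for ep, chars in toy_series.items()}).reset_index()
toy_df.columns = ['Episode', 'Character', 'Dialogue']

# .str.len() on a column of lists returns the number of items in each list
toy_df['Dialogue Length'] = toy_df['Dialogue'].str.len()

print(toy_df)
```

Each row keeps the full list of lines, so 'Dialogue Length' here is a count of lines rather than of words or characters.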

In [126]:
import json
import pandas as pd
import os

In [127]:
path = r'C:\Users\anon\Documents\CareerFoundry\Data Analytics Immersion\6\Data\Raw Data'

In [128]:
# Reading in the whole data set as dictionary

with open(os.path.join(path, 'StarTrekDialogue.json'), 'r') as read_file:
    all_series = json.load(read_file)

In [129]:
# Define series variables and additional data sources for gender, etc. to be mapped, merged, and used to create series specific dataframes

series_dict = {
    'tos': {'data': all_series['TOS'], 'data_file': os.path.join(path, 'tos_data.csv'), 'gender_file': os.path.join(path, 'tos_gender.csv')},
    'tas': {'data': all_series['TAS'], 'data_file': os.path.join(path, 'tas_data.csv'), 'gender_file': os.path.join(path, 'tas_gender.csv')},
    'tng': {'data': all_series['TNG'], 'data_file': os.path.join(path, 'tng_data.csv'), 'gender_file': os.path.join(path, 'tng_gender.csv')},
    'ds9': {'data': all_series['DS9'], 'data_file': os.path.join(path, 'ds9_data.csv'), 'gender_file': os.path.join(path, 'ds9_gender.csv')},
    'voy': {'data': all_series['VOY'], 'data_file': os.path.join(path, 'voy_data.csv'), 'gender_file': os.path.join(path, 'voy_gender.csv')},
    'ent': {'data': all_series['ENT'], 'data_file': os.path.join(path, 'ent_data.csv'), 'gender_file': os.path.join(path, 'ent_gender.csv')},
    'dis': {'data': all_series['DIS'], 'data_file': os.path.join(path, 'dis_data.csv'), 'gender_file': os.path.join(path, 'dis_gender.csv')},
    'pic': {'data': all_series['PIC'], 'data_file': os.path.join(path, 'pic_data.csv'), 'gender_file': os.path.join(path, 'pic_gender.csv')},
}

In [130]:
# Define main cast variables

main_cast = {
    'tos': ['KIRK', 'SPOCK', 'UHURA', 'CHEKOV', 'SULU', 'CHAPEL', 'COMPUTER', 'MCCOY', 'SCOTT'],
    'tas': ['KIRK', 'SPOCK', 'UHURA', 'CHEKOV', 'SULU', 'CHAPEL', 'COMPUTER', 'MCCOY', 'SCOTT'],
    'tng': ['PICARD', 'RIKER', 'WORF', 'DATA', 'TROI', 'CRUSHER', 'TASHA', 'CHIEF', "O'BRIEN", 'GUINAN', 'LAFORGE', 'PULASKI', 'WESLEY'],
    'ds9': ['SISKO', 'ODO', 'KIRA', 'JAKE', 'QUARK', 'DAX', "O'BRIEN", 'BASHIR', 'WORF', 'EZRI'],
    'voy': ['JANEWAY', 'CHAKOTAY', 'TUVOK', 'PARIS', 'TORRES', 'KIM', 'EMH', 'NEELIX', 'KES', 'SEVEN', 'ICHEB', 'SESKA'],
    'ent': ['ARCHER', 'DEGRA', 'HOSHI', 'PHLOX', 'REED', 'SHRAN', "T'POL", 'TRAVIS', 'TUCKER'],
    'dis': ['BURNHAM', 'SARU', 'VOQ', 'TYLER', 'STAMETS', 'TILLY', 'LORCA', 'CULBER', 'PIKE', 'BOOK', 'NHAN', 'ADIRA', 'GRAY', 'GEORGIOU', 'DETMER', 'OWOSEKUN', "L'RELL", 'SAREK', 'CORNWELL', 'AIRIAM', 'SPOCK'],
    'pic': ['PICARD', 'AGNES', 'DAHJ', 'DATA', 'ELNOR', 'HUGH', 'SOJI', 'RAFFI', 'RIOS', 'NAREK', 'SEVEN', 'RIZZO']
}

In [131]:
# Helper function to filter out dialogue spoken by characters other than the main cast

def remove_secondary_cast_dialogue(df, main_cast_list):
    return df[df['Character'].isin(main_cast_list)]

In [132]:
# Helper function to merge in data (gender, etc.) from additional csv files

def merge_additional_data(df, data_file, gender_file):
    # data_file and gender_file are already full paths (built in series_dict)
    additional_data = pd.read_csv(data_file, index_col=0, delimiter=';', encoding='latin1')
    df = df.merge(additional_data, left_on='Episode', right_index=True)

    # read_csv's squeeze=True is deprecated (removed in pandas 2.0); squeeze the single column explicitly
    gender_mapping = pd.read_csv(gender_file, header=None, index_col=0, delimiter=';').squeeze('columns').to_dict()
    df['Gender'] = df['Character'].map(gender_mapping)

    return df
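
The gender lookup in this helper can be tried in isolation. This is a small sketch, assuming a headerless two-column file with `;` as the delimiter, as the `read_csv` arguments above suggest; the file contents and the names `toy_gender_csv` and `toy_df` are invented for illustration. It uses `DataFrame.squeeze('columns')` in place of the deprecated `squeeze=True` argument.

```python
import io
import pandas as pd

# Hypothetical contents of a gender file: character;gender, no header row
toy_gender_csv = io.StringIO("SPOCK;m\nCOLT;f\nPIKE;m\n")

# Read with the character as index, then collapse the single remaining column to a Series
gender_frame = pd.read_csv(toy_gender_csv, header=None, index_col=0, delimiter=';')
gender_mapping = gender_frame.squeeze('columns').to_dict()

# Characters missing from the mapping end up with NaN in 'Gender'
toy_df = pd.DataFrame({'Character': ['SPOCK', 'COLT', 'ONE']})
toy_df['Gender'] = toy_df['Character'].map(gender_mapping)

print(toy_df)
```

Because `Series.map` leaves unmatched keys as missing values, any character absent from the gender file keeps an empty 'Gender' cell rather than raising an error.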

In [144]:
# Create a dataframe for each series

series_dataframes = {}

def create_series_df(series_name, series_data, main_cast_list, data_file, gender_file):
    # Validate against the known series keys (tos, tas, tng, ... are not defined as variables)
    if series_name not in series_dict:
        print('Series not recognized')
        return

    # Transform into dataframe:
    series_df = pd.concat({k: pd.Series(v) for k, v in series_data.items()}).reset_index()
    series_df.columns = ['Episode', 'Character', 'Dialogue']

    # Get length of dialogue. NOTE: this counts the items in the list of lines; needs editing so that the dialogue is retained as an actual list
    series_df['Dialogue Length'] = series_df['Dialogue'].str.len()

    # Drop dialogue not spoken by main cast **MAY OR MAY NOT USE THIS
#     series_df = remove_secondary_cast_dialogue(series_df, main_cast_list)

    # Merge columns regarding gender, etc. from additional data sources
    series_df = merge_additional_data(series_df, data_file, gender_file)

    series_df = series_df[['Episode', 'Season', 'Year', 'Title', 'Character', 'Gender', 'Dialogue', 'Dialogue Length']]

    series_dataframes[series_name] = series_df  # Store the DataFrame in the dictionary
    
#     series_df.to_csv(f'{series_name}_df_cleaned.csv')

    return series_df
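
As the comment on 'Dialogue Length' notes, `.str.len()` on a column of lists counts list items. The sketch below contrasts that line count with a word count on toy data; the 'Line Count' and 'Word Count' column names are hypothetical, and splitting on whitespace is only one possible definition of a word.

```python
import pandas as pd

# Toy rows shaped like series_df: one list of spoken lines per character
toy_df = pd.DataFrame({
    'Character': ['SPOCK', 'PIKE'],
    'Dialogue': [['Check the circuit.', 'Yes, sir.'], ['Engage.']],
})

# Number of lines spoken (what 'Dialogue Length' currently measures)
toy_df['Line Count'] = toy_df['Dialogue'].str.len()

# Number of whitespace-separated words across all of a character's lines
toy_df['Word Count'] = toy_df['Dialogue'].apply(
    lambda lines: sum(len(line.split()) for line in lines)
)

print(toy_df[['Character', 'Line Count', 'Word Count']])
```

Both measures leave the 'Dialogue' column untouched as a list, so either can be added without losing the original lines.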

In [145]:
for series_name, series_info in series_dict.items():
    create_series_df(series_name, series_info['data'], main_cast[series_name],
                     series_info['data_file'], series_info['gender_file'])




In [152]:
tos_dataframe = series_dataframes['tos']
tas_dataframe = series_dataframes['tas']
tng_dataframe = series_dataframes['tng']
ds9_dataframe = series_dataframes['ds9']
ent_dataframe = series_dataframes['ent']
dis_dataframe = series_dataframes['dis']
pic_dataframe = series_dataframes['pic']
voy_dataframe = series_dataframes['voy']

In [186]:
pd.set_option('max_colwidth', None)
tos_dataframe.head(10)

Unnamed: 0,Episode,Season,Year,Title,Character,Gender,Dialogue,Dialogue Length
0,tos_000,tos_s1,1966,The Cage,SPOCK,m,"[Check the circuit., It can't be the screen then. Definitely something out there,\r Captain, headed this way., Their call letters check with a survey expedition. SS Columbia.\r It disappeared in that region approximately eighteen years ago., Records show the Talos group has never been explored. Solar\r system similar to Earth, eleven planets. Number four seems to be Class\r M, oxygen atmosphere., We aren't going to go, to be certain?, Mister Spock here. We're intercepting a follow-up\r message, sir. There are crash survivors on Talos., Preliminary lab survey ready, sir., Yes, sir., Spock here., There is no survivors' encampment, Number One. This is all some\r sort of trap. We've lost the Captain. Do you read?, The inhabitants of this planet must live\r deep underground, and probably manufacture food and other needs down\r there. Our tests indicate the planet surface, without considerably more\r vegetation or some animals, simply too barren to support life., Exactly. An illusion placed in our minds by this planet's\r inhabitants., They may simply be studying the Captain, to find out how Earth\r people are put together. Or it could be something more., Look. Brains three times the size of ours. If we start buzzing\r about down there, we're liable to find their mental power is so great\r they could reach out and swat this ship as though it were a fly., Standing by, Number One., Ten, nine, eight, seven, six, five, four, three, two, one., Our circuits are beginning to heat. We'll have to cease\r power., We've located a magnetic field that seems to\r come from their underground generator., If our measurements and readings are an illusion also, one could\r find oneself materialised inside solid rock., The women!, Address intercraft., This is the acting captain speaking. We have no choice now but\r to consider the safety of this vessel and the remainder of the crew.\r We're leaving. All decks prepare for hyperdrive. 
Time warp factor., Engine room!, Mister Spock here. Switch to rockets. We're blasting out., Nothing. But for the batteries we'd lose gravitation and oxygen., Could be we've waited too long. It's collecting all the\r information stored in this fly. They've decided to swat us., Mister Spock here.]",27
1,tos_000,tos_s1,1966,The Cage,TYLER,m,"[All operating, sir., It could be these meteorites., It's coming at the speed of light, collision course. The\r meteorite beam has not deflected it, Captain., I have a fix. It comes from the Talos star group., It would take that long for a radio beam to travel from there to\r here., System open., Course computed and on the screen., On course, sir., We've settled into orbit, sir., Captain? Reflections, sir, from the planet's surface. As I read\r it, they polarise out as rounded metal bits. Could be parts of a\r spaceship hull., Yes, sir., And you won't believe how fast you can get back. Well the time\r barrier's been broken. Our new ships can, Captain!, Then why aren't we doing anything? That entry may have stood up\r against hand lasers, but we can transmit the ship's power against it.\r Enough to blast half a continent., It's Captain Pike they've got. He needs help, and he probably\r needs it fast., Mister Spock, the ship's controls have gone dead., There's nothing. Every system aboard is fading out., The computers!, I can't shut it off. It's running through our library. Tapes,\r micro-records, everything. It doesn't make sense., All power has come on, Mister Spock. The helm is answering\r to control., Eve, sir? Yes, sir.]",21
2,tos_000,tos_s1,1966,The Cage,ONE,,"[No, it's something else. There's still something out there., Evasive manoeuvres, sir?, We've no ships or Earth colonies that far out., Then they could still be alive, even after eighteen years., Yes, sir., All decks have acknowledged, sir., She's replacing your former yeoman, sir., Of course, sir., Landing party, come in., Quarters are being prepared, sir. Have I permission to send\r out scouting and scientific parties now?, I didn't get that last message, Captain., Landing party, come in., So we just thought we saw survivors there, Mister Spock., Any estimate what they might want one of us for?, Engineering deck will rig to transmit ship's power.\r We'll try blasting through that metal., All circuits engaged, Mister Spock., Take cover., Increase to full power! Can you give us any more?, Disengage. The top of that knoll should have been sheared off the\r first second., Now, you all know the situation. We're hoping to transport down\r inside the Talosian community., Nothing will be said if any volunteer wants to back out., Captain! Captain., But we were a party of six., They were fully charged when we left. It's dead. I\r can't make a signal. What is it?, Offspring as in he's Adam. Is that it?, Well, shall we do a little time computation? There was a Vina\r listed on that expedition as an adult crewman. Now, adding eighteen\r years to your age then., Captain., They kept us from seeing this, too. We cut through and never knew\r it. Captain., It's wrong to create a whole race of humans to live as slaves., Captain, we have transporter control now., Isn't she coming with us?, Yeoman! You've delivered your report., All decks show ready, sir.]",33
3,tos_000,tos_s1,1966,The Cage,PIKE,m,"[Steady as we go., They were keyed to cause interference and attract attention this\r way., If they survived the crash., Not without any indication of survivors, no. Continue to the Vega\r Colony and take care of our own sick and injured first. You have the\r helm. Maintain present course., Drop by my cabin, Doctor. What's that? I\r didn't say there's anything wrong with me., That's right. Unless we get anything more positive on it, it\r seems to me the condition of our own crew takes precedent. I'd like to\r log the ship's doctor's opinion, too., Good. I'm glad you do, because we're going to stop first at the\r Vega Colony and replace anybody who needs hospitalisation and also.\r What the devil are you putting in there, ice?, What makes you think I need one?, Shouldn't it be? My own yeoman and two others dead, seven\r injured., Oh, I should have smelled trouble when I saw the swords and the\r armour. Instead of that, I let myself get trapped in that deserted\r fortress and attacked by one of their warriors., You bet I'm tired. You bet. I'm tired of being responsible for\r two hundred and three lives. I'm tired of deciding which mission is too\r risky and which isn't, and who's going on the landing party and who\r doesn't, and who lives and who dies. Boy, I've had it, Phil., To the point of considering resigning., Well, for one thing, go home. Nice little town with fifty miles\r of parkland around it. Remember I told you I had two horses, and we\r used to take some food and ride out all day., I said that's one place I might go. I might go into business on\r Regulus or on the Orion colony., The point is this isn't the only life available. There's a whole\r galaxy of things to choose from., Now you're beginning to talk like a doctor, bartender., Address intercraft., This is the captain. Our destination is the Talos star group. 
Our\r time warp, factor seven., Engage., Yeoman., I thought I told you that when I'm on the bridge, Oh, I see. Thank you., She does a good job, all right. It's just that I can't get used\r to having a woman on the bridge. No offence, Lieutenant. You're\r different, of course., Spectography?, Gravity?, Prep a landing party of six. You feel up to it?, Sorry, Number One. With little information on this planet, we'll\r have to leave the ship's most experienced officer here covering us., There's no indication of problems down there,\r but let's not take chances., Captain Christopher Pike, United Space Ship Enterprise., The same old Earth, and you'll see it very soon., Enterprise., We'll begin transporting the survivors and their effects up to\r you very shortly., That's affirmative on the, Er, affirmative on request. Landing party out., I don't understand., Can you hear me? My name is Christopher Pike, commander of the\r space vehicle Enterprise from a stellar group at the other end of this\r galaxy. Our intentions are peaceful. Can you understand me?, You're not speaking, yet I can hear you., All right then, telepathy. You can read my mind. I can read\r yours. Now, unless you want my ship to consider capturing me an\r unfriendly act, If you were in here, wouldn't you test the strength of these\r walls, too? There's a way out of any cage, and I'll find it., This is Rigel Seven., I was in a cage, a cell, in some kind of a zoo. I must still be\r there., They've reached into my mind and taken the memory of somewhere\r I've been., It's starting just as it happened two weeks ago. Except for you., Longer hair, different dress, but it is you, the one the\r survivors called Vina. Or rather the image of Vina. But why you again?\r Why didn't they create a different girl?, But it's only a dream., You can tell my jailers I won't go along with it. I'm not an\r animal performing for its supper., Why would an illusion be frightened?, Who are you? 
You act as if this were really you., Why are you here?, Are you real?, No, no. No, that's not an answer. I've never met you before,\r never even imagined you., What, and dress you in the same metal fabric they wear?, So they can see how their specimen performs? They want to see how\r I react, is that it?, Or do they do more than just watch me? Do they feel with me, too?, Yes. Yes, you can please me. You can tell me about them. Is there\r any way I can keep them from probing my mind, from using my thoughts\r against me? Does that frighten you? Does that mean there is a way?, Since you're not real, there's not much point in continuing this\r conversation, is there., How far can they control my mind?, Perhaps., But they try to trick me with their illusions., Did they ever live on the surface of this planet? Why did they go\r underground?, That's why it's so barren up there?, So the Talosians who came underground found life limited here and\r they concentrated on developing their mental power., Or sit probing minds of zoo specimens like me., Which means they had to have more than one of each animal., They'll need a pair of humans too. Where do they get intend to\r get the Earth woman?, But that was a bargain with something that didn't exist. You said\r you weren't real, remember?, Is the keeper actually communicating with one of his animals?, And if I prefer, Why not just put irresistible hunger in my mind? Because you\r can't, can you? You do have limitations, don't you?, That's very interesting., You were startled. Weren't you reading my mind then?, No, let's stay on the first subject. All I wanted for that moment\r was to get my hands around your neck., Do primitive thoughts put up a block you can't read through?, All right, all right, let's talk about the girl. 
You seem to be\r going out of your way to make her attractive, to make me feel\r protective., It seems more important to you now that I begin to accept her and\r like her., Assuming that's a lie, why would you want me attracted to her? So\r I'll feel love in a husband-wife relationship? That would be necessary\r only if you intend to build a family group or perhaps a whole human\r community., You mean properly punished! I'm the one who's not co-operating!\r Why don't you punish me?, Tango! You old devil, you. I'm sorry I don't have any sugar.\r Well, they think of everything, don't they?, They read our minds very well. Home, anything else I want, if I\r co-operate, is that it?, Look, I'm sorry they punish you, but we can't let them, It's funny. It's about twenty four hours ago I was telling the\r ship's doctor how much I wanted something else not very different from\r what we have here. An escape from reality. Life with no frustrations.\r No responsibilities. Now that I have it, I understand the doctor's\r answer., You either live life, bruises, skinned knees and all, or you turn\r your back on it and start dying. The doctor's going to be happy about\r one part, at least. He said I needed a rest., I used to ride through here when I was a kid. It's not as pretty\r as some of the parkland around the big cities, but. That's Mojave.\r That's where I was born., These headaches, they'll be hereditary you know. Would you wish\r them on a child or a whole group of children?, Is it? Look, first they made me protect you and then feel\r sympathy for you. Now we have these familiar surroundings and a\r comfortable husband-wife relationship. They don't need all this for\r just passion. What they're after is respect and mutual dependence., But we're not here, neither of us. We're in a menagerie, a cage!, Back in my cage, it seemed for a couple of minutes that our\r keeper couldn't read my thoughts. 
Do emotions like hate, keeping hate\r in your mind, does that block off our mind from them?, Oh, no. I don't hate you. I can guess what it was like., If they can read my mind, then they know I'm attracted to you., I was from the very first moment\r I saw you in the survivor's camp., They don't work., Don't say anything. I'm filling my mind with a picture of beating\r their huge, misshapen heads to pulp, thoughts so primitive they black\r out everything else. I'm filling my mind with hate., I'll break out of this zoo somehow and get to you. Is your blood\r red like ours? I'm going to find out., All I want to do is get my hands on you. Can you read these\r thoughts? Images of hate, killing?, You'll find my thoughts more interesting. Thoughts so primitive\r you can't understand. Emotions so ugly, No. No, don't help me. I have to concentrate. They can't read\r through hate., Now you hold still, or I'll break your neck., I've had some samples of how good they are., You stop this illusion, or I'll twist your head off.\r All right, now you try one more illusion, you try anything at all, and\r I'll break your neck., I'm going to gamble you're too intelligent to kill for no reason\r at all., ...]",120
4,tos_000,tos_s1,1966,The Cage,GARISON,m,"[It's a radio wave, sir. We're passing through an old-style\r distress signal., A ship in trouble making a forced landing, sir. That's it. No\r other message., Eleven survivors from crash.\r Gravity and oxygen within limits. Food and water obtainable, but\r unless. The message faded at that point, sir., Sir., Could that be an illusion too?, Open, sir., Open., The captain?]",8
5,tos_000,tos_s1,1966,The Cage,BOYCE [OC},,[Boyce here.],1
6,tos_000,tos_s1,1966,The Cage,BOYCE,m,"[I understand we picked up a distress signal., Oh, I concur with yours, definitely., Who wants a warm martini?, Sometimes a man'll tell his bartender things he'll never tell\r his doctor. What's been on your mind, Chris, the fight on Rigel Seven?, Was there anything you personally could have done to prevent it?, Chris, you set standards for yourself no one could meet. You\r treat everyone on board like a human being except yourself, and now\r you're tired and you, To the point of finally taking my advice, a rest leave?, And do what?, Ah, that sounds exciting. Ride out with a picnic lunch every day., You, an Orion trader, dealing in green animal women, slaves?, Not for you. A man either lives life as it happens to him, meets\r it head-on, and licks it, or he turns his back on it and starts to\r wither away., Take your choice. We both get the same two kinds of customers.\r The living and the dying., If they can spare you a moment, I'd like to make my medical\r report., Their health is excellent. Almost too good., It was a perfect illusion. They had us seeing just what we\r wanted to see, human beings who'd survived with dignity and bravery,\r everything entirely logical, right down to the building of the camp,\r the tattered clothing, everything. Now let's be sure we understand the\r danger of this. The inhabitants of this planet can read our minds. They\r can create illusions out of a person's own thoughts, memories, and\r experiences, even out of a person's own desires. Illusions just as real\r and solid as this table top and just as impossible to ignore., Maybe it was. It's what I tried to explain in the briefing room.\r Their power of illusion is so great, we can't be sure of anything we\r do, anything we see., Hold on a minute., You look a hundred percent better., Eve as in Adam?]",19
7,tos_000,tos_s1,1966,The Cage,COLT,f,"[Yes, sir., But you wanted the reports by oh five hundred. It's oh five\r hundred now, sir., We were the only ones transported., Leave him alone., Picked her? For what? I don't understand., Offspring, as in children?, Captain., What's happened to Vina?, Yes, sir., Sir, I was wondering. Just curious. Who would have been Eve?, Yes, ma'am. Yes, sir.]",11
8,tos_000,tos_s1,1966,The Cage,GEOLOGIST,m,"[Geological lab report complete, Captain., Our reading shows an oxygen nitrogen atmosphere, sir, heavy\r with inert elements but well within safety limits., Zero point nine of Earth.]",3
9,tos_000,tos_s1,1966,The Cage,PITCAIRN,m,"[Yes, sir. There's a canyon to the left. We can set you there\r completely unobserved., All systems are out, bridge. We've got nothing., Sir, it just came on. We can't shut the\r power off., Mister Spock, the system is coming on\r again.]",4


In [187]:
tos_episode_tos_000 = tos_dataframe[tos_dataframe['Episode'] == 'tos_000']

In [188]:
tos_episode_tos_000.shape

(21, 8)
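
The preview above shows rows with an empty 'Gender' cell, for example ONE and BOYCE [OC}. A possible way to list such characters is sketched below on toy data; stripping a trailing bracketed tag is an assumption about how these labels might be normalized, not a step taken in the notebook, and 'Character Clean' is a hypothetical column name.

```python
import pandas as pd

# Toy rows mimicking the preview: two characters without a gender entry
toy_df = pd.DataFrame({
    'Character': ['SPOCK', 'ONE', 'BOYCE [OC}', 'BOYCE'],
    'Gender': ['m', None, None, 'm'],
})

# Characters that did not match any key in the gender mapping
unmapped = sorted(toy_df.loc[toy_df['Gender'].isna(), 'Character'].unique())
print(unmapped)

# Assumed cleanup: drop everything from the first '[' onward, plus preceding whitespace
toy_df['Character Clean'] = toy_df['Character'].str.replace(r'\s*\[.*$', '', regex=True)
print(toy_df)
```

After this cleanup, 'BOYCE [OC}' and 'BOYCE' share one label, so a subsequent gender lookup would treat them as the same character.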