# Prompt, Outline/Essay -(Put Through Model)-> Generated Outline --> Human Edited Outline of G.O. --> Revised Essay Per Token

In [1]:
import torch, os
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Work around a bug in the version of PyTorch and GPU hardware currently on Kaggle. On other hardware, removing these lines may lead to a speed-up.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Load the model
USE_INSTRUCTION_TUNED = True # set to False to load the base (non-instruction-tuned) model instead
if USE_INSTRUCTION_TUNED:
    model_name = '/kaggle/input/gemma/transformers/1.1-2b-it/1'
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-1.1-2b-it"
else:
    model_name = "/kaggle/input/gemma/transformers/2b/2"
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16)
streamer = TextStreamer(tokenizer)
# Silence a warning.
tokenizer.decode([tokenizer.eos_token_id]);

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

2024-04-26 03:18:49.218611: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-26 03:18:49.218707: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-26 03:18:49.354917: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [2]:
# Check where the whole model is loaded and what data type it's using.
model.device, model.dtype

(device(type='cuda', index=0), torch.bfloat16)

In [3]:
# Check where parameters are loaded. If this is anything other than {'': 0}
# then probably some parts of the model got offloaded onto CPU and so will run slow.
model.hf_device_map

{'': 0}
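
The device-map check above can also be done programmatically. This is a minimal sketch, assuming the map sends module-name prefixes to either a GPU index or the strings `"cpu"` / `"disk"`; `find_offloaded` is a hypothetical helper, not part of Transformers.

```python
def find_offloaded(device_map):
    """Return the entries of a device map that are placed on CPU or disk."""
    return {
        name: dev
        for name, dev in device_map.items()
        if str(dev) in ("cpu", "disk")  # assumption: offloaded parts are labeled this way
    }

# The map printed above: everything on GPU 0, so nothing is offloaded.
print(find_offloaded({"": 0}))

# A made-up map in which one layer was pushed to CPU.
print(find_offloaded({"model.embed_tokens": 0, "model.layers.17": "cpu"}))
```

In the notebook this would be called as `find_offloaded(model.hf_device_map)`; an empty result corresponds to the `{'': 0}` case.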

# Essay Components

## Prompt

In [4]:
essay_prompt = '''In Homegoing, Pachinko, and Stories We Tell, the authors/filmmaker choose deliberate artistic strategies to present the histories they narrate. Discuss how the literary/filmic choices of Gyasi, Lee, and Polley are part of their overall theory about how the past should be constructed. In other words, how does the way in which they present their stories intersect with what they are trying to say in their stories? Are there commonalities in their aims? If so, what are they? Are there critical differences?'''

## Outline given by the User

In [5]:
human_outline_first_paragraph = '''Please write me an introduction paragraph in essay format that follows this outline:
Thesis: Authors and filmmaker use multiple perspectives to illustrate how personal and ancestral choices shape individual narratives and identities, demonstrating their theories on how history should be constructed.
Brief introduction of works and creators: Discusses Homegoing by Yaa Gyasi, Pachinko by Min Jin Lee, and Stories We Tell by Sarah Polley, setting the stage for an exploration of their narrative techniques.
'''

In [6]:
human_outline_second_paragraph = '''Please write me my first supporting paragraph about personal choices shaping narratives in essay format that follows this outline:
Main idea: Personal decisions directly shape characters' identities and futures in all three works, highlighting the authors' and filmmaker's focus on the impact of individual agency within broader historical and social contexts.
Explanation: This exploration of personal choice aligns with the creators' views that history is not merely a series of events but a complex tapestry woven from individual actions and their consequences.
Example from Pachinko: Sunja's pivotal decision to engage with Hansu, and its ramifications, showcase how personal mistakes and moral dilemmas are central to character development and plot progression.
Example from Homegoing: The character H’s experience with forced labor and subsequent physical development illustrates how personal endurance and adaptation to circumstances reflect broader historical forces like slavery and institutional racism.
Example from Stories We Tell: Harry’s narrative and his one-sided love affair demonstrate how personal perceptions can deeply influence one’s identity and the stories they choose to tell or believe.'''

In [7]:
human_outline_third_paragraph = '''Please write me my second supporting paragraph about influence of others and ancestors in essay format that follows this outline:
Main idea: Characters’ identities and life paths are significantly influenced by the actions and statuses of those around them and their ancestors, emphasizing the interconnectedness of personal histories within larger societal narratives.
Explanation: This theme underscores the authors' and filmmaker's perspective that individual lives are not isolated but are deeply affected by the historical and relational contexts in which they exist, echoing a broader theory that history is constructed collectively rather than singularly.
Example from Pachinko: Isak’s altruistic decision to marry Sunja provides a stark contrast to her initial dilemma, showing how benevolent actions from others can redirect an individual’s life trajectory dramatically.
Example from Homegoing: The legacy of slavery, as seen through Esi and her descendant Ness, highlights how ancestral histories cast long shadows over the lives of future generations, shaping identities and opportunities long after the original events have passed.
Example from Stories We Tell: The revelation of Sarah’s true paternity and Diane’s decisions regarding her upbringing illustrate the profound impact parental choices have on children’s identities and their understanding of family narratives.'''

## Human Written Essay

In [8]:
human_essay_first_paragraph = '''Many people believe that there is only one story going on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small side character. We often don't think that there are billions of different stories happening all at the same time, which can then affect our narratives through their perspectives. The authors and filmmakers of Homegoing, Pachinko, and Stories We Tell, use the perspective of many people in similar situations to demonstrate how the narratives of people’s lives can be shaped and defined by the choices they make personally, and the people around them including their ancestors. '''

In [9]:
human_essay_second_paragraph = '''People’s identities are often defined by the choices they made in the past, whether they are successful or unsuccessful, honorable or dishonorable. The authors and filmmakers of these stories demonstrate how a person’s choices can shape their narratives. In the novel, Pachinko by Min Jin Lee, we mainly see the story through the viewpoint of Sunja, but we will occasionally get to look through or see the thoughts of the people close to her. In the novel, it states, “If he did not marry her, she was a common slut who would be disgraced forever. The child would be another no-name bastard. Her mother’s boardinghouse would be contaminated by her shame”(Lee 49). Sunja made a big mistake by being with Hansu, and she paid the price. She was now pregnant, and the father of the child won’t be able to marry her because he is already married. In her society, bearing a child without a father will lead to having the mother, in this case, Sunja, be disowned. It could also lead to her family and her child being disowned as well. Looking through the lens of Sunja, we could see how devastated she was and the impact her decisions could make on her in the future. Also, in the novel, Homegoing by Yaa Gyasi, we see through many different lenses in many different generations. In the chapter about H, it states, “The boss man was called Mr. John. He asked to take off his shirt. He inspected the muscles on his back and his arms and whistled. ‘Any man what can spend ten years working at Rock Slope and live to tell about it’s worth watching”(Gyasi 169). When H was sent to me while he was arrested, he might not have had an option on whether or not he wanted to work, but it made him better in the long run. He had gotten physically strong, and when he had gotten released, he was able to find a mining job that paid him. The situation he was put in shaped the person that H was. He became a hard worker and he made sure he did his job. 
Those were some of the things that he learned while he mined for the jail, but he was able to apply those lessons to the real world and shaped him to be somebody that was hirable. Another example of identities being defined by self choices is in the documentary, Stories We Tell, produced by Sarah Polley. When Harry met with Diane when she traveled to Montreal for a play, Harry fell in love with her. He had his own one-sided story, where he thought he was going to be with Diane for the rest of his life. This ultimately shaped his identity for the future. At the end of the documentary, Sarah asked Harry whether he liked that many people are sharing their sides of the story or if he disliked it. He shared that he did not like how many people shared their sides because he thought it was only his story to tell. He got wrapped up in the sense that Diane only loved him, and that changed the way he viewed his story.'''

In [10]:
human_essay_third_paragraph = '''While some believe that their identity is solely based on their choices, people’s identities can be altered by the people around them and by their ancestors. When we start looking through the other lenses of people in the same situation, we can see that many times the expected outcome is different from reality. In the novel, Pachinko, it states, “‘Of course it would be far better for them if she went away’ Yangjin replied, knowing the hard truth. ‘The child would have a terrible life here. You’d be saving my daughter’s life as well'”(Lee 74). Isak knew the dilemma Sunja was in and he knew this was something that shouldn’t be taken lightly. After coming up with the idea of marrying Sunja and giving the child his last name, he had lifted the burden off Sunja and her child’s back. They would no longer have to bear the weight of being dishonored by society and her family. Having people who can have a different perspective of the matter, can allow one to alter the course of somebody. Then, in the novel, Homegoing, we were able to witness the identities being altered because of their past relatives. In the novel, it states, “Every day, Ness picked cotton under the punishing eye of the southern sun. She had been at Thomas Allan Stockham’s Alabama plantation for three months”(Gyasi 70). Ness was a descendant of Esi in this story. Esi was a part of the slave trade system and her life was very rough. Esi went through an enormous amount of abuse and unbearable situations. Since being in that slave system, it had affected the rest of her family tree. Ness is the daughter of Esi, and so, Ness had to bear the identity of being a slave just like her mother. She had to be a slave on a plantation and no choice she made would be able to change that in this system. Her identity at the time was determined by her mother and how she was a part of the system. 
Lastly, in the documentary, Stories We Tell, Sarah’s identity was shaped by the choices her mother made when she was alive. Sarah never knew that Michael was not her biological father until she grew up. Sarah was able to form a strong bond with Michael, and that was because Diane decided to have Michael take care of her instead of Harry. Once Sarah found out that Harry was her father, Sarah was still able to keep the close relationship with Michael even after finding out the truth. Sarah was able to keep her identity from the one she formed with Michael, and that was all due to Diane.'''

# Compute the Loss of Each Token (Token by Token) of the Original Human-Written Essay

In [11]:
import torch
import pandas as pd

def analyze_text(essay):
    # Tokenize the essay for comparison
    essay_ids = tokenizer.encode(essay, return_tensors='pt').to(model.device)
    
    # Generate logits for the essay input
    with torch.no_grad():
        outputs = model(essay_ids)
        logits = outputs.logits
    
    # Analyze tokens for loss
    spans = []
    highest_loss = float('-inf')
    softmax = torch.nn.Softmax(dim=-1)
    essay_tokens = tokenizer.convert_ids_to_tokens(essay_ids[0])
    
    for i in range(1, essay_ids.size(1)):  # Start at 1: the first token (<bos> for Gemma) has no preceding context to predict it from
        probs = softmax(logits[0, i - 1])
        token_loss = -torch.log(probs[essay_ids[0, i]]).item()
        most_likely_token_id = torch.argmax(probs).item()
        token = essay_tokens[i]  # Adjust index for essay tokens
        most_likely_token = tokenizer.decode([most_likely_token_id])
        
        spans.append({
            'original_token': token,
            'token_loss': token_loss,
            'most_likely_token': most_likely_token,
            'loss_ratio': None  # To be calculated later
        })
        
        if token_loss > highest_loss:
            highest_loss = token_loss

    # Normalize loss ratios
    for span in spans:
        span['loss_ratio'] = span['token_loss'] / highest_loss if highest_loss != 0 else 0

    df = pd.DataFrame(spans)
    return df
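
The loop in `analyze_text` scores token `i` using the logits from position `i - 1`, because the model's output at each position is its prediction for the next token. The same arithmetic can be sketched without a model, using plain lists as stand-ins for the logits. `token_losses` and the toy values are illustrative assumptions, not the notebook's code.

```python
import math

def token_losses(logits, token_ids):
    """Negative log-likelihood of each token given the logits at the previous position.

    logits[i] holds the scores produced after reading token i, so it is the
    prediction for token i + 1. The first token gets no loss.
    """
    losses = []
    for i in range(1, len(token_ids)):
        row = logits[i - 1]
        # log-sum-exp, subtracting the max first for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[token] == log_z - row[token]
        losses.append(log_z - row[token_ids[i]])
    return losses

# Toy setup: vocabulary of 3 tokens, sequence of 3 token ids.
toy_logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_ids = [0, 1, 0]
print(token_losses(toy_logits, toy_ids))
```

Computing `log_z - row[token]` directly avoids taking the log of a very small probability, which is one possible source of extreme loss values when working in low precision.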


In [12]:
essay = human_essay_first_paragraph
analysis_df_w_no_context = analyze_text(essay)
display(analysis_df_w_no_context[['original_token', 'token_loss', 'most_likely_token', 'loss_ratio']])

Unnamed: 0,original_token,token_loss,most_likely_token,loss_ratio
0,Many,76.802307,increa,1.000000
1,▁people,1.230443,people,0.016021
2,▁believe,1.468063,believe,0.019115
3,▁that,0.070375,that,0.000916
4,▁there,4.952903,the,0.064489
...,...,...,...,...
125,▁including,15.139212,.,0.197119
126,▁their,1.828609,the,0.023809
127,▁ancestors,5.650682,families,0.073574
128,.,0.541190,.,0.007047
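
In the output above, row 0 has a loss of about 76.8, far higher than any other row shown, so dividing by the maximum pushes every other `loss_ratio` shown below 0.2. One possible alternative, offered as a sketch and not as the notebook's method, is to set the scale from the rows after the first one. `loss_ratios_skip_first` is a hypothetical helper.

```python
def loss_ratios_skip_first(losses):
    """Scale losses by the largest loss found after the first entry.

    The first entry stays in the output but does not set the scale,
    so its ratio may exceed 1.
    """
    rest = losses[1:]
    scale = max(rest) if rest else 0.0
    if scale == 0:
        return [0.0 for _ in losses]
    return [loss / scale for loss in losses]

# Losses copied from the first five rows of the table above.
print(loss_ratios_skip_first([76.802307, 1.230443, 1.468063, 0.070375, 4.952903]))
```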


# Generate an Outline (Human Essay Given)

## Prompt Engineering

In [13]:
%%time
doc = f'''Please break this introduction paragraph into a detailed outline format focusing on the thesis statement and a brief introduction: {human_essay_first_paragraph} '''
model_out = model.generate(
    **tokenizer(doc, return_tensors='pt').to(model.device),
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer
)

<bos>Please break this introduction paragraph into a detailed outline format focusing on the thesis statement and a brief introduction: Many people believe that there is only one story going on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small side character. We often don't think that there are billions of different stories happening all at the same time, which can then affect our narratives through their perspectives. The authors and filmmakers of Homegoing, Pachinko, and Stories We Tell, use the perspective of many people in similar situations to demonstrate how the narratives of people’s lives can be shaped and defined by the choices they make personally, and the people around them including their ancestors.  

**Thesis Statement:** This paper argues that the multiplicity of stories in the world reflects the richness and complexity of human experience, and that understanding the narrat

### Generate Essay Based on Model Generated Outline

In [14]:
# Example code snippet for generating an introduction paragraph from an outline
doc = """
Please convert the following outline into a well-structured academic introduction paragraph. The paragraph should integrate all the points cohesively, providing a clear and engaging introduction to the topic.

Thesis Statement: This paper argues that the multiplicity of stories in the world reflects the richness and complexity of human experience, and that understanding the narratives of others can provide valuable insights into our own lives and the world around us.

Outline:
- Briefly discuss the common belief of a singular narrative in the world.
- Highlight the underestimation of the multiplicity of stories.
- Introduce the authors and films discussed in the paper.
"""

model_out = model.generate(
    **tokenizer(doc, return_tensors='pt').to(model.device),
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer
)

# Print out the generated paragraph to review
#generated_paragraph = tokenizer.decode(model_out[0], skip_special_tokens=True)
#print(generated_paragraph)

<bos>
Please convert the following outline into a well-structured academic introduction paragraph. The paragraph should integrate all the points cohesively, providing a clear and engaging introduction to the topic.

Thesis Statement: This paper argues that the multiplicity of stories in the world reflects the richness and complexity of human experience, and that understanding the narratives of others can provide valuable insights into our own lives and the world around us.

Outline:
- Briefly discuss the common belief of a singular narrative in the world.
- Highlight the underestimation of the multiplicity of stories.
- Introduce the authors and films discussed in the paper.
- Explain the significance of understanding narratives in personal and collective contexts.

**Introduction:**

The tapestry of human experience is woven with an intricate multiplicity of stories, each thread contributing to the richness and complexity of our understanding. While the singular narrative often holds 

In [15]:
generated_intro_paragraph_models_outline = '''
The tapestry of human experience is woven with an intricate multiplicity of stories, each thread contributing to the richness and complexity of our understanding. While the singular narrative often holds prominence in popular culture and academic discourse, neglecting the multiplicity of stories diminishes the depth and nuance of human experience. This paper argues that the richness of human experience lies precisely in the multiplicity of narratives, and that understanding the narratives of others can provide invaluable insights into our own lives and the world around us.
'''
analysis_df_w_models_outline = analyze_text(generated_intro_paragraph_models_outline)
display(analysis_df_w_models_outline[['original_token', 'token_loss', 'most_likely_token', 'loss_ratio']])

Unnamed: 0,original_token,token_loss,most_likely_token,loss_ratio
0,\n,73.802307,increa,1.000000e+00
1,The,2.327003,**,3.153022e-02
2,▁tapestry,14.386633,provided,1.949347e-01
3,▁of,0.477380,of,6.468360e-03
4,▁human,1.958499,life,2.653710e-02
...,...,...,...,...
89,▁world,0.254916,world,3.454039e-03
90,▁around,0.029104,around,3.943575e-04
91,▁us,0.000016,us,2.148302e-07
92,.,0.000020,.,2.665191e-07


# Model Generated Outline (Human Generated Outline Given)

In [16]:
role = """You are a college English professor. Help a student organize their ideas into a clear and effective essay outline."""
task = """Create an outline for the provided essay paragraph which discusses the role of personal and ancestral choices in shaping narratives and identities."""

messages = [
    {
        "role": "user",
        "content": f"{role}\n\n{task}\n\n{human_outline_first_paragraph}",
    },
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.batch_decode(tokenized_chat)[0])

<bos><start_of_turn>user
You are a college English professor. Help a student organize their ideas into a clear and effective essay outline.

Create an outline for the provided essay paragraph which discusses the role of personal and ancestral choices in shaping narratives and identities.

Please write me an introduction paragraph in essay format that follows this outline:
Thesis: Authors and filmmaker use multiple perspectives to illustrate how personal and ancestral choices shape individual narratives and identities, demonstrating their theories on how history should be constructed.
Brief introduction of works and creators: Discusses Homegoing by Yaa Gyasi, Pachinko by Min Jin Lee, and Stories We Tell by Sarah Polley, setting the stage for an exploration of their narrative techniques.<end_of_turn>
<start_of_turn>model
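
The printed prompt shows the shape the chat template produces for a single user message: a `<bos>` token, the user turn wrapped in `<start_of_turn>` / `<end_of_turn>`, and an opened model turn for generation to continue from. The sketch below rebuilds that shape by hand. It is reconstructed from the output above, not from the tokenizer's actual template, and `format_single_user_turn` is a hypothetical helper.

```python
def format_single_user_turn(content):
    """Build a prompt string shaped like the printed template output above."""
    return (
        "<bos><start_of_turn>user\n"
        + content.strip()  # the printed output shows no trailing newline before <end_of_turn>
        + "<end_of_turn>\n"
        + "<start_of_turn>model\n"
    )

print(format_single_user_turn("Summarize this paragraph.\n"))
```

In practice `tokenizer.apply_chat_template` remains the safer route, since it follows whatever template ships with the model.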



# Comparing the Two Options Function

In [17]:
# Revise the essay by swapping in the model's most likely tokens.
# Limitation: the essay is split on whitespace, while the analysis table holds
# tokenizer tokens (most prefixed with "▁"), so only the few tokens that happen
# to equal a whole word are replaced, and only at that word's first occurrence.
def revise_essay(essay, analysis_df):
    tokens = essay.split()  # Whitespace split; does not match the tokenizer's tokens
    revised_tokens = tokens.copy()  # Create a copy of the tokens for modification

    token_idx_to_replace = {}  # Maps word index to the most likely token

    # Iterate over the dataframe to collect replacements
    for _, row in analysis_df.iterrows():
        original_token = row['original_token']
        most_likely_token = row['most_likely_token']

        # list.index returns the first exact match only; repeated words are not tracked
        try:
            token_index = tokens.index(original_token)
            token_idx_to_replace[token_index] = most_likely_token
        except ValueError:
            # No whitespace-delimited word equals this token; skip it
            continue

    # Replace tokens in the copied list
    for idx, replacement in token_idx_to_replace.items():
        revised_tokens[idx] = replacement

    # Join the revised tokens back into a single string
    revised_essay = ' '.join(revised_tokens)
    return revised_essay
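
`revise_essay` above matches tokenizer tokens against whitespace-split words, and the two only line up occasionally. One alternative is to work position by position on the rows of the analysis table itself, so every token is paired with its own suggestion. The sketch below is not the notebook's method. It assumes parallel lists taken from the `original_token`, `most_likely_token`, and `token_loss` columns, and both `revise_by_position` and the `threshold` parameter are hypothetical.

```python
def revise_by_position(tokens, suggestions, losses, threshold):
    """Swap in the suggested token wherever the original token's loss exceeds threshold.

    tokens, suggestions, and losses are parallel lists, one entry per row of
    the analysis table. Tokens follow the convention seen in the table, where
    a leading "▁" marks a preceding space.
    """
    revised = []
    for token, suggestion, loss in zip(tokens, suggestions, losses):
        if loss > threshold:
            # Suggestions are decoded text, so normalize their spacing
            # to match the original token's position in the sentence.
            prefix = "▁" if token.startswith("▁") else ""
            revised.append(prefix + suggestion.strip())
        else:
            revised.append(token)
    return "".join(revised).replace("▁", " ")

# Toy values, not taken from the model.
print(revise_by_position(
    ["The", "▁cat", "▁sat"],
    ["A", " dog", " sat"],
    [0.5, 6.0, 0.1],
    threshold=4.0,
))
```

A threshold keeps low-loss tokens untouched, which makes it possible to replace only the tokens the model finds most surprising.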

# Generated Introduction Paragraph Given Desired Human Outline

# Generated Introduction Paragraph given Human Written Essay & Human Written Outline

# Generated Introduction Paragraph Given Model's Outline(From Essay)

In [18]:
revised_essay_with_models_outline = revise_essay(essay, analysis_df_w_models_outline)
print("Original Introduction Paragraph:\n", essay)
print("\nRevised Introduction Based on Most Likely Tokens(With Models Outline):\n", revised_essay_with_models_outline)

Original Introduction Paragraph:
 Many people believe that there is only one story going on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small side character. We often don't think that there are billions of different stories happening all at the same time, which can then affect our narratives through their perspectives. The authors and filmmakers of Homegoing, Pachinko, and Stories We Tell, use the perspective of many people in similar situations to demonstrate how the narratives of people’s lives can be shaped and defined by the choices they make personally, and the people around them including their ancestors. 

Revised Introduction Based on Most Likely Tokens(With Models Outline):
 Many people believe that there is only one story going on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small s

# Generated Introduction Given Model's Outline(From Prompt/Generic)

# Generated Introduction Paragraph Based On Most_likely_token(NO CONTEXT)

In [19]:
# Apply the revision function to the original essay
revised_essay = revise_essay(essay, analysis_df_w_no_context)
print("Original Introduction Paragraph:\n", essay)
print("\nRevised Introduction Based on Most Likely Tokens(NO CONTEXT):\n", revised_essay)

Original Introduction Paragraph:
 Many people believe that there is only one story going on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small side character. We often don't think that there are billions of different stories happening all at the same time, which can then affect our narratives through their perspectives. The authors and filmmakers of Homegoing, Pachinko, and Stories We Tell, use the perspective of many people in similar situations to demonstrate how the narratives of people’s lives can be shaped and defined by the choices they make personally, and the people around them including their ancestors. 

Revised Introduction Based on Most Likely Tokens(NO CONTEXT):
  increa people believe that there is only one story  Alone on in the world, and they are the main characters. They believe that everything should revolve around them, and everybody else in this world is a small side c