# Who is working on this project

(One person submits the document; other teammates just submit a note saying who submitted it.) Describe how you plan to work together so that everyone feels ownership of the result.

# Vision
Overview of your project and its purpose. What are you trying to do? Why is it important or interesting? What does a successful project outcome look like?

# Background
What data are you using? Describe what you chose and why. Include a “backup” dataset in case the primary one doesn’t work out (or give specific evidence for your confidence in the primary dataset).<br>
What technologies are you using? Briefly describe a few options you’re considering and what criteria you’ll use to evaluate them.<br>
Your final report will describe the technologies you’re using and why you chose to use them. Include citations of the work on which you’ve based your system, both what we’ve used in class and new technologies you’ve experimented with (include descriptions of these if applicable).

# Implementation
What prior code can you build on?<br>
Your final report will summarize your implementation and, if appropriate, how it extends the work you’ve referenced.

# Prompt Engineering

In [76]:
import torch, os
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
# Work around a bug in the version of PyTorch and GPU hardware currently on Kaggle. On other hardware, removing these lines may lead to a speed-up.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Load the model
USE_INSTRUCTION_TUNED = False # we'll switch this to True partway through the lab
if USE_INSTRUCTION_TUNED:
    model_name = '/kaggle/input/gemma/transformers/1.1-2b-it/1'
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-1.1-2b-it"
else:
    model_name = "/kaggle/input/gemma/transformers/2b/2"
    if not os.path.exists(model_name):
        print("Warning: loading model weights from the Internet. This might take a bit of extra time.")
        model_name = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16)
streamer = TextStreamer(tokenizer)
# Silence a warning.
tokenizer.decode([tokenizer.eos_token_id]);

In [None]:
# Check where the whole model is loaded and what data type it's using.
model.device, model.dtype

(device(type='cuda', index=0), torch.bfloat16)

In [None]:
# Check where parameters are loaded. If this is anything other than {'': 0},
# then some parts of the model were probably offloaded onto the CPU and will run slowly.
model.hf_device_map

{'': 0}

In [97]:
outline_first_paragraph = '''Please write me an introduction paragraph in essay format that follows this outline:
Leading sentence: It took me eighteen years to realize what an extraordinary influence my mother has been on my life.
Summary of main points: I not only came to love the excitement of learning simply for the sake of knowing something new, but I also came to understand the idea of giving back to the community in exchange for a new sense of life, love, and spirit.
'''

In [98]:
outline_second_paragraph = '''Please write me my first supporting point paragraph in essay format that follows this outline:
Paragraph 2(First Supporting Point)
I. Transition sentence: "My mother's enthusiasm for learning is most apparent in travel."
II. Supporting point: Her mother's enthusiasm for learning.
III. Evidence: Learning through travel by using the example of a trip to Greece.'''

In [99]:
outline_third_paragraph = '''Please write me my second supporting point paragraph in essay format that follows this outline:
Paragraph 3(Second Supporting Point)
I. Transition sentence: "While I treasure the various worlds my mother has opened to me abroad, my life has been equally transformed by what she has shown me just two miles from my house."
II. Supporting point: Her mother's dedication to the community.
III. Evidence: Her multiple volunteer activities such as helping at the local soup kitchen.'''

In [100]:
outline_fourth_paragraph = '''Please write me a conclusion paragraph in essay format that follows this outline:
Paragraph 4(Conclusion)
I. Transition sentence: "Everything that my mother has ever done has been overshadowed by the thought behind it."
II. Reiteration of main points: "She has enriched my life with her passion for learning, and changed it with her devotion to humanity."
III. Taking it one step further: "Next year, I will find a new home miles away. However, my mother will always be by my side."'''

In [101]:
human_essay_first_paragraph = '''It took me eighteen years to realize what an extraordinary influence my mother has been on my life. She’s the kind of person who has thoughtful discussions about which artist she would most want to have her portrait painted by (Sargent), the kind of mother who always has time for her four children, and the kind of community leader who has a seat on the board of every major project to assist Washington’s impoverished citizens. Growing up with such a strong role model, I developed many of her enthusiasms. I not only came to love the excitement of learning simply for the sake of knowing something new, but I also came to understand the idea of giving back to the community in exchange for a new sense of life, love, and spirit.'''

In [102]:
human_essay_second_paragraph = '''My mother’s enthusiasm for learning is most apparent in travel. I was nine years old when my family visited Greece. Every night for three weeks before the trip, my older brother Peter and I sat with my mother on her bed reading Greek myths and taking notes on the Greek Gods. Despite the fact that we were traveling with fourteen-month-old twins, we managed to be at each ruin when the site opened at sunrise. I vividly remember standing in an empty amphitheatre pretending to be an ancient tragedian, picking out my favorite sculpture in the Acropolis museum, and inserting our family into modified tales of the battle at Troy. Eight years and half a dozen passport stamps later I have come to value what I have learned on these journeys about global history, politics and culture, as well as my family and myself.'''

In [103]:
human_essay_third_paragraph = '''While I treasure the various worlds my mother has opened to me abroad, my life has been equally transformed by what she has shown me just two miles from my house. As a ten year old, I often accompanied my mother to (name deleted), a local soup kitchen and children’s center. While she attended meetings, I helped with the Summer Program by chasing children around the building and performing magic tricks. Having finally perfected the “floating paintbrush” trick, I began work as a full time volunteer with the five and six year old children last June. It is here that I met Jane Doe, an exceptionally strong girl with a vigor that is contagious. At the end of the summer, I decided to continue my work as Jane’s tutor. Although the position is often difficult, the personal rewards are beyond articulation. In the seven years since I first walked through the doors of (name deleted), I have learned not only the idea of giving to others, but also of deriving from them a sense of spirit.'''

In [104]:
human_essay_fourth_paragraph = '''Everything that my mother has ever done has been overshadowed by the thought behind it. While the raw experiences I have had at home and abroad have been spectacular, I have learned to truly value them by watching my mother. She has enriched my life with her passion for learning, and changed it with her devotion to humanity. In her endless love of everything and everyone she is touched by, I have seen a hope and life that is truly exceptional. Next year, I will find a new home miles away. However, my mother will always be by my side.'''

In [105]:
def generate_and_analyze(outline, essay):
    # Generate text based on the outline
    model_input = tokenizer.encode(outline, return_tensors='pt').to(model.device)
    generated_output = model.generate(model_input, max_length=512, num_return_sequences=1)
    generated_text = tokenizer.decode(generated_output[0], skip_special_tokens=True)
    
    # Tokenize the generated text and the human essay for comparison
    gen_ids = tokenizer.encode(generated_text, return_tensors='pt').to(model.device)
    essay_ids = tokenizer.encode(essay, return_tensors='pt').to(model.device)
    
    # Concatenate tokens for analysis
    input_ids = torch.cat([gen_ids, essay_ids[:, 1:]], dim=-1)
    
    # Generate logits for concatenated input
    with torch.no_grad():
        outputs = model(input_ids)
        logits = outputs.logits
    
    # Compute the loss of each human-essay token given everything that precedes it
    spans = []
    highest_loss = float('-inf')
    essay_tokens = tokenizer.convert_ids_to_tokens(essay_ids[0])

    for i in range(gen_ids.size(1), input_ids.size(1)):
        # Logits at position i - 1 predict the token at position i.
        # log_softmax in float32 avoids underflow from taking log(softmax(...)) in bfloat16.
        log_probs = torch.log_softmax(logits[0, i - 1].float(), dim=-1)
        token_loss = -log_probs[input_ids[0, i]].item()
        most_likely_token_id = torch.argmax(log_probs).item()
        token = essay_tokens[i - gen_ids.size(1) + 1]  # Adjust index for essay tokens
        most_likely_token = tokenizer.decode([most_likely_token_id])
    
        spans.append({
            'original_token': token,
            'token_loss': token_loss,
            'most_likely_token': most_likely_token,
            'loss_ratio': None  # Will be calculated below
        })
        if token_loss > highest_loss:
            highest_loss = token_loss

    # Normalize loss ratios
    for span in spans:
        span['loss_ratio'] = span['token_loss'] / highest_loss

    
    df = pd.DataFrame(spans)
    return {
        'outline': outline,
        'generated_text': generated_text,
        'analysis_df': df
    }
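
The loop above scores each essay token by its negative log-probability, reading the logits at the previous position, and then divides every loss by the largest one. Here is a minimal, framework-free sketch of that calculation on toy numbers. The helper names (`log_softmax`, `token_losses`) and the toy logits and ids are made up for illustration; they are not part of the notebook.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_losses(logits_per_position, token_ids, start):
    """Return (loss, argmax_id) for each token_ids[i] with i >= start.

    logits_per_position[i - 1] is treated as the prediction for token_ids[i].
    """
    results = []
    for i in range(start, len(token_ids)):
        log_probs = log_softmax(logits_per_position[i - 1])
        loss = -log_probs[token_ids[i]]
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        results.append((loss, best))
    return results

# Toy example: a vocabulary of 3 tokens and a sequence of 3 token ids.
toy_logits = [
    [2.0, 0.0, 0.0],  # prediction for position 1
    [0.0, 0.0, 3.0],  # prediction for position 2
    [0.0, 0.0, 0.0],  # prediction past the end of the sequence (unused)
]
toy_ids = [1, 0, 1]

scored = token_losses(toy_logits, toy_ids, start=1)
highest = max(loss for loss, _ in scored)
ratios = [loss / highest for loss, _ in scored]
print(scored, ratios)
```

Because the ratio is relative to the single largest loss, one very surprising token (such as the first essay token in the run below) can push every other ratio close to zero.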

In [106]:
# Run the text generation and analysis for the first paragraph
result = generate_and_analyze(outline_first_paragraph, human_essay_first_paragraph)

# Ensure the DataFrame is assigned to the correct variable
analysis_results_df = result['analysis_df']

# Display analysis table
print("Outline:", result['outline'])
print("Generated Essay:", result['generated_text'])
print("\nAnalysis:")
display(analysis_results_df[['original_token', 'token_loss', 'most_likely_token', 'loss_ratio']])

Outline: Please write me an introduction paragraph in essay format that follows this outline:
Leading sentence: "It took me eighteen years to realize what an extraordinary influence my mother has been on my life."
Summary of main points: "I not only came to love the excitement of learning simply for the sake of knowing something new, but I also came to understand the idea of giving back to the community in exchange for a new sense of life, love, and spirit."

Generated Essay: Please write me an introduction paragraph in essay format that follows this outline:
Leading sentence: "It took me eighteen years to realize what an extraordinary influence my mother has been on my life."
Summary of main points: "I not only came to love the excitement of learning simply for the sake of knowing something new, but I also came to understand the idea of giving back to the community in exchange for a new sense of life, love, and spirit."
Conclusion: "I have come to realize that my mother's influence ha

| | original_token | token_loss | most_likely_token | loss_ratio |
|---|---|---|---|---|
| 0 | It | 21.834166 | know | 1.000000 |
| 1 | ▁took | 1.359637 | took | 0.062271 |
| 2 | ▁me | 0.019062 | me | 0.000873 |
| 3 | ▁eighteen | 0.112888 | eighteen | 0.005170 |
| 4 | ▁years | 0.009108 | years | 0.000417 |
| ... | ... | ... | ... | ... |
| 145 | ▁love | 0.001137 | love | 0.000052 |
| 146 | , | 0.022447 | , | 0.001028 |
| 147 | ▁and | 0.000975 | and | 0.000045 |
| 148 | ▁spirit | 0.005753 | spirit | 0.000263 |


In [107]:
def rewrite_essay(essay_df, essay_text):
    # Note: essay_text is not used here; the tokens come from essay_df,
    # which has one row per essay token.
    # Create a new list to hold the rewritten essay
    rewritten_tokens = []
    
    for i, row in essay_df.iterrows():
        # Replace the token if the loss ratio is above a certain threshold
        if row['loss_ratio'] > 0.5:  # This threshold can be adjusted
            rewritten_tokens.append(row['most_likely_token'])
        else:
            rewritten_tokens.append(row['original_token'])
    
    # Join the tokens back into a single string
    rewritten_essay = ' '.join(rewritten_tokens)
    return rewritten_essay

In [108]:
# Run the text generation and analysis for the first paragraph
result = generate_and_analyze(outline_first_paragraph, human_essay_first_paragraph)
analysis_results_df = result['analysis_df']

# Display analysis table
print("\nAnalysis:")
display(analysis_results_df[['original_token', 'token_loss', 'most_likely_token', 'loss_ratio']])

# Rewrite the essay based on the analysis
rewritten_essay = rewrite_essay(analysis_results_df, human_essay_first_paragraph)
print("Rewritten Essay:")
print(rewritten_essay)


Analysis:


| | original_token | token_loss | most_likely_token | loss_ratio |
|---|---|---|---|---|
| 0 | It | 21.834166 | know | 1.000000 |
| 1 | ▁took | 1.359637 | took | 0.062271 |
| 2 | ▁me | 0.019062 | me | 0.000873 |
| 3 | ▁eighteen | 0.112888 | eighteen | 0.005170 |
| 4 | ▁years | 0.009108 | years | 0.000417 |
| ... | ... | ... | ... | ... |
| 145 | ▁love | 0.001137 | love | 0.000052 |
| 146 | , | 0.022447 | , | 0.001028 |
| 147 | ▁and | 0.000975 | and | 0.000045 |
| 148 | ▁spirit | 0.005753 | spirit | 0.000263 |


Rewritten Essay:
 know ▁took ▁me ▁eighteen ▁years ▁to ▁realize ▁what ▁an ▁extraordinary ▁influence ▁my ▁mother ▁has ▁been ▁on ▁my ▁life . ▁She ’ s ▁the ▁kind ▁of ▁person ▁who ▁has  always ▁discussions ▁about ▁which ▁artist ▁she ▁would ▁most ▁want ▁to ▁have ▁her ▁portrait ▁painted ▁by ▁( S argent ), ▁the ▁kind ▁of ▁mother ▁who ▁always ▁has ▁time ▁for ▁her ▁four ▁children , ▁and ▁the ▁kind ▁of ▁community ▁leader ▁who ▁has ▁a ▁seat ▁on ▁the ▁board ▁of ▁every ▁major ▁project ▁to ▁assist ▁Washington ’ s ▁impoverished ▁citizens . ▁Growing ▁up ▁with ▁such ▁a ▁strong ▁role ▁model , ▁I ▁developed ▁many ▁of ▁her ▁enthusi as ms . ▁I ▁not ▁only ▁came ▁to ▁love ▁the ▁excitement ▁of ▁learning ▁simply ▁for ▁the ▁sake ▁of ▁knowing ▁something ▁new , ▁but ▁I ▁also ▁came ▁to ▁understand ▁the ▁idea ▁of ▁giving ▁back ▁to ▁the ▁community ▁in ▁exchange ▁for ▁a ▁new ▁sense ▁of ▁life , ▁love , ▁and ▁spirit .
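
The rewritten essay above still shows the `▁` word-boundary markers because the tokens were joined with spaces. Below is a minimal sketch of one way to turn such a token list back into readable text, assuming `▁` marks a leading space as it appears to in the output above. `join_sentencepiece_tokens` is a hypothetical helper, not part of the notebook or of `transformers`.

```python
def join_sentencepiece_tokens(tokens):
    """Concatenate tokens, turning each '▁' (U+2581) marker into a space."""
    text = "".join(tok.replace("\u2581", " ") for tok in tokens)
    return text.strip()

# A few tokens copied from the output above.
example = ["It", "▁took", "▁me", "▁eighteen", "▁years", ".", "▁She", "’", "s"]
readable = join_sentencepiece_tokens(example)
print(readable)
```

Replacement tokens that were produced by `tokenizer.decode` seem to carry an ordinary leading space already (as in ` know` above), so they would concatenate the same way. The tokenizer's own `convert_tokens_to_string` method may be a more robust choice for real use.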


# Results
Include quantitative (tables, plots) and qualitative (examples) results, including comparisons with similar work if applicable.

# Implications
Discuss the social and ethical implications of using the technologies you’ve chosen for your project.