
# Task
Install the Hugging Face Transformers library, load the 'distilgpt2' model, and generate three text continuations for the prompt 'AI is transforming industries by …', then summarize the generated text and confirm successful execution.

## Install Hugging Face Transformers

### Subtask:
Install the Hugging Face transformers library using pip. This is a prerequisite for loading and using pre-trained language models.


**Reasoning**:
`pip install` is a shell command, not Python, so in a Jupyter environment it is prefixed with an exclamation mark (`!`). An earlier attempt without the prefix failed with a `SyntaxError`. The `--quiet` flag keeps pip's output to a minimum.



In [4]:
!pip install transformers --quiet
print("Hugging Face Transformers library installed successfully.")

Hugging Face Transformers library installed successfully.
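
The `print` call in the cell above runs whether or not the installation worked, so its message alone does not confirm anything. A minimal sketch of a stricter check, assuming only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of the original notebook:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical usage: report the actual install state instead of assuming success.
found = installed_version("transformers")
if found is None:
    print("transformers is not installed in this environment.")
else:
    print(f"transformers {found} is installed.")
```

Reading the package metadata avoids importing `transformers` itself, so the check stays fast even in a fresh session.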


## Load a Small LLM

### Subtask:
Load a small pre-trained language model, specifically 'distilgpt2', using the Hugging Face transformers library. This step will also load the corresponding tokenizer.


**Reasoning**:
To load the 'distilgpt2' model and its tokenizer, I need to import the necessary classes from the `transformers` library and then use their `from_pretrained` methods with the specified model name.



In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Define the model name
model_name = 'distilgpt2'

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_name)

print(f"Successfully loaded tokenizer and model for '{model_name}'.")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Successfully loaded tokenizer and model for 'distilgpt2'.


## Generate Text Continuations

**Reasoning**:
With the 'distilgpt2' model and tokenizer loaded, the next step is to generate three continuations of the prompt.



In [10]:
prompt = 'AI is transforming industries by …'

# Encode the input prompt as PyTorch tensors (input_ids and attention_mask)
inputs = tokenizer(prompt, return_tensors='pt')

# Generate text
generated_sequences = model.generate(
    **inputs,  # Pass input_ids together with the attention mask
    max_length=50,  # Cap total length (prompt + generated tokens) at 50 tokens
    num_return_sequences=3,  # Generate three continuations
    do_sample=True,  # Sample instead of greedy decoding, for varied outputs
    top_k=50,  # Sample only from the 50 most probable next tokens
    top_p=0.95,  # Nucleus sampling: keep the smallest token set with cumulative probability >= 0.95
    no_repeat_ngram_size=2,  # Never generate the same 2-gram twice
    pad_token_id=tokenizer.eos_token_id  # GPT-2 models define no pad token, so reuse EOS
)

print(f"Generated text continuations for prompt: '{prompt}'\n")

# Decode and print the generated sequences
for i, seq in enumerate(generated_sequences):
    decoded_sequence = tokenizer.decode(seq, skip_special_tokens=True)
    print(f"Continuation {i+1}: {decoded_sequence}\n")


Generated text continuations for prompt: 'AI is transforming industries by …'

Continuation 1: AI is transforming industries by … [and] replacing the industry with a fully functioning industrial enterprise.”

Continuation 2: AI is transforming industries by …

"This is a breakthrough as companies with more than 300,000 employees are now working directly with the global labour force," said David Smith, vice president of UK manufacturing at NCC, in an interview with Business

Continuation 3: AI is transforming industries by … helping new people with a new sense of belonging — in a way that, in my opinion, is a very powerful kind of new thing.

The great change that is to start looking at the emerging trends is that
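
The `top_k` and `top_p` arguments passed to `generate` restrict which tokens can be sampled at each step. The sketch below illustrates the idea on a toy five-word vocabulary in plain Python. It is a simplified illustration, not the library's implementation, and the vocabulary, scores, and function names are made up for the example:

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Return {token_index: renormalized_prob} after top-k, then top-p filtering."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)
    # Top-p: walk down the ranking and stop once the cumulative probability
    # (renormalized over the top-k survivors) reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Toy next-token scores (hypothetical values, not taken from distilgpt2).
vocab = ["automating", "improving", "reducing", "the", "banana"]
logits = [2.0, 1.5, 1.0, 0.5, -3.0]

probs = softmax(logits)
filtered = filter_top_k_top_p(probs, top_k=3, top_p=0.8)

# Sample three next tokens from the filtered distribution.
rng = random.Random(0)
indices = rng.choices(list(filtered), weights=list(filtered.values()), k=3)
samples = [vocab[i] for i in indices]
print("Candidates after filtering:", [vocab[i] for i in filtered])
print("Sampled tokens:", samples)
```

Lower values of `top_k` or `top_p` shrink the candidate set and make the output more predictable; higher values admit less likely tokens and increase variety.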



## Summarize Generated Text

### Subtask:
Summarize the three generated text continuations to provide a concise overview of the model's output for the given prompt.

#### Instructions:
1. Review the three generated text continuations.
2. Synthesize their main ideas into a brief summary (1-2 sentences).
3. Print the summary.

**Reasoning**:
I will manually summarize the three previously generated text continuations to provide a concise overview of the model's output for the given prompt, as requested by the subtask.
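
Reviewing the continuations by hand is easier once the echoed prompt is removed and cut-off outputs are flagged. A minimal sketch in plain Python; the helper names `strip_prompt` and `looks_truncated` are hypothetical, and the punctuation test is only a heuristic:

```python
def strip_prompt(prompt: str, decoded: str) -> str:
    """Return only the newly generated text, without the echoed prompt."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


def looks_truncated(text: str) -> bool:
    """Heuristic: text that does not end in sentence punctuation was probably cut off."""
    closers = "\"'”’)]"
    stripped = text.rstrip().rstrip(closers)
    return not stripped.endswith((".", "!", "?"))


prompt = "AI is transforming industries by …"

# Decoded outputs copied from the generation step. In continuations 2 and 3,
# "[...]" marks text omitted here for brevity; only the endings matter.
decoded = [
    "AI is transforming industries by … [and] replacing the industry with a "
    "fully functioning industrial enterprise.”",
    "AI is transforming industries by … [...] in an interview with Business",
    "AI is transforming industries by … [...] is a very powerful kind of new thing."
    "\n\nThe great change that is to start looking at the emerging trends is that",
]

new_text = [strip_prompt(prompt, d) for d in decoded]
for i, text in enumerate(new_text, start=1):
    status = "cut off" if looks_truncated(text) else "ends on a full sentence"
    print(f"Continuation {i}: {status}")
```

Flagging cut-off text up front helps keep a manual summary from reading more into a fragment than the model actually wrote.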



In [11]:
summary = (
    "The `distilgpt2` model generated continuations suggesting that AI is transforming industries by "
    "enabling industrial automation, integrating a large global workforce, and fostering a new sense of belonging, "
    "pointing towards significant societal and economic shifts."
)

print("Summary of generated text:")
print(summary)

Summary of generated text:
The `distilgpt2` model generated continuations suggesting that AI is transforming industries by enabling industrial automation, integrating a large global workforce, and fostering a new sense of belonging, pointing towards significant societal and economic shifts.


## Final Task

### Subtask:
Confirm the successful execution of the Hugging Face setup and text generation task.


## Summary:

### Key Findings

*   The Hugging Face `transformers` library was successfully installed using `!pip install transformers --quiet`.
*   The `distilgpt2` model and its corresponding tokenizer were successfully loaded.
*   Three distinct text continuations were generated for the prompt "AI is transforming industries by …". The manual summary paraphrased them as "enabling industrial automation, integrating a large global workforce, and fostering a new sense of belonging".
*   The summary was written by hand and paraphrases the output loosely, since the continuations themselves are fragmentary.

### Insights or Next Steps

*   The `distilgpt2` model produced three varied continuations from a brief prompt. Their connection to the prompt is loose, and continuations 2 and 3 stop mid-sentence, which is consistent with the `max_length=50` cap.
*   For future text generation tasks, consider fine-tuning the model or trying a larger model, either of which may yield more coherent and contextually relevant output.
