<a href="https://colab.research.google.com/github/ubiodee/AI_For_Beginers/blob/main/StoryLlama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install torch transformers accelerate
!pip install -U bitsandbytes


In [None]:
!huggingface-cli login


    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The token `story2` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `story2`
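
The interactive prompt above works in Colab, but a scripted run may prefer to read the token from the environment. The following is a minimal sketch under stated assumptions: the token is exported as `HF_TOKEN` (the variable `huggingface_hub` itself recognizes), and `read_hf_token` and `login_from_env` are hypothetical helper names, not part of the original notebook.

```python
import os


def read_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable first.")
    return token


def login_from_env():
    """Log in to the Hugging Face Hub without an interactive prompt."""
    # Imported lazily so read_hf_token works even without huggingface_hub.
    from huggingface_hub import login

    login(token=read_hf_token())
```

Calling `login_from_env()` in place of the `!huggingface-cli login` cell should have the same effect, provided the token has been granted access to the gated Llama repository.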


In [None]:
# %% story_generator.py
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# ── CONFIGURATION ──────────────────────────────────────────────────────────────
MODEL_NAME = "meta-llama/Llama-3.2-1B"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

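# Fixed instruction prepended to every prompt. Note that Llama-3.2-1B is a base
# (non-instruct) model, so it treats this text as a passage to continue rather
# than as a command it must obey.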
SYSTEM_INSTRUCTION = """
You are a story generator. You craft beautiful but captivating stories based on whatever
input theme is given by the user. Use an African tone and connotation in telling these
stories, writing in a style reminiscent of Chinua Achebe.
""".strip()

# ── MODEL LOADING ───────────────────────────────────────────────────────────────
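def load_llama(base_model: str):
    """Load the tokenizer and model, and put the model in inference mode."""
    tokenizer = AutoTokenizer.from_pretrained(
        base_model,
        use_fast=False
    )
    model = LlamaForCausalLM.from_pretrained(
        base_model,
        # Half precision saves GPU memory; fall back to float32 on CPU,
        # where float16 is slow or unsupported for some operations.
        torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32,
        device_map="auto"
    )
    model.eval()
    return tokenizer, model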

tokenizer, model = load_llama(MODEL_NAME)

# ── GENERATION HELPER ───────────────────────────────────────────────────────────
@torch.no_grad()
def generate(theme: str, max_new_tokens: int = 256, temperature: float = 0.7) -> str:
    # Combine the fixed instruction with the user’s theme
    prompt = SYSTEM_INSTRUCTION + "\n\nTheme: " + theme.strip() + "\n\nStory:\n"
    # Send the inputs to whichever device device_map="auto" placed the model on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
        top_p=0.9,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens. Slicing the decoded string by
    # len(prompt) is fragile, because decoding does not always reproduce the
    # prompt character for character.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# %%
# This cell demonstrates how to use the generate function directly in a notebook
prompt_text = "Write a short story about a space cat:"
generated_text = generate(prompt_text, max_new_tokens=100, temperature=0.8)
print(f"\n=== Model Response ===\n{generated_text}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



=== Model Response ===
It was one sunny day in the African village, where there was no electricity, and no
air-conditioner. And the sun was hot. There was no fan. And there were no trees
to shade the children. And there was no way to keep the children from sweating.
And there was no way to keep the children from getting sunburnt. And there was no
way to keep the children from crying. And there was no way to keep the children from
hollering.
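
The `temperature` and `top_p` arguments passed to `model.generate` in the cell above shape how each next token is drawn, which in turn affects how varied or repetitive the story is. The sketch below illustrates the idea in plain Python on a toy list of scores. It is not the transformers implementation; the function names and the example logits are hypothetical and exist only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index after temperature scaling and nucleus filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

Under this sketch, lowering `temperature` concentrates probability on the top-scoring token, while lowering `top_p` discards more of the unlikely tail before sampling. Both changes make output more predictable at the cost of variety.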

