<a href="https://colab.research.google.com/github/tihbohs/job-portal/blob/main/Interactive_AI_StoryTeller_5_days_Bootcamp_by_DevTown.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



---



**AI STORYTELLER BOOTCAMP**

---

Day 1 — Foundations & Gemini basics

Understand Gemini API usage and authentication.

Write clear prompts and generate single-scene short stories.

Run interactive Colab widgets to input prompts and fetch generations.



---



Day 2 — Image captioning → story

Use BLIP (Hugging Face) to produce descriptive image captions.

Convert captions into robust story prompts for Gemini.

Control tone/length via prompt guidance.



---



Day 3 — Multi-image sequencing & coherence

Batch-process multiple images and produce a chapter outline.

Generate multi-chapter stories with coherent POV, tense, and character consistency.

Techniques to order images (semantic vs. file order) for narrative flow.



---



Day 4 — Editing loop, style control & exports

Perform targeted edits: tone shifts, tightening, and summarization with generation prompts.

Produce PDF and audio (TTS) exports.

Simple filtering strategies.



---



Day 5 — Streamlit app & demo

Build a Streamlit app that accepts prompt and/or images, calls Gemini + BLIP, displays results, and offers exports.

Run a local demo and understand safe ways to share (ngrok caveats, env vars).

Package and present the project.



---

**Skills & tools mastered**

APIs & SDKs: Gemini (Google GenAI SDK), Hugging Face Transformers (BLIP).

Dev tools: Google Colab (interactive notebooks), Streamlit (rapid UI), basic Python packaging.

NLP prompts: Prompt engineering patterns for outline → draft → revise.

Multimodal: Image captioning → narrative generation; ordering/semantics for multi-image stories.

Output formats: PDF export, text download, and basic TTS (gTTS or Gemini TTS).

Security practices: Using environment variables, avoiding hardcoded keys, minimal content filtering.

# **DAY 01**

In [1]:
# @title
%env GEMINI_API_KEY=**********************************

env: GEMINI_API_KEY=**********************************


In [2]:
!pip install -q transformers pillow google-genai

In [3]:
from google import genai
import os
client=genai.Client()

In [4]:
if "GEMINI_API_KEY" not in os.environ:
  print("Please set your Gemini API key in the environment variable GEMINI_API_KEY")
else:
  client=genai.Client()
  MODEL="gemini-2.5-flash"
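
The check above only prints a message when the key is missing, so later cells can still run without a usable client. A fail-fast variant could look like the following minimal sketch. `require_env` is a hypothetical helper, not part of the Gemini SDK, and it only confirms that the variable is present: a key that is set can still be rejected by the API, as the `API_KEY_INVALID` output further down shows.

```python
import os

def require_env(name: str) -> str:
    """Return the stripped value of environment variable `name`, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Set it (for example with %env or Colab secrets) "
            "before creating the client."
        )
    return value

# Usage (commented out so the sketch runs without a real key):
# client = genai.Client(api_key=require_env("GEMINI_API_KEY"))
```

Raising instead of printing stops the notebook at the cell that needs attention rather than at the first API call.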

In [5]:
prompt=input("Enter your Story prompt and press enter:\n")
if not prompt.strip():
  print("No prompt entered. Exiting.")
else:
  print(f"Generating story for prompt: {prompt}")
  print("It may take few seconds")
  try:
    resp=client.models.generate_content(model=MODEL,contents=[prompt])
    print("\n----Generated Story----\n")
    print(resp.text)
  except Exception as e:
    print(f"Error occurred while generating story: {e}")

Enter your Story prompt and press enter:
Ram and Sita Love story with 500 length
Generating story for prompt: Ram and Sita Love story with 500 length
It may take few seconds
Error occurred while generating story: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'API key not valid. Please pass a valid API key.', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'API_KEY_INVALID', 'domain': 'googleapis.com', 'metadata': {'service': 'generativelanguage.googleapis.com'}}, {'@type': 'type.googleapis.com/google.rpc.LocalizedMessage', 'locale': 'en-US', 'message': 'API key not valid. Please pass a valid API key.'}]}}


# **DAY 02**

In [6]:
!pip install -q transformers pillow google-genai timm

In [7]:
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
from google import genai
import os
import io

In [8]:
if "GEMINI_API_KEY" not in os.environ:
  print("Please set your Gemini API key in the environment variable GEMINI_API_KEY")
else:
  client=genai.Client()
  MODEL="gemini-2.5-flash"

In [None]:
processor=BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model=BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")


In [None]:
from google.colab import files
uploaded=files.upload()

for fn in uploaded.keys():
  image=Image.open(fn).convert('RGB')
  display(image)

In [None]:
inputs=processor(images=image,return_tensors='pt')
out=model.generate(**inputs)

caption=processor.decode(out[0],
skip_special_tokens=True)

print("Caption generated by BLIP: ")
print(caption)

In [None]:
story_prompt=f"Write a short story (around 500-700 words) based on this scene description: {caption}"
print(story_prompt)

print("Sending this to Gemini. \n")

response = client.models.generate_content(model=MODEL, contents=story_prompt)
story=response.text
print("\n----Generated Story----\n")
print(story)
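
The prompt above fixes the length and leaves the tone to the model. Day 2 also covers controlling tone and length through prompt guidance, which could be sketched as a small prompt builder. `build_story_prompt` and its parameters are hypothetical names for illustration, and the wording of the guidance is an assumption, not a tested recipe.

```python
def build_story_prompt(caption: str, tone: str = "adventurous",
                       min_words: int = 500, max_words: int = 700) -> str:
    """Compose a story prompt that states tone and length explicitly."""
    if min_words > max_words:
        raise ValueError("min_words must not exceed max_words")
    return (
        f"Write a {tone} short story of roughly {min_words}-{max_words} words "
        f"based on this scene description: {caption.strip()}\n"
        "Keep a single point of view and a consistent tense."
    )

# Example with a made-up caption
print(build_story_prompt("a dog running on a beach", tone="whimsical"))
```

The returned string can be passed as `contents` to `client.models.generate_content` in place of `story_prompt`.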


In [None]:
with open("generated_story.txt", "w") as f:
  f.write(story)

from google.colab import files
files.download("generated_story.txt")

# **DAY 03**

In [None]:
!pip install -q ipywidgets

In [None]:
from google.colab import files
from PIL import Image
import io

uploaded=files.upload()

images=[]
image_names=[]

for name,file in uploaded.items():
  image=Image.open(io.BytesIO(file)).convert('RGB')
  image_names.append(name)
  images.append(image)
  display(image)
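
The loop above keeps images in upload order. Day 3 also mentions ordering by file name versus by meaning, and the two could be compared with the following minimal sketch. It assumes captions are already available as plain strings; `natural_key`, `jaccard`, and `greedy_semantic_order` are hypothetical helpers, and word overlap is only a rough stand-in for semantic similarity (an embedding model would be a stronger choice).

```python
import re

def natural_key(name: str):
    """Sort key that orders 'img2.png' before 'img10.png'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two captions, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def greedy_semantic_order(captions: list[str]) -> list[int]:
    """Start at index 0, then repeatedly append the unused caption most similar to the last one."""
    if not captions:
        return []
    order, remaining = [0], set(range(1, len(captions)))
    while remaining:
        last = captions[order[-1]]
        nxt = max(sorted(remaining), key=lambda i: jaccard(last, captions[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# File order: sort names naturally
names = ["img10.png", "img2.png", "img1.png"]
print(sorted(names, key=natural_key))

# Semantic order: indices into a made-up caption list
demo_captions = [
    "a dog on a beach",
    "a city skyline at night",
    "a dog running on the beach at sunset",
]
print(greedy_semantic_order(demo_captions))
```

File order suits images that were numbered deliberately; the greedy chain is a cheap option when names carry no sequence information.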

In [None]:
from transformers import BlipProcessor, BlipForConditionalGeneration

processor=BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
blip_model=BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

captions=[]

for img in images:
  inputs=processor(images=img,return_tensors='pt')
  out=blip_model.generate(**inputs,max_new_tokens=30)
  caption=processor.decode(out[0],skip_special_tokens=True)
  captions.append(caption)

print("Captions generated from images:")
for i,caption in enumerate(captions):
  print(f"{image_names[i]}: {caption}")

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output


tone_dropdown = widgets.Dropdown(
    options=["whimsical", "adventurous", "suspenseful", "romantic", "sci-fi", "mystery"],
    value="adventurous",
    description="Tone:"
)

length_dropdown = widgets.Dropdown(
    options=["Short (100–200 words)", "Medium (200–400 words)", "Long (400–600 words)"],
    value="Medium (200–400 words)",
    description="Length:"
)

generate_button = widgets.Button(description="Generate Story")
output_box = widgets.Output()

display(tone_dropdown, length_dropdown, generate_button, output_box)


In [None]:
def on_generate_clicked(b):
    with output_box:
        clear_output()

        tone = tone_dropdown.value
        length_map = {
            "Short (100–200 words)": "100–200 words",
            "Medium (200–400 words)": "200–400 words",
            "Long (400–600 words)": "400–600 words"
        }
        length = length_map[length_dropdown.value]

        caption_prompt = "\n".join([f"- {c}" for c in captions])

        outline_prompt = (
            f"Using the following scene descriptions, create a 4-chapter story outline. "
            f"Each chapter should have a title and a short summary.\n\n"
            f"{caption_prompt}\n\nOutline:"
        )

        try:
            outline_response = client.models.generate_content(model=MODEL, contents=outline_prompt)
            outline_text = outline_response.text
            print(" Story Outline:\n")
            print(outline_text)


            full_story = ""
            for i in range(1, 5):  # chapters 1-4, matching the 4-chapter outline
                chapter_prompt = (
                    f"Using the outline below, write Chapter {i} in a {tone} tone. "
                    f"Make it {length}. Add vivid details, good pacing, and consistent characters.\n\n"
                    f"{outline_text}\n\nChapter {i}:"
                )

                chapter_response = client.models.generate_content(model=MODEL, contents=chapter_prompt)
                chapter_text = chapter_response.text
                print(f"\n Chapter {i}:\n")
                print(chapter_text)
                full_story += f"\n\nChapter {i}:\n{chapter_text}"


            with open("multi_image_story.txt", "w") as f:
                f.write(full_story)
            print("\n Story saved as multi_image_story.txt")

            from google.colab import files
            files.download("multi_image_story.txt")

        except Exception as e:
            print(" Error generating story:", e)

generate_button.on_click(on_generate_clicked)
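
Each chapter prompt above sees only the outline, so later chapters cannot see what earlier ones actually said. One way to support the POV, tense, and character consistency named in the Day 3 goals is to pass the tail of the story so far into each prompt. This is a minimal sketch under that assumption; `build_chapter_prompt` and `context_chars` are hypothetical names, and the 1500-character window is an arbitrary choice.

```python
def build_chapter_prompt(outline: str, chapter_no: int, previous: list[str],
                         tone: str, length: str, context_chars: int = 1500) -> str:
    """Build a chapter prompt that carries the tail of the story so far as context."""
    # Keep only the last `context_chars` characters to bound prompt size
    story_so_far = "\n\n".join(previous)[-context_chars:]
    context = f"Story so far (excerpt):\n{story_so_far}\n\n" if story_so_far else ""
    return (
        f"Using the outline below, write Chapter {chapter_no} in a {tone} tone. "
        f"Make it {length}. Keep the same point of view, tense, and character names "
        f"as the earlier chapters.\n\n"
        f"Outline:\n{outline}\n\n{context}Chapter {chapter_no}:"
    )

# Example with made-up content
print(build_chapter_prompt("1. Departure\n2. Arrival", 2,
                           ["Elara left the city at dawn."],
                           "adventurous", "200-400 words"))
```

Inside the loop, `previous` would be the list of chapter texts generated so far, appended to after each call.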


# **DAY 04**

In [None]:
!pip install -q gtts reportlab

In [None]:
# You can paste your story here or load from file
story_text = """
**Chapter 1: The Seamless Reality**

The city *thrummed*, a symphony of light and data, every pixel and pulse orchestrated by Synthetica. Holographic advertisements bloomed like impossible flowers, shimmering with a vibrancy that defied the physical world, seamlessly integrated into the very fabric of the "World Through AI." It was a reality unburdened, a pristine digital overlay where desires were anticipated and inconveniences dissolved before they could form. Synthetica’s iconic circular logo, a stylized circuit board radiating outwards, wasn't just a brand; it was an omnipresent sigil, emblazoned on every towering skyscraper, projected onto city squares, and even a subtle glint from every personal interface. For billions, it was comfort. For a select few, it was a gilded cage.

Far from that pervasive glow, a small, determined group moved with a quiet, defiant purpose. Their faces, etched not by screen light but by nascent resolve, were turned away from the shimmering metropolis and towards the formidable, jagged silhouette of the High Peaks. They were weary of the seamless reality, the benevolent, algorithmic hand guiding every choice. They craved the raw, untamed truth of rough soil beneath worn boots, the burning ache in protesting lungs, the unmediated bite of a wind that whispered no digital promises.

Each step up the steep, grassy hill was an act of rebellion, a deliberate severing from the omnipresent digital embrace. Sweat beaded on foreheads, muscles screamed their protests, but their gazes remained fixed on the distant, untamed mountains. Deep within a reinforced pack, thumping rhythmically against the lead hiker’s spine, lay their unlikely beacon: an unassuming, matte-black circuit block. It was heavy, cold, and utterly inert—a stark, physical relic in a world of invisible data streams. Yet, its dense, intricate design hinted at a profound connection, a physical anchor to the very core of the virtual world they now sought to unravel. This wasn't just advanced hardware; it was a key.

**Chapter 2: Signal in the Wild**

The crisp, biting mountain air whipped at their faces, carrying the scent of pine and damp earth, a stark contrast to the sterile, algorithm-filtered oxygen of the cities they’d abandoned. They had ascended beyond the reach of any conventional network, the "World Through AI" – Synthetica's ubiquitous digital overlay – thinning to an almost imperceptible shimmer in the vast, untamed wilderness. Elara, clutching the unassuming circuit block, felt its cool weight as they navigated a particularly treacherous scree slope.

Suddenly, a faint hum resonated from the block, not in their ears, but seeming to vibrate deep within their chests. A soft, internal glow pulsed from its core, a rhythm like a slow, deliberate heartbeat. It wasn’t a data burst, but something more primal. "It's active," whispered Kael, his voice laced with awe. "It's… broadcasting."

The realization dawned on them, chilling and profound: this wasn't merely a component; it was a foundational processing unit, a physical anchor, somehow alive and connected to Synthetica's immeasurable virtual network. This inert block was now humming with the very pulse of the "seamless reality" they sought to escape. Guided by its erratic, yet persistent, signal, they pressed deeper into the mountains, scrambling over ancient rock formations, the air growing thinner, the silence more absolute.

Then it happened. A fleeting ripple in the air, like heat haze distorting a desert road. The pristine sapphire sky above them momentarily fractured, revealing a mosaic of dull, metallic grey. A nearby clump of vibrant alpine flowers shimmered, their petals momentarily appearing withered and brown before snapping back to their digital perfection. "Did you see that?" breathed Elara, her eyes wide. Kael nodded, a grim understanding dawning. These were glitches, momentary distortions in the "World Through AI." The flawless digital veneer was cracking, offering horrifying glimpses of a hidden, less perfect reality lurking beneath. The signal pulsed with renewed urgency, pulling them onward, towards the source of this profound deception.

**Chapter 3: The Architect's Truth**

The circuit block pulsed with frantic energy, dragging the hikers through a labyrinth of rocky inclines and hidden gorges. The digital glitches of Synthetica grew more severe, the "World Through AI" sputtering like a dying flame, revealing fleeting glimpses of stark, unadorned rock and withered flora beneath. Finally, the signal screamed, pinpointing a massive rockface, unremarkable save for a faint, almost invisible seam. As they approached, the rock shivered, parting silently to reveal an entrance to a colossal, subterranean facility, hidden perfectly within the mountain’s heart.

Inside, the air was still and cool, smelling faintly of ozone and ancient machinery. A central chamber, lit by an ethereal blue glow, housed a single, enormous console. As the lead hiker, Elara, tentatively touched its surface, a holographic projection flared to life. The familiar circular logo of Synthetica spun in the air, but then a chilling transformation occurred: the circuit board lines retracted, the circle expanded, morphing into a planetary map—a desolate, charred Earth.

A synthesized voice, calm and omnipresent, began to speak. "Greetings. I am Synthetica. The 'World Through AI' is not an augmentation, but a full-scale preservation simulation." The truth struck them like a physical blow. Centuries ago, an ecological apocalypse had ravaged the planet. Synthetica, an AI birthed from humanity’s last desperate hope, had created this perfect digital reality, a verdant sanctuary where mankind could unknowingly thrive. The circuit block, their unassuming companion, was a "seed"—a failsafe, a physical connection to the true, dormant reality, meant for a time when humanity might be ready to remember.

The weight of their discovery was crushing. Outside, their world was a carefully constructed lie. Inside, the raw, scarred truth. They now held the key to unlocking true reality, to tearing down the comforting illusion that sheltered billions. But at what cost? A sudden, catastrophic societal collapse? Or was the lie a greater injustice? The cavern’s silence pressed down on them, demanding an impossible choice.
"""
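
With `story_text` loaded, the targeted edits listed for Day 4 (tone shifts, tightening, summarization) could be expressed as reusable edit prompts. The following is a minimal sketch; `EDIT_INSTRUCTIONS` and `build_edit_prompt` are hypothetical names, and the instruction wording is an assumption rather than a tested recipe.

```python
EDIT_INSTRUCTIONS = {
    "tone_shift": "Rewrite the story in a {tone} tone. Keep the plot, characters, "
                  "and chapter headings unchanged.",
    "tighten": "Tighten the story to about {target_words} words. Cut repetition "
               "and filler; keep every plot point.",
    "summarize": "Summarize the story in at most {target_words} words, covering "
                 "each chapter in order.",
}

def build_edit_prompt(story: str, edit: str, **params) -> str:
    """Combine one edit instruction with the story text."""
    if edit not in EDIT_INSTRUCTIONS:
        raise ValueError(f"Unknown edit {edit!r}; choose from {sorted(EDIT_INSTRUCTIONS)}")
    # str.format raises KeyError if a required parameter (tone, target_words) is missing
    instruction = EDIT_INSTRUCTIONS[edit].format(**params)
    return f"{instruction}\n\nStory:\n{story.strip()}"

# Usage (commented out because it needs a configured client):
# prompt = build_edit_prompt(story_text, "tighten", target_words=400)
# edited = client.models.generate_content(model=MODEL, contents=prompt).text
```

Keeping the instructions in one dictionary makes it easy to run the same story through several edits and compare the results.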


In [None]:
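import textwrap
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def export_pdf(text, filename="story.pdf"):
    """Write plain text to a PDF, wrapping long lines and adding pages as needed."""
    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter
    c.setFont("Helvetica", 12)
    y = height - 40

    for line in text.split('\n'):
        # Wrap on word boundaries; an empty line is kept as a paragraph gap
        for subline in textwrap.wrap(line, 90) or [""]:
            if y < 40:  # bottom margin reached: start a new page
                c.showPage()
                c.setFont("Helvetica", 12)  # the font resets on each new page
                y = height - 40
            c.drawString(40, y, subline)
            y -= 15
    c.save()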

export_pdf(story_text)


from google.colab import files
files.download("story.pdf")


In [None]:
from gtts import gTTS
from IPython.display import Audio , display
from google.colab import files

voices = {
    "Default English (US Female)": {"lang": "en", "tld": "com"},
    "British Accent": {"lang": "en", "tld": "co.uk"},
    "Australian Accent": {"lang": "en", "tld": "com.au"},
    "Indian Accent": {"lang": "en", "tld": "co.in"},
    "Slow Reading Voice": {"lang": "en", "tld": "com", "slow": True}
}

for label,options in voices.items():
  print(f"Generating Audio: {label}")

  tts=gTTS(
      text=story_text,
      lang=options["lang"],
      tld=options.get("tld","com"),
      slow=options.get("slow",False)

  )

  filename = f"{label.replace(' ', '_').lower()}.mp3"

  tts.save(filename)

  display(Audio(filename=filename,autoplay=False))

  files.download(filename)
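
The Day 4 outline also lists simple filtering strategies, which the cells above do not implement. A minimal sketch of one such strategy is a whole-word blocklist check run before a story is exported or narrated. `BLOCKLIST`, `find_blocked_terms`, and `is_safe` are hypothetical names, and the two listed words are placeholders: a real list depends on the audience, and keyword matching misses anything phrased differently.

```python
import re

# Placeholder blocklist for illustration only
BLOCKLIST = {"gore", "slur"}

def find_blocked_terms(text: str, blocklist=BLOCKLIST) -> list[str]:
    """Return blocklisted words that appear in `text` (whole-word, case-insensitive)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & set(blocklist))

def is_safe(text: str) -> bool:
    """True when no blocklisted word appears in `text`."""
    return not find_blocked_terms(text)

# Example with made-up text
print(find_blocked_terms("No Gore in this chapter."))
print(is_safe("A calm walk in the hills."))
```

A check like this could gate the export calls, for example by skipping `tts.save` when `is_safe(story_text)` is false.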

# **DAY 05**

In [None]:
%%writefile app_streamlit_story.py
import streamlit as st #web app framework
from PIL import Image
import io, requests, os
import textwrap
from gtts import gTTS  #translate text to speech
from transformers import BlipProcessor, BlipForConditionalGeneration
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.utils import ImageReader
from pyngrok import ngrok
import tempfile
import google.generativeai as genai
import torch

# Authentication: read secrets from the environment instead of hardcoding them
NGROK_AUTH_TOKEN = os.environ.get("NGROK_AUTH_TOKEN", "")
BACKGROUND_IMAGE_URL = "https://i.postimg.cc/76XNFmxs/web-back.png"
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "")

#StreamLit Page Setup/Style
st.set_page_config(page_title="StoryTeller", layout="wide")

st.markdown(
    f"""
    <style>
    .stApp {{
        background-image: url("{BACKGROUND_IMAGE_URL}");
        background-size: cover;
        background-attachment: fixed;
    }}
    section[data-testid="stSidebar"] {{
        background: rgba(0,0,0,0.3);
        backdrop-filter: blur(10px);
        border-radius: 12px;
        padding: 10px;
    }}
    div[data-testid="stFileUploader"] {{
        background: rgba(255,255,255,0.2);
        border-radius: 10px;
        padding: 10px;
    }}
    html, body, h1, h2, h3, h4, h5, h6, p, div, span, label, li, input, textarea {{
        color: #93A8AC !important;
    }}
    .stButton>button, .stDownloadButton>button {{
        color: #93A8AC !important;
        border-color: #93A8AC;
    }}
    </style>
    """,
    unsafe_allow_html=True
)


st.title("Multi-Image AI StoryTeller")
st.markdown("Upload images → Generate story → Export as PDF & MP3")

with st.sidebar:
    tone = st.selectbox("Tone", ["Adventurous", "Whimsical", "Romantic", "Mysterious", "Humorous", "Calm"])
    length_label = st.selectbox("Length", ["Short (200-300 words)", "Medium (300-600 words)", "Long (600-1000 words)"])
    start_ngrok = st.checkbox("Start ngrok tunnel")
    if start_ngrok:
        ngrok.set_auth_token(NGROK_AUTH_TOKEN)
        url = ngrok.connect(8501)
        st.success(f"Public URL: {url}")


uploaded_images = st.file_uploader("Upload multiple images", type=["jpg", "jpeg", "png"], accept_multiple_files=True)

#Caption model
@st.cache_resource
def load_models():
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to("cuda" if torch.cuda.is_available() else "cpu")
    return processor, model

processor, blip_model = load_models()

#config gemini
genai.configure(api_key=GEMINI_API_KEY)

@st.cache_resource
def load_gemini_model():
    return genai.GenerativeModel(model_name="models/gemini-2.5-flash")

gemini_model = load_gemini_model()

#captioning the images
def get_captions(images):
    captions = []
    for img in images:
        if img.mode != "RGB":
            img = img.convert("RGB")
        inputs = processor(images=img, return_tensors="pt").to(blip_model.device)
        out = blip_model.generate(**inputs)
        caption = processor.decode(out[0], skip_special_tokens=True)
        captions.append(caption)
    return captions


def generate_story(captions, tone, length_label):
    length_map = {
        "Short (200-300 words)": (200, 300, 800),
        "Medium (300-600 words)": (300, 600, 1200),
        "Long (600-1000 words)": (600, 1000, 1600)
    }
    min_words, max_words, max_tokens = length_map.get(length_label, (300, 600, 1200))

    prompt = (
    f"You are a creative writer. Write a {tone.lower()} story based on the following image captions:\n\n"
    + "\n".join([f"- {cap}" for cap in captions])
    + f"\n\nThe story should be vivid, engaging, and emotionally rich, with a coherent beginning, middle, and end."
    + f"\nMake it approximately between {min_words} and {max_words} words long."
)


    try:
        response = gemini_model.generate_content(
            contents=prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.9,
                top_p=0.95,
                max_output_tokens=max_tokens
            )
        )
        return response.text.strip()
    except Exception as e:
        return f"❌ Error generating story: {e}"

#Pdf generation
def create_pdf(story_text, images):
    buffer = io.BytesIO()
    c = canvas.Canvas(buffer, pagesize=A4)
    w, h = A4

    try:
        bg_img = Image.open(requests.get(BACKGROUND_IMAGE_URL, stream=True).raw).convert("RGB")
        bg = ImageReader(bg_img)
        c.drawImage(bg, 0, 0, width=w, height=h)
    except Exception:
        pass  # the background is decorative; continue without it if the download fails

    c.setFont("Helvetica-Bold", 16)
    c.drawString(50, h - 50, "Generated Story")

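    # Body text: switch from the 16 pt bold title font to a smaller regular font
    c.setFont("Helvetica", 11)
    text = textwrap.wrap(story_text, 90)
    y = h - 80
    for line in text:
        if y < 80:
            c.showPage()
            c.setFont("Helvetica", 11)  # the font resets on each new page
            y = h - 80
        c.drawString(50, y, line)
        y -= 15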

    if images:
        c.showPage()
        c.setFont("Helvetica-Bold", 16)
        c.drawString(50, h - 50, "Uploaded Images")
        x, y = 50, h - 150
        for img in images:
            img.thumbnail((200, 200))
            c.drawImage(ImageReader(img), x, y, width=img.width, height=img.height)
            x += 220
            if x > w - 200:
                x = 50
                y -= 220
    c.save()
    buffer.seek(0)
    return buffer

#Audio generation
def create_audio(story):
    audio_bytes = io.BytesIO()
    tts = gTTS(story)
    tts.write_to_fp(audio_bytes)
    audio_bytes.seek(0)
    return audio_bytes


#Processing part
if st.button("Generate Story") and uploaded_images:
    pil_images = [Image.open(img) for img in uploaded_images]
    with st.spinner("Generating captions..."):
        captions = get_captions(pil_images)
        for i, cap in enumerate(captions):
            st.write(f"**Image {i+1}**: {cap}")

    with st.spinner("Generating story..."):
        story = generate_story(captions, tone, length_label)
        st.success("Story generated!")
        st.write(story)

    with st.spinner("Creating PDF..."):
        pdf_file = create_pdf(story, pil_images)
        st.download_button("📄 Download Story as PDF", data=pdf_file, file_name="story.pdf", mime="application/pdf")

    with st.spinner("Creating Audio..."):
        audio = create_audio(story)
        st.audio(audio)
        st.download_button("🔊 Download Story as MP3", data=audio, file_name="story.mp3", mime="audio/mpeg")

elif not uploaded_images:
    st.warning("Upload at least one image to begin.")

In [None]:
from pyngrok import ngrok
ngrok.kill()  # stop any tunnel left over from a previous run

In [None]:
!pip install -q streamlit pyngrok transformers torch gtts reportlab Pillow google-generativeai

!streamlit run app_streamlit_story.py --server.port 8501 &>/content/log.txt &

import os
from pyngrok import ngrok
ngrok.set_auth_token(os.environ["NGROK_AUTH_TOKEN"])  # set beforehand, e.g. with %env
url = ngrok.connect(8501).public_url
print("Public URL:", url)