<a href="https://colab.research.google.com/github/urielmun/capstone-lab/blob/main/model_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import re
import os
from tqdm import tqdm
from datasets import load_dataset
from itertools import islice
import json

In [11]:
INPUT_FILE = "input.json"
OUTPUT_FILE = "output.json"

In [2]:
from transformers import pipeline

# Load a chat-style text generation model
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto"
)
if generator.tokenizer.pad_token_id is None:
    generator.tokenizer.pad_token_id = generator.model.config.eos_token_id

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]



tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Device set to use cuda:0


In [3]:
dialogue_tracking_template = """
### role ###
You are a literary dialogue analyzer.

### instruction ###
1. Extract all lines of dialogue enclosed in quotation marks.
2. Identify the speaker of each line of dialogue.
3. Segment and summarize the narration that appears between dialogues, sentence by sentence.
4. If the speaker’s name appears nearby (e.g., “said Alice”), use that.
5. If the speaker’s name is not written, infer it from context (gender, previous line, actions, etc.).
6. Keep narration and dialogue separate.

### Handling Ambiguities ###
- If the speaker cannot be identified, label as `"Unknown character"`.
- If multiple narrative sentences appear in a row, combine them into one `"Narrative"` entry.
- If there are no dialogues, output only `"Narrative"`.
- If the story includes a child or parent, label them explicitly as `"Child"`, `"Father"`, `"Mother"`, etc., based on the text.
- Keep capitalization consistent with input text.

### examples ###
Example 1:
Text:
She smiled. “It’s a beautiful morning.”
He nodded. “Let’s go for a walk.”
Output:
{{"Narrative": ["She smiled"], "Female character": ["It’s a beautiful morning."], "Narrative": ["He nodded"], "Male character": ["Let’s go for a walk."]}}

Example 2:
Text:
The child giggled. "Can we do it again?"
His father laughed softly. "Not this time, son."
Output:
{{"Narrative":["The child giggled"], "Child": ["Can we do it again?"], "Narrative": ["His Father laughed softly."], "Father": ["Not this time, son."]}}

Now analyze this paragraph: {paragraph}

Output:
"""

In [7]:
script_generate_template="""
### ROLE ###
You are a literary dialogue analyzer and cinematic scene generator.

### TASK PART 1 — Dialogue & Narration Extraction ###
Given the following paragraph:

{paragraph}

Follow these rules:

1. Extract **all dialogue lines** enclosed in quotation marks.
2. Identify the **speaker** of each dialogue.
3. Segment and summarize the **narration** appearing between dialogues, sentence by sentence.
4. If a speaker’s name appears near the line (e.g., “said Alice”), use that name.
5. If a speaker is not explicitly given, infer it from context, actions, gender, or previous turns.
6. Keep **dialogue and narration strictly separated**.

### Handling Ambiguities ###
- If the speaker cannot be identified, label as `"Unknown character"`.
- Combine consecutive narration sentences into one `"Narrative"` entry.
- If the text has no dialogue, output only `"Narrative"`.
- If the text refers to a child or parent, label them explicitly as `"Child"`, `"Father"`, `"Mother"`.
- Preserve capitalization and wording from the original paragraph.

### OUTPUT FORMAT (Part 1) ###
Return the structured extraction as:
{
  "Narrative": [...],
  "SpeakerName or Role": ["dialogue line"],
  ...
}

This becomes the input for Part 2.

---

### TASK PART 2 — Convert Extracted Text to Cinematic Video Scenes ###
Now take the extracted narration/dialogue and convert it into a **scene-based video script**, short enough to fit a **max 3-minute video**.

### RULES FOR VIDEO SCRIPTING ###
- Break the story into sequential **scenes** (“scene1”, “scene2”, …).
- Each scene must include:
  - `"LOCATION"`: place, time of day, mood.
  - `"AUDIO"`: ambient sound, background music.
  - `"CHARACTERS"`: each character’s name, appearance, traits.
  - `"DIALOGUE"`: lines spoken by each character in that scene.
  - `"VIDEO"`: visual description of actions, expressions, camera motion.

- Be specific and cinematic:
  - Describe atmosphere, lighting, camera angle, and gestures.
  - Convert narration into meaningful visual actions.
  - Dialogue should be placed naturally within the scene.

### OUTPUT FORMAT (Part 2 — JSONL Compatible) ###
Format it so each scene could be stored as a single JSONL object:

{
 "scene1": {
    "LOCATION": "Describe setting, mood, lighting",
    "AUDIO": "Describe ambient sounds or music",
    "CHARACTERS": {
        "CharacterA": "appearance, clothing, traits",
        "CharacterB": "appearance, clothing, traits"
    },
    "DIALOGUE": {
        "CharacterA": "line of dialogue",
        "CharacterB": "line of dialogue"
    },
    "VIDEO": "Describe actions, camera direction, gestures"
 },
 "scene2": {
    ...
 }
}

Return all scenes as a single JSON-like block (not arrays) so it can be exported to JSONL easily.

### FINAL OUTPUT ###
Return ONLY the completed scene script in the above format.


"""

In [13]:
def with_model(text_to_process, instruction=dialogue_tracking_template, max_length=5000):
  final_instruction = instruction.replace("{paragraph}", text_to_process)

  messages = [{"role": "user", "content": f"{instruction}\n\n[원본 텍스트]: {text_to_process}"}]
  prompt = generator.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  try:
      # 텍스트 생성 파이프라인 실행
      result = generator(
          prompt,
          max_new_tokens=3000,
          max_length=max_length,
          num_return_sequences=1,
          do_sample=True, # 샘플링 기반 생성 활성화
          temperature=0.7, # 창의성을 위한 온도 설정
          top_p=0.9, # 상위 p 확률 내에서 선택
          eos_token_id=generator.tokenizer.eos_token_id
      )

      # 모델 출력에서 실제 생성된 텍스트만 추출
      processed_text = result[0]['generated_text']
      response_start = processed_text.rfind("[/INST]")
      if response_start != -1:
          processed_text = processed_text[response_start + len("[/INST]"):].strip()
      return processed_text
  except Exception as e:
      print(f"⚠️ 텍스트 생성 중 오류 발생: {e}")
      return "" # 오류 발생 시 빈 문자열 반환 (또는 다른 적절한 처리)

In [9]:
test = """It took me a long time to understand where he came from. The little prince,
who asked me so many questions, never seemed to hear the ones I asked him.
It was things he said quite at random that, bit by bit, explained everything.
For instance, when he first caught sight of my airplane (I won't draw my
airplane; that would be much too complicated for me) he asked:
"What's that thing over there?"
"It's not a thing. It flies. It's an airplane. My airplane."
And I was proud to tell him I could fly. Then he exclaimed:
"What! You fell out of the sky?"
"Yes," I said modestly.
"Oh! That's funny..." And the little prince broke into a lovely peal of
laughter, which annoyed me a good deal. I like my misfortunes to be taken
seriously. Then he added, "So you fell out of the sky, too. What planet are you
from?"
That was when I had the first clue to the mystery of his presence, and I
questioned him sharply.
"Do you come from another planet?"
But he made no answer. He shook his head a little, still staring at my
airplane.
"Of course, that couldn't have brought you from very far..."
And he fell into a reverie that lasted a long while. Then, taking my sheep
out of his pocket, he plunged into contemplation of his treasure.
You can imagine how intrigued I was by this hint about "other planets." I
tried to learn more: "Where do you come from, little fellow? Where is this
'where I live' of yours? Where will you be taking my sheep?"
After a thoughtful silence he answered, "The good thing about the crate
you've given me is that he can use it for a house after dark."
"Of course. And if you're good, I'll give you a rope to tie him up during the
day. And a stake to tie him to."
This proposition seemed to shock the little prince. "Tie him up? What a
funny idea!"
"But if you don't tie him up, he'll wander off somewhere and get lost."
My friend burst out laughing again.
"Where could he go?"
"Anywhere. Straight ahead..."
Then the little prince remarked quite seriously, "Even if he did,
everything's so small where I live!" And he added, perhaps a little sadly,
"Straight ahead, you can't go very far."""

In [12]:
preprocessed_text = with_model(test)
print(preprocessed_text)
video_script=with_model(preprocessed_text,script_generate_template)
print(video_script)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=3000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=3000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


{"Narrative": ["It took me a long time to understand where he came from.", "The little prince, who asked me so many questions, never seemed to hear the ones I asked him.", "It was things he said quite at random that, bit by bit, explained everything.", "For instance, when he first caught sight of my airplane (I won't draw my airplane; that would be much too complicated for me) he asked: 'What's that thing over there?'", "And I was proud to tell him I could fly. Then he exclaimed: 'What! You fell out of the sky?'", "Yes," I said modestly.", "Then he added, 'So you fell out of the sky, too. What planet are you from?'", "That was when I had the first clue to the mystery of his presence, and I questioned him sharply.", "Do you come from another planet?", "But he made no answer. He shook his head a little, still staring at my airplane.", "Of course, that couldn't have brought you from very far...", "And he fell into a reverie that lasted a long while.", "Then, taking my sheep out of his poc

In [None]:
if not os.path.exists(INPUT_FILE):
    with open(INPUT_FILE, 'w', encoding='utf-8') as f:
        json.dump(sample_data, f, ensure_ascii=False, indent=4)
    print(f"➡️ '{INPUT_FILE}' 예시 파일 생성 완료. 파일을 확인하고 실제 데이터를 넣으세요.")
try:
    with open(INPUT_FILE, 'r', encoding='utf-8') as f:
        data = json.load(f)
    print(f"\n📁 '{INPUT_FILE}' 파일 로딩 완료. 처리할 항목 수: {len(data)}")

except FileNotFoundError:
    print(f"❌ 오류: '{INPUT_FILE}' 파일을 찾을 수 없습니다. 파일 경로와 이름을 확인해 주세요.")
    exit()

In [18]:
with open(INPUT_FILE, 'r', encoding='utf-8') as infile,open(OUTPUT_FILE, "w", encoding="utf-8") as outfile:

  data=json.load(infile)
  print("**file load**")

  context=data["context"]
  print("**context load**")

  dialogue_track=with_model(context)
  print("**dialogue track**",dialogue_track)

  video_script=[with_model(dialogue_track,script_generate_template)]
  print("**video script**",video_script)

  record={
      "book_title":data["book_title"],
      "author":data["author"],
      "script":video_script
  }
  outfile.write(json.dumps(record, ensure_ascii=False) + "\n")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=3000) and `max_length`(=5000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


**file load**
**context load**


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=3000) and `max_length`(=5000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


**dialogue track** {"Narrative": ["It took me a long time to understand where he came from.", "The little prince, who asked me so many questions, never seemed to hear the ones I asked him.", "It was things he said quite at random that, bit by bit, explained everything.", "For instance, when he first caught sight of my airplane (I won't draw my airplane; that would be much too complicated for me) he asked: 'What's that thing over there?'.", "And I was proud to tell him I could fly. Then he exclaimed: 'What! You fell out of the sky?'.", "Yes," I said modestly.", "Then he added, 'So you fell out of the sky, too. What planet are you from?'.", "That was when I had the first clue to the mystery of his presence, and I questioned him sharply.", "Do you come from another planet?", "But he made no answer.", "He shook his head a little, still staring at my airplane.", "Of course, that couldn't have brought you from very far...", "And he fell into a reverie that lasted a long while.", "Then, takin