In [7]:
# Pretraining script template

In [11]:
import os, sys
import torch
from IPython.display import Markdown
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, TrainerCallback

In [9]:
model_id = '/home/LLM_models/gpt-oss-20b'
#model_id = '/home/LLM_models/Llama-3.2-3B-Instruct'

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda:0", 
    torch_dtype=torch.bfloat16,
    use_cache=False,
)

MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [5]:
model

GptOssForCausalLM(
  (model): GptOssModel(
    (embed_tokens): Embedding(201088, 2880, padding_idx=199999)
    (layers): ModuleList(
      (0-23): 24 x GptOssDecoderLayer(
        (self_attn): GptOssAttention(
          (q_proj): Linear(in_features=2880, out_features=4096, bias=True)
          (k_proj): Linear(in_features=2880, out_features=512, bias=True)
          (v_proj): Linear(in_features=2880, out_features=512, bias=True)
          (o_proj): Linear(in_features=4096, out_features=2880, bias=True)
        )
        (mlp): GptOssMLP(
          (router): GptOssTopKRouter()
          (experts): GptOssExperts()
        )
        (input_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
        (post_attention_layernorm): GptOssRMSNorm((2880,), eps=1e-05)
      )
    )
    (norm): GptOssRMSNorm((2880,), eps=1e-05)
    (rotary_emb): GptOssRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2880, out_features=201088, bias=False)
)

In [6]:
messages = [{"role": "user", "content": "Explain quantum mechanics clearly and concisely."}]

In [7]:
formatted_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted_chat)

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-19

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Explain quantum mechanics clearly and concisely.<|end|><|start|>assistant


In [13]:
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=65536, pad_token_id=tokenizer.eos_token_id)

txts_raw = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(txts_raw)
txts = tokenizer.decode(outputs[0], skip_special_tokens=True)
display(Markdown(txts))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-19

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Explain quantum mechanics clearly and concisely.<|end|><|start|>commentary<|message|>We need to explain quantum mechanics clearly and concisely, likely for a general audience but thorough enough. Provide core concepts: wave-particle duality, superposition, uncertainty principle, quantization, discrete energy levels, wavefunction, Schrödinger eq, probabilistic interpretation, measurement collapse, entanglement, nonlocality. Keep concise but clear. Use analogies, maybe example: electrons in hydrogen atom. Use simple math where helpful. Use concise explanation of key equations. Also explain why intuitive classical view fails. Provide historical context. Provide "big picture". Should keep within short/medium length bu

systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-19

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.userExplain quantum mechanics clearly and concisely.commentaryWe need to explain quantum mechanics clearly and concisely, likely for a general audience but thorough enough. Provide core concepts: wave-particle duality, superposition, uncertainty principle, quantization, discrete energy levels, wavefunction, Schrödinger eq, probabilistic interpretation, measurement collapse, entanglement, nonlocality. Keep concise but clear. Use analogies, maybe example: electrons in hydrogen atom. Use simple math where helpful. Use concise explanation of key equations. Also explain why intuitive classical view fails. Provide historical context. Provide "big picture". Should keep within short/medium length but comprehensive. Let's produce final.final# Quantum Mechanics in a Nutshell  

Quantum mechanics (QM) is the theory that describes how nature behaves at the scale of atoms and sub‑atomic particles.  
It replaces the deterministic picture of classical physics with a probabilistic, wave‑like description that works for everything from electrons to photons to molecules.

---

## 1.  What is a Quantum System?

- **Particle + Wave (wave‑particle duality)**  
  A microscopic object (electron, photon, etc.) behaves like a particle (it has mass, charge, momentum) **and** like a wave (it can interfere and diffract).  
  The “wave” is the **wavefunction** \( \psi(\mathbf{r},t) \).

- **State vector**  
  The wavefunction contains all information about a system.  
  In modern language it is a vector in a high‑dimensional complex space (Hilbert space).

---

## 2.  The Math (in lay terms)

### Schrödinger Equation

The time‑dependent equation governs how the wavefunction changes:

\[
i\hbar \frac{\partial}{\partial t}\psi(\mathbf{r},t)
=
\hat H \psi(\mathbf{r},t)
\]

- \(i = \sqrt{-1}\)  
- \(\hbar\) (reduced Planck constant) ≈ \(1.05\times10^{-34}\) J·s  
- \(\hat H\) is the Hamiltonian (energy operator): for a single particle  
  \[
  \hat H = \frac{\hat p^2}{2m} + V(\mathbf{r}) ,
  \]
  where \(\hat p = -i\hbar\nabla\).

**Key point:** The equation is linear; linearity gives rise to superposition.

---

## 3.  Core Principles

| Principle | What it says | Everyday analogy |
|-----------|--------------|------------------|
| **Superposition** | A system can be *simultaneously* in multiple states. If you prepare an electron “up” and “down” at the same time, its wavefunction is a weighted sum of both. | A spinning coin is both heads + tails until you look. |
| **Probability Amplitude** | Square of the wavefunction gives probability density: \(P(\mathbf{r}) = |\psi|^2\). | The pattern of light in a double‑slit experiment: bright fringes where many photons arrive, dark where few do. |
| **Collapse upon Measurement** | When you measure a property (e.g., position or spin), the wavefunction instantaneously “collapses” to an eigenstate associated with the outcome. | Checking a coin’s side settles the superposition to either heads or tails. |
| **Quantization** | Some observables take only discrete values (energies, angular momentum). | In a hydrogen atom, energy levels are \(-13.6\ \text{eV}/n^2\) (with \(n=1,2,3,\dots\)). |
| **Uncertainty Principle** | You cannot simultaneously know position and momentum with arbitrary precision:  
\[
\Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}.
\] | Trying to pinpoint a raindrop’s exact path while also measuring its speed gives you an error trade‑off. |
| **Entanglement** | Two or more particles become linked so that the state of one instantly determines the state of the other, no matter the distance. | Two coins that always land opposite, no matter how far apart you carry them. |

---

## 4.  Illustrative Example: The Hydrogen Atom

1. **Potential** \(V(r) = -\frac{e^2}{4\pi\varepsilon_0 r}\).  
2. Solve the radial Schrödinger equation → gives discrete energies  
   \[
   E_n = -\frac{13.6\ \text{eV}}{n^2}, \quad n=1,2,3,\dots
   \]
3. **Orbitals** (spatial wavefunctions) \( \psi_{nlm}(\mathbf{r}) \) have shapes (s, p, d… shells) and nodal surfaces where the electron probability is zero.  
4. **Observable measurement**: If you ionize the atom, you only detect the electron at one point, not at the entire cloud described by \( \psi \).  

Hence QM predicts a *cloud of probability* while classical physics would force the electron into a single path.

---

## 5.  Why It Matters

| Application | QM is essential |
|-------------|----------------|
| Nuclear magnetic resonance (MRI) | Spin states of nuclei |
| Semiconductors & transistors | Quantum tunneling and band structure |
| Lasers & LEDs | Stimulated emission |
| Quantum cryptography & computing | Entanglement & superposition |
| Chemical bonding & materials science | Electron wavefunctions |

---

## 6.  Common Misconceptions

1. **“Particle follows a strange trajectory.”**  
   – *Reality*: We cannot track a definite trajectory; the wavefunction tells probabilities.

2. **“Quantum mechanics is wrong or incomplete.”**  
   – *Reality*: It predicts experiments with extraordinary accuracy (e.g., the Lamb shift, Casimir effect). In certain regimes (very large, strongly interacting, or non‑relativistic) classical physics suffices.

3. **“Observers create reality.”**  
   – *Reality*: Measurement correlates the quantum system with the measuring device; no “mystery” beyond established physics.

---

## 7.  Take‑away

- **Quantum mechanics is the framework that replaces classical deterministic pictures with a statistical, wave‑based description.**  
- Its **key insights**—superposition, uncertainty, quantization, and entanglement—solve puzzles that classical physics could not (e.g., atomic stability, spectral lines, double‑slit interference).  
- The **Schrödinger equation** is the single, beautiful equation that drives all quantum phenomena.  

From the hydrogen atom’s discrete spectrum to the possibility of quantum computers, the principles above hold the universe together on the tiniest scales.