### Ollama Prompt

In [2]:
import requests

url = "http://localhost:11434/api/generate"
payload = {
    "model": "deepseek-r1:8b",  # match the model you pulled
    "prompt": "Write a Python function to calculate GARCH volatility.",
    "stream": False
}

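# Generous timeout (seconds) because local generation can be slow; adjust as needed
response = requests.post(url, json=payload, timeout=600)
response.raise_for_status()  # surface HTTP errors before parsing the JSON body

print(response.json()["response"])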


<think>
Okay, I need to write a Python function to calculate GARCH volatility. Hmm, where do I start? I remember that GARCH stands for Generalized Autoregressive Conditional Heteroskedasticity. It's used to model the volatility of financial returns. So, the function should take some time series data as input and output the volatility measures.

First, I think I need to import some necessary libraries. Pandas is probably needed for handling the data, and maybe matplotlib for plotting, but I'm not sure if that's necessary right now. NumPy might be useful for numerical operations too.

Wait, what are the inputs? The user didn't specify, but perhaps the function should accept a pandas DataFrame with closing prices or log returns. Let me assume it's a Series of log returns since GARCH models are often applied to returns rather than prices. I'll need to check if the data is a Series and has at least 22 observations because GARCH requires a lag of at least one period.

Next, parameters. The f
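
The reply above begins with deepseek-r1's `<think>` reasoning trace. Because the request uses `"stream": False`, nothing is returned until generation finishes. The following is a minimal sketch that assumes Ollama's streaming mode returns one JSON object per line, each with `response` and `done` fields. The helpers `join_stream` and `strip_think` are hypothetical names, not part of any Ollama client. They reassemble a streamed reply and drop a completed `<think>...</think>` block.

```python
import json
import re
from typing import Iterable, Union

# Matches a *closed* <think>...</think> block plus trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def join_stream(lines: Iterable[Union[bytes, str]]) -> str:
    """Concatenate the "response" fields of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)  # json.loads accepts bytes or str
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # final chunk carries stats, not more text
    return "".join(parts)


def strip_think(text: str) -> str:
    """Remove a completed reasoning block, keeping only the final answer."""
    return THINK_RE.sub("", text).strip()
```

To use it, send the same payload with `"stream": True`, call `requests.post(url, json=..., stream=True)`, and pass `response.iter_lines()` to `join_stream`. An unterminated `<think>` block, such as the truncated output above, is left unchanged, because the pattern only matches a closed tag pair.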

### Using LoRA

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load model with memory optimizations
model_id = "deepseek-ai/deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Configure quantization with CPU offloading enabled
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
    llm_int8_enable_fp32_cpu_offload=True  # Enable CPU offloading
)

# Load the model with quantization config
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
    torch_dtype=torch.float16
)

# Apply LoRA config
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA adapters
model = get_peft_model(model, lora_config)

Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00,  3.97s/it]
Some parameters are on the meta device because they were offloaded to the cpu.
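
With `r=8` and `lora_alpha=16`, each targeted projection `W` (shape `d_out × d_in`) is frozen. LoRA adds two small trainable matrices alongside it: `A` (`r × d_in`) and `B` (`d_out × r`). The update `B A` is scaled by `lora_alpha / r`. The sketch below computes these quantities. The square 4096 × 4096 projection is an illustrative assumption. For the real counts on the loaded model, PEFT's `model.print_trainable_parameters()` is the authoritative source.

```python
def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer (A and B, no bias)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A (standard LoRA, not rsLoRA)."""
    return lora_alpha / r


# Illustrative numbers only: an assumed square 4096 x 4096 projection.
per_matrix = lora_extra_params(4096, 4096, r=8)
full_matrix = 4096 * 4096
print(per_matrix, full_matrix, f"{per_matrix / full_matrix:.2%}")  # 65536 16777216 0.39%
print(lora_scaling(16, 8))  # 2.0
```

Because only `A` and `B` receive gradients, optimizer state is kept for well under 1% of each adapted matrix. This is why LoRA pairs well with a quantized, frozen base model.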


In [4]:
from datasets import Dataset

# Create a small dummy dataset
data = {
    "quote": [
        "The sky is not the limit; it’s just the beginning.",
        "Data is the new oil, but it needs refining.",
        "AI is not replacing humans, it’s augmenting them.",
        "Mistakes are proof that you are trying.",
        "Stay curious, stay learning, stay relevant."
    ]
}

dataset = Dataset.from_dict(data)

# Tokenize each quote, padding/truncating to a fixed 128 tokens
def tokenize(example):
    return tokenizer(example["quote"], padding="max_length", truncation=True, max_length=128)

tokenized = dataset.map(tokenize)

Map: 100%|██████████| 5/5 [00:00<00:00, 307.61 examples/s]
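
Padding every quote to `max_length=128` means most positions are pad tokens. For causal-LM training, the usual convention is to copy `input_ids` into `labels` and set padded positions to `-100`, which PyTorch's cross-entropy loss ignores by default. No manual shift is needed, because Hugging Face causal-LM heads shift labels internally. This is a minimal sketch using a hypothetical `add_labels` helper. SFTTrainer's default collator may already build labels, so treat it as an illustration of the masking rule rather than a required step.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def add_labels(example: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Hypothetical helper: labels = input_ids, with padding masked out of the loss."""
    example["labels"] = [
        tok if mask == 1 else IGNORE_INDEX
        for tok, mask in zip(example["input_ids"], example["attention_mask"])
    ]
    return example
```

On the dataset above, this would be applied as `tokenized = tokenized.map(add_labels)`.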


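The training cell trades batch size for gradient accumulation. With `per_device_train_batch_size=1` and `gradient_accumulation_steps=4`, gradients from four single-example passes are accumulated before each optimizer step, giving an effective batch of 4 on one device. The sketch below estimates optimizer steps for the five-quote dataset. The result is an upper bound, because different `transformers` versions handle a final partial accumulation window differently.

```python
import math


def effective_batch_size(per_device: int, accum_steps: int, n_devices: int = 1) -> int:
    """Examples contributing to each optimizer step."""
    return per_device * accum_steps * n_devices


def max_optimizer_steps(n_examples: int, per_device: int, accum_steps: int,
                        epochs: int, n_devices: int = 1) -> int:
    """Upper-bound estimate: counts a trailing partial accumulation window as a step."""
    micro_batches = math.ceil(n_examples / (per_device * n_devices))
    return epochs * math.ceil(micro_batches / accum_steps)


# Values from the training arguments and the five-quote dataset
print(effective_batch_size(1, 4))               # 4
print(max_optimizer_steps(5, 1, 4, epochs=3))   # 6
```

The run therefore has fewer than 10 optimizer steps, so `logging_steps=10` would never emit an intermediate loss log.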
Configure training with memory-efficient settings:

In [9]:
from transformers import TrainingArguments
from trl import SFTTrainer

# Record CUDA allocation history for debugging (underscore-prefixed PyTorch API)
torch.cuda.memory._record_memory_history()

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Very memory efficient training arguments
args = TrainingArguments(
    output_dir="./lora-deepseek",
    per_device_train_batch_size=1,  # Minimize batch size
    gradient_accumulation_steps=4,  # Accumulate instead
    num_train_epochs=3,
    logging_steps=10,
    save_steps=500,
    fp16=True,
    optim="adamw_torch",  # Use standard optimizer
    max_grad_norm=0.3,    # Limit gradient size
    warmup_ratio=0.03,    # Warm up learning rate
    lr_scheduler_type="cosine",
    save_total_limit=3,   # Limit checkpoints
)

# Use SFTTrainer with additional memory optimizations
trainer = SFTTrainer(
    model=model,
    train_dataset=tokenized,
    args=args,
    peft_config=lora_config,  # Pass LoRA config here as well
)

trainer.train()

# Dump memory snapshot for analysis
torch.cuda.memory._dump_snapshot("my_snapshot.pickle")

Truncating train dataset: 100%|██████████| 5/5 [00:00<00:00, 3335.17 examples/s]
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


ValueError: You can't train a model that has been loaded in 8-bit or 4-bit precision with CPU or disk offload. If you want train the 8-bit or 4-bit model in CPU, please install bitsandbytes with multi-backend, see https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend
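
The `ValueError` originates in the loading step. Setting `llm_int8_enable_fp32_cpu_offload=True` together with `device_map="auto"` allowed some layers to be placed on the CPU, which is what the "meta device" warning after loading reports. `Trainer` refuses to train a quantized model that has offloaded modules. One common workaround, roughly the QLoRA recipe, is sketched below. It assumes a single CUDA GPU with enough memory for the whole model in 4-bit. The approach drops CPU offloading, pins everything to GPU 0 with `device_map={"": 0}`, and prepares the quantized model with PEFT's `prepare_model_for_kbit_training` before adapters are added. `offloaded_modules` and `load_for_kbit_training` are hypothetical helper names.

```python
from typing import Dict, List, Union


def offloaded_modules(device_map: Dict[str, Union[int, str]]) -> List[str]:
    """Return module names that a device map places on the CPU or disk."""
    return [name for name, device in device_map.items() if device in ("cpu", "disk")]


def load_for_kbit_training(model_id: str):
    """Sketch: 4-bit NF4 load pinned to GPU 0, then prepared for LoRA training."""
    # Heavy imports kept local so offloaded_modules() works without a GPU stack.
    import torch
    from peft import prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # 4-bit weights to fit on one GPU
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        # No llm_int8_enable_fp32_cpu_offload: offloaded layers block training.
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map={"": 0},  # whole model on GPU 0; fails with OOM instead of offloading
    )
    leftovers = offloaded_modules(getattr(model, "hf_device_map", {}))
    if leftovers:
        raise RuntimeError(f"Modules were offloaded; training will fail: {leftovers[:5]}")
    # Freezes the base weights and sets up gradient checkpointing for k-bit training.
    return prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
```

After loading, apply the same `lora_config` with `get_peft_model(model, lora_config)` and rebuild the trainer. Because the model is then already a `PeftModel`, passing `peft_config` to `SFTTrainer` a second time is likely unnecessary. If the model does not fit on the GPU even in 4-bit, the alternative is the error message's own suggestion: a multi-backend bitsandbytes install for CPU training.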