# Zenith Tutorial

A complete tutorial on using the Zenith ML Optimization Framework.

**Chapters:**
1. Getting Started
2. Basics
3. Quantization
4. QAT Training
5. PyTorch Integration
6. Triton Deployment
7. Autotuner

---
## Chapter 1: Getting Started

In [None]:
# 1.1 Install Zenith
!rm -rf ZENITH 2>/dev/null
!git clone https://github.com/vibeswithkk/ZENITH.git
%cd ZENITH
!pip install -q -e .

In [None]:
# 1.2 Verify Installation
import zenith
print(f"Zenith Version: {zenith.__version__}")

In [None]:
# 1.3 Check Backends
from zenith import backends

print(f"CPU Available: {backends.is_cpu_available()}")
print(f"CUDA Available: {backends.is_cuda_available()}")

---
## Chapter 2: Basics

In [None]:
# 2.1 Import Modules
import zenith
from zenith import backends
from zenith.optimization.qat import FakeQuantize, QATConfig
from zenith.serving.triton_client import MockTritonClient

print("All imports successful!")

In [None]:
# 2.2 Get Available Backends
print(backends.get_available_backends())

---
## Chapter 3: Quantization

In [None]:
# 3.1 Create FakeQuantize
from zenith.optimization.qat import FakeQuantize

fq = FakeQuantize(num_bits=8, symmetric=True)
print(f"Created FakeQuantize: {fq.num_bits}-bit")

In [None]:
# 3.2 Observe Data
import numpy as np

data = np.random.randn(100).astype(np.float32)
fq.observe(data)
print("Data observed for calibration")

In [None]:
# 3.3 Apply Quantization
quantized = fq.forward(data)

error = np.mean(np.abs(data - quantized))
print(f"Mean Quantization Error: {error:.6f}")

In [None]:
# 3.4 Get Quantization Parameters
params = fq.get_quantization_params()
print(f"Scale: {params.scale}")
print(f"Zero Point: {params.zero_point}")
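
To make the scale and zero point above less opaque, here is a minimal NumPy sketch of symmetric per-tensor fake quantization. It is an illustration under stated assumptions, not Zenith's implementation: the helper `fake_quantize_symmetric` is hypothetical, and the choice of `max(|x|) / 127` as the scale is one common calibration rule that `FakeQuantize` may or may not use.

```python
import numpy as np

def fake_quantize_symmetric(x, num_bits=8):
    """Quantize to signed integers and immediately dequantize (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = float(np.max(np.abs(x))) / qmax   # one scale for the whole tensor
    if scale == 0.0:
        scale = 1.0                           # avoid division by zero for all-zero input
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return (q * scale).astype(np.float32), scale, 0  # symmetric scheme: zero point is 0

rng = np.random.default_rng(0)
sample = rng.standard_normal(100).astype(np.float32)
dequantized, scale, zero_point = fake_quantize_symmetric(sample)
max_error = float(np.max(np.abs(sample - dequantized)))
print(f"Scale: {scale:.6f}, zero point: {zero_point}")
print(f"Max error: {max_error:.6f} (rounding bound: scale / 2 = {scale / 2:.6f})")
```

Because every value is rounded to the nearest multiple of the scale, the per-element error cannot exceed half a quantization step, which is why the mean error reported in 3.3 is small relative to the data range.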

---
## Chapter 4: QAT Training

In [None]:
# 4.1 QAT Config
from zenith.optimization.qat import QATConfig

config = QATConfig(
    weight_bits=8,
    activation_bits=8,
    symmetric_weights=True,
    per_channel_weights=True
)
print(f"Weight bits: {config.weight_bits}")
print(f"Activation bits: {config.activation_bits}")

In [None]:
# 4.2 Prepare Model for QAT
from zenith.optimization.qat import prepare_model_for_qat

layer_names = ['fc1', 'fc2', 'fc3']
trainer = prepare_model_for_qat(layer_names, config)
print(f"QAT Trainer created with {len(trainer.modules)} modules")

In [None]:
# 4.3 BatchNorm Folding
from zenith.optimization.qat import fold_bn_into_conv

weight = np.random.randn(4, 3, 3, 3).astype(np.float32)
bias = np.random.randn(4).astype(np.float32)
bn_mean = np.random.randn(4).astype(np.float32)
bn_var = (np.abs(np.random.randn(4)) + 0.1).astype(np.float32)  # variance must be positive
bn_gamma = np.random.randn(4).astype(np.float32)
bn_beta = np.random.randn(4).astype(np.float32)

folded_w, folded_b = fold_bn_into_conv(weight, bias, bn_mean, bn_var, bn_gamma, bn_beta)
print(f"Folded weight shape: {folded_w.shape}")
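
The arithmetic behind BatchNorm folding can be sketched in plain NumPy. This is a hedged illustration of the standard folding formula, not the body of `fold_bn_into_conv`: the helper name `fold_bn_sketch` and the `eps=1e-5` value are assumptions, and Zenith's function may use a different epsilon or argument handling. The check compares "convolution then BatchNorm" against a single convolution with folded parameters at one input location.

```python
import numpy as np

def fold_bn_sketch(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm statistics into conv weights of shape (out_ch, in_ch, kH, kW)."""
    factor = gamma / np.sqrt(var + eps)               # one factor per output channel
    folded_weight = weight * factor[:, None, None, None]
    folded_bias = (bias - mean) * factor + beta
    return folded_weight, folded_bias

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
mean, gamma, beta = rng.standard_normal((3, 4))
var = np.abs(rng.standard_normal(4)) + 0.1
patch = rng.standard_normal((3, 3, 3))                # one receptive field of the input

# Reference: convolution at one location followed by BatchNorm.
conv_out = np.tensordot(w, patch, axes=3) + b
reference = gamma * (conv_out - mean) / np.sqrt(var + 1e-5) + beta

# Folded: a single convolution with the adjusted parameters.
fw, fb = fold_bn_sketch(w, b, mean, var, gamma, beta)
folded_out = np.tensordot(fw, patch, axes=3) + fb
print(np.allclose(reference, folded_out))
```

Folding removes the BatchNorm layer at inference time, so the quantizer sees the weights that will actually be deployed.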

---
## Chapter 5: PyTorch Integration

In [None]:
# 5.1 Create PyTorch Model
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = SimpleNet()
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# 5.2 Extract Weights
layer_weights = {
    'fc1': model.fc1.weight.detach().numpy(),
    'fc2': model.fc2.weight.detach().numpy(),
}

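# Weights only (biases excluded); INT8 stores 1 byte per value versus 4 for FP32.
fp32_size = sum(w.nbytes for w in layer_weights.values())
int8_size = fp32_size // 4
print(f"FP32 Size: {fp32_size / 1024:.2f} KB")
print(f"INT8 Size: {int8_size / 1024:.2f} KB")
print(f"Reduction: {fp32_size / int8_size:.1f}x")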
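
The 4x figure above is an estimate from byte widths. As a rough cross-check, the sketch below actually stores weights as `int8` and counts bytes, including the small overhead of keeping one `float32` scale per tensor. It uses random stand-in arrays with the same shapes as the `fc1` and `fc2` weights so it runs without PyTorch; `quantize_weights_int8` is a hypothetical helper, not a Zenith API.

```python
import numpy as np

def quantize_weights_int8(w):
    """Per-tensor symmetric quantization to int8; returns (int8 array, float scale)."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0                                   # guard for an all-zero tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
# Random stand-ins with the same shapes as SimpleNet's fc1 and fc2 weights.
weights = {
    "fc1": rng.standard_normal((256, 784)).astype(np.float32),
    "fc2": rng.standard_normal((10, 256)).astype(np.float32),
}
quantized_weights = {name: quantize_weights_int8(w)[0] for name, w in weights.items()}

fp32_bytes = sum(w.nbytes for w in weights.values())
int8_bytes = sum(q.nbytes for q in quantized_weights.values())
scale_bytes = 4 * len(weights)                        # one float32 scale per tensor
print(f"FP32: {fp32_bytes} bytes, INT8: {int8_bytes} bytes + {scale_bytes} bytes of scales")
print(f"Reduction: {fp32_bytes / (int8_bytes + scale_bytes):.3f}x")
```

With per-tensor scales the overhead is negligible; per-channel schemes store more scales, so the real reduction sits slightly below 4x.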

---
## Chapter 6: Triton Deployment

In [None]:
# 6.1 Create Mock Client
from zenith.serving.triton_client import MockTritonClient, ModelMetadata

client = MockTritonClient("localhost:8000")
print(f"Server Live: {client.is_server_live()}")
print(f"Server Ready: {client.is_server_ready()}")

In [None]:
# 6.2 Register Model
client.register_model(
    "my_model",
    metadata=ModelMetadata(name="my_model", platform="python")
)
print(f"Model Ready: {client.is_model_ready('my_model')}")

In [None]:
# 6.3 Run Inference
from zenith.serving.triton_client import InferenceInput

data = np.array([1.0, 2.0, 3.0]).astype(np.float32)
inputs = [InferenceInput(name="input", data=data)]

result = client.infer("my_model", inputs)
print(f"Success: {result.success}")
print(f"Latency: {result.latency_ms:.3f} ms")

---
## Chapter 7: Autotuner

In [None]:
# 7.1 Define Search Space
from zenith.optimization.autotuner import SearchSpace

space = SearchSpace("matmul_space")
space.define("block_size", [16, 32, 64])
space.define("num_warps", [2, 4])
print(f"Search space size: {space.size()}")

In [None]:
# 7.2 Benchmark Function
def benchmark(config):
    # Synthetic cost model standing in for a real kernel timing (lower is better).
    return 1000 / (config["block_size"] * config["num_warps"])

print(f"Example: benchmark(block_size=32, num_warps=4) = {benchmark({'block_size': 32, 'num_warps': 4}):.2f}")

In [None]:
# 7.3 Run Tuning
from zenith.optimization.autotuner import KernelAutotuner, TuningConfig, GridSearch

autotuner = KernelAutotuner(strategy=GridSearch())
config = TuningConfig(op_name="matmul", input_shapes=[(512, 512)])

best_params, best_time = autotuner.tune(config, space, benchmark, max_trials=6)
print(f"Best Config: {best_params}")
print(f"Best Time: {best_time:.3f} ms")
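
For intuition about what an exhaustive strategy does with this search space, here is a minimal pure-Python grid search. It is a sketch of the general technique, not the internals of `KernelAutotuner` or `GridSearch`; the names `grid_search` and `synthetic_cost` are hypothetical, and the cost function mirrors the synthetic benchmark from 7.2.

```python
from itertools import product

def grid_search(search_space, cost_fn):
    """Evaluate every combination and return the cheapest (params, cost) pair."""
    names = list(search_space)
    best_candidate, best_cost = None, float("inf")
    for values in product(*(search_space[n] for n in names)):
        candidate = dict(zip(names, values))
        cost = cost_fn(candidate)
        if cost < best_cost:
            best_candidate, best_cost = candidate, cost
    return best_candidate, best_cost

def synthetic_cost(cfg):
    # Same synthetic cost model as the benchmark cell: larger products score lower.
    return 1000 / (cfg["block_size"] * cfg["num_warps"])

grid = {"block_size": [16, 32, 64], "num_warps": [2, 4]}
winner, winner_cost = grid_search(grid, synthetic_cost)
print(winner, winner_cost)
```

The grid has 3 x 2 = 6 combinations, which matches `max_trials=6` in 7.3, so an exhaustive search can visit every point within the trial budget.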

---
## Summary

You have learned:

| Chapter | Topic | Status |
|---------|-------|--------|
| 1 | Getting Started | Completed |
| 2 | Basics | Completed |
| 3 | Quantization | Completed |
| 4 | QAT Training | Completed |
| 5 | PyTorch Integration | Completed |
| 6 | Triton Deployment | Completed |
| 7 | Autotuner | Completed |

**Congratulations! Tutorial complete!**