role: AI / ML Engineer @ HypeOn AI
focus: Production LLM systems for D2C trend prediction
shipping: BigQuery NL2SQL MCP Server - open-source eval infra
building: GPT-2 124M from scratch - Triton attention kernels - tiny diffusion - DPO post-training stack
philosophy: Tradeoffs over tools - evals before scale - ship narrow, then expand

I build the messy middle of applied AI: multi-stage orchestration, retrieval that actually retrieves the right thing, NL-to-SQL with cost guardrails, and the observability that keeps it running in production.
Strong Python (FastAPI), end-to-end ownership across GCP and AWS, and a bias toward systems that survive contact with real users. Currently working through a from-scratch ML stack (transformer, fused GPU kernels, diffusion, post-training) to close the gap from "builds with LLMs" to "builds the LLMs."
| Languages | |
| LLM & AI | |
| Backend | |
| Data & ML | |
| Cloud & Ops | |
mcp-bigquery-evals · the calling-card project
Open-source MCP server that lets agents query BigQuery in natural language with schema-aware grounding, cost guardrails, and a built-in eval harness so the behavior is measurable, not vibes-based. Sits at the intersection of three 2026 hot topics: MCP, evals, and NL-to-SQL.
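To make "cost guardrails" concrete, here is a minimal sketch of the idea, not the project's actual code. It assumes the server already has a byte estimate for the generated SQL (in BigQuery this would typically come from a dry-run query) and refuses to run anything whose estimated cost exceeds a cap. The names `check_query_cost` and `CostDecision`, the price per TiB, and the cap are all hypothetical placeholders.

```python
from dataclasses import dataclass

BYTES_PER_TIB = 2**40


@dataclass(frozen=True)
class CostDecision:
    allowed: bool
    estimated_usd: float
    reason: str


def check_query_cost(estimated_bytes: int, usd_per_tib: float, max_usd: float) -> CostDecision:
    """Decide whether a query may run, given an estimated scan size.

    estimated_bytes: bytes the query is expected to scan (assumed to come
        from a dry run performed elsewhere).
    usd_per_tib: on-demand price per TiB scanned (illustrative; pass the
        current rate for your billing setup).
    max_usd: per-query spending cap (hypothetical policy value).
    """
    if estimated_bytes < 0:
        raise ValueError("estimated_bytes must be non-negative")

    estimated_usd = estimated_bytes / BYTES_PER_TIB * usd_per_tib
    if estimated_usd > max_usd:
        return CostDecision(
            allowed=False,
            estimated_usd=estimated_usd,
            reason=f"estimated ${estimated_usd:.2f} exceeds cap ${max_usd:.2f}",
        )
    return CostDecision(allowed=True, estimated_usd=estimated_usd, reason="within cap")


if __name__ == "__main__":
    # A 1 TiB scan at an illustrative $5/TiB against a $1 cap is rejected.
    print(check_query_cost(2**40, usd_per_tib=5.0, max_usd=1.0))
    # A 1 GiB scan under the same policy is allowed.
    print(check_query_cost(2**30, usd_per_tib=5.0, max_usd=1.0))
```

Returning a decision object rather than raising lets the agent see the reason and retry with a narrower query.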
Closing the gap from applied LLM engineer to ML / Research Engineer by rebuilding the modern stack from first principles. Each repo ships with the math, ablations, weights on Hugging Face, and a writeup.
|
GPT-2 124M reproduction in clean PyTorch. Modern parts swapped in. Cost receipts in dollars and H100 hours, not vibes. |
Hand-written fused kernels for the transformer hot path
(attention, RMSNorm, SwiGLU, RoPE), benchmarked against |
|
Diffusion built from the forward process up. Math derived in the README, samples on CIFAR-10 and CelebA, FID against literature. |
Post-training stack: SFT on demonstrations, DPO on preferences, LLM-judge eval with win-rate and Wilson confidence intervals. |
|
Multi-stage routing (chitchat / factual / research) with SSE streaming, session memory, idempotent retries, Pydantic-validated outputs, prompt-injection guardrails, and Prometheus metrics. |
Schema discovery, synonym matching, cost safety caps. Multi-provider routing with primary plus fallback. Built for non-technical operators to query the warehouse without writing SQL. |
|
LLM-based invoice extraction, real-time stock alerts, demand forecasting, and a visualization dashboard for business insight. Shipped for a retail client during freelance work. |
RAG over 500+ clinical PDFs with chunking, metadata filtering, and guardrails to reduce unsupported answers. Internal tool at Synclovis Systems. |
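The post-training stack above reports LLM-judge win-rate with Wilson confidence intervals. As a minimal sketch of that calculation (not the repo's actual code), the function below computes the Wilson score interval for a win-rate from pairwise comparisons. The name `wilson_interval` is hypothetical, and the sketch assumes ties have already been dropped so that `total` counts only decisive comparisons.

```python
import math


def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    wins: number of comparisons the candidate model won.
    total: number of decisive comparisons (ties assumed excluded).
    z: normal quantile; 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= wins <= total:
        raise ValueError("wins must be between 0 and total")

    p = wins / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(wins=62, total=100)
    print(f"win-rate 0.62, 95% interval [{low:.3f}, {high:.3f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples or win-rates near 0 or 1, which is common in small eval sets.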
| When | Role | Where |
|---|---|---|
| 2025.10 → now | AI / ML Engineer | HypeOn AI · D2C trend prediction |
| 2024.10 → 2025.09 | Freelance ML / AI Engineer | Independent |
| 2024.06 → 2024.09 | Backend Developer Intern | Synclovis Systems |
| 2020 → 2024 | B.Tech, Computer Science | K.S.R.M College / JNTU Anantapur · CGPA 8.14 |
| Tradeoffs over tools | Pick by constraint, not hype. Postgres + pgvector beats a managed vector DB until it doesn't. |
| Evals before scale | If you cannot measure it, you cannot improve it. A bad eval beats no eval. |
| Data quality over model swapping | A new model rarely fixes bad inputs. Retrieval and prompt structure compound. |
| Infrastructure is the product | Latency, cost, reliability are features users feel. The model is one component. |
| Ship narrow, then expand | One user, one workflow, working end-to-end. Tiny systems that ship beat grand systems that demo. |
| From-scratch when it teaches | Reach for the abstraction once, then go a layer deeper. The best engineers can drop a layer. |
Open to collaboration on production LLM systems, RAG pipelines, evals, ML systems / GPU performance, and applied AI infrastructure.
umarfarook-ai.vercel.app · LinkedIn · umarfarook0yt@gmail.com
built quietly · shipping noisily


