Umarfarook Umarfarook1

role:        AI / ML Engineer @ HypeOn AI
focus:       Production LLM systems for D2C trend prediction
shipping:    BigQuery NL2SQL MCP Server  -  open-source eval infra
building:    GPT-2 124M from scratch  -  Triton attention kernels
             tiny diffusion  -  DPO post-training stack
philosophy:  Tradeoffs over tools  -  evals before scale  -  ship narrow, then expand

About

I build the messy middle of applied AI: multi-stage orchestration, retrieval that actually retrieves the right thing, NL-to-SQL with cost guardrails, and the observability that keeps it running in production.

Strong Python (FastAPI), end-to-end ownership across GCP and AWS, and a bias toward systems that survive contact with real users. Currently working through a from-scratch ML stack (transformer, fused GPU kernels, diffusion, post-training) to close the gap from "builds with LLMs" to "builds the LLMs."

Stack

Languages · LLM & AI · Backend · Data & ML · Cloud & Ops

Currently shipping

mcp-bigquery-evals · the calling-card project

Open-source MCP server that lets agents query BigQuery in natural language with schema-aware grounding, cost guardrails, and a built-in eval harness so the behavior is measurable, not vibes-based. Sits at the intersection of three 2026 hot topics: MCP, evals, and NL-to-SQL.

From-scratch ML builds (in progress)

Closing the gap from applied LLM engineer to ML / Research Engineer by rebuilding the modern stack from first principles. Each repo ships with the math, ablations, weights on Hugging Face, and a writeup.

Nano-LLM-from-scratch

PyTorch · RoPE · RMSNorm · SwiGLU · KV-cache

GPT-2 124M reproduction in clean PyTorch, with modern components swapped in (RoPE, RMSNorm, SwiGLU, KV-cache). Cost receipts in dollars and H100 hours, not vibes.
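As an illustration of one of the swapped-in parts, here is a minimal pure-Python sketch of RMSNorm on a single feature vector. It is a reference-style illustration rather than the repo's code: the function name `rms_norm` and the default `eps` are assumptions, and a real implementation would operate on PyTorch tensors.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the inverse of its root-mean-square, then by a per-feature weight.

    Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term.
    `eps` is an illustrative default, not a value taken from the project.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

With unit weights, the output has a mean square close to 1, which is a quick sanity check when comparing a fused kernel against a reference.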

Triton-attention-kernels

Triton · CUDA · FlashAttention-style

Hand-written fused kernels for the transformer hot path (attention, RMSNorm, SwiGLU, RoPE), benchmarked against PyTorch's scaled_dot_product_attention (SDPA).

Tiny-diffusion

DDPM · CFG · DDIM · UNet

Diffusion built from the forward process up. Math derived in the README, samples on CIFAR-10 and CelebA, FID compared against published numbers.

DPO-on-my-LLM

SFT · DPO · LLM-judge · TRL

Post-training stack: SFT on demonstrations, DPO on preferences, LLM-judge eval with win-rate and Wilson confidence intervals.
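For the win-rate reporting, a Wilson score interval can be computed from judge verdicts roughly as follows. This is a minimal sketch, not the project's eval code: the function name `wilson_interval` is hypothetical, `z=1.96` assumes a 95% interval, and ties are assumed to be excluded from `total` before calling.

```python
import math


def wilson_interval(wins, total, z=1.96):
    """Wilson score interval for a binomial proportion (here, a judge win-rate).

    wins:  comparisons the candidate model won
    total: comparisons judged, assuming ties were dropped beforehand
    z:     normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= wins <= total:
        raise ValueError("wins must be between 0 and total")
    p = wins / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(60, 100)
print(f"win-rate 0.60, 95% CI [{low:.3f}, {high:.3f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples and win-rates near 0 or 1, which is why it suits small preference eval sets.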

Selected production work

Conversational Research Agent

Python · FastAPI · LangChain

Multi-stage routing (chitchat / factual / research) with SSE streaming, session memory, idempotent retries, Pydantic-validated outputs, prompt-injection guardrails, and Prometheus metrics.

NL-to-SQL over BigQuery

Python · BigQuery · LLMs

Schema discovery, synonym matching, cost safety caps. Multi-provider routing with primary plus fallback. Built for non-technical operators to query the warehouse without writing SQL.
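A cost safety cap of this kind can be sketched as a gate on the estimated bytes scanned. The sketch below is an illustration under stated assumptions, not the production implementation: `check_query_cost`, `CostDecision`, the 50 GiB default cap, and the price per TiB are all hypothetical placeholders. In practice the byte estimate would typically come from a BigQuery dry run (`QueryJobConfig(dry_run=True)`, then `total_bytes_processed` on the job), and `maximum_bytes_billed` can enforce a hard limit server-side.

```python
from dataclasses import dataclass

TIB = 2 ** 40
GIB = 2 ** 30


@dataclass(frozen=True)
class CostDecision:
    allowed: bool
    estimated_usd: float
    reason: str


def check_query_cost(estimated_bytes, max_bytes=50 * GIB, usd_per_tib=6.25):
    """Decide whether a generated query may run, based on its estimated scan size.

    estimated_bytes: scan estimate, e.g. from a dry run (assumed input)
    max_bytes:       hypothetical per-query cap
    usd_per_tib:     illustrative on-demand price; check current pricing
    """
    if estimated_bytes < 0:
        raise ValueError("estimated_bytes must be non-negative")
    estimated_usd = estimated_bytes / TIB * usd_per_tib
    if estimated_bytes > max_bytes:
        reason = f"estimated scan of {estimated_bytes} bytes exceeds cap of {max_bytes} bytes"
        return CostDecision(False, estimated_usd, reason)
    return CostDecision(True, estimated_usd, "within cap")
```

Rejecting before execution keeps the guardrail independent of which LLM provider generated the SQL, so the same check applies to both the primary and fallback routes.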

AI-Powered Inventory Platform

Python · Pandas · Scikit-learn · OpenAI

LLM-based invoice extraction, real-time stock alerts, demand forecasting, and a visualization dashboard for business insight. Shipped for a retail client during freelance work.

Clinical Chat Assistant

LangChain · FAISS · OpenAI

RAG over 500+ clinical PDFs with chunking, metadata filtering, and guardrails to reduce unsupported answers. Internal tool at Synclovis Systems.
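The chunking and metadata-filtering steps can be sketched in plain Python as below. This is a simplified illustration, not the tool's code: the function names, the character-based chunk sizes, and the record layout (`{"text": ..., "metadata": {...}}`) are assumptions, and a real pipeline would likely split on tokens or sentences and filter inside the vector store.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


def filter_chunks(records, **required):
    """Keep records whose metadata matches every required key/value pair."""
    return [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in required.items())
    ]
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, and filtering on metadata before similarity search narrows retrieval to the relevant documents.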

Experience

| When | Role | Where |
| --- | --- | --- |
| `2025.10 → now` | AI / ML Engineer | HypeOn AI · D2C trend prediction |
| `2024.10 → 2025.09` | Freelance ML / AI Engineer | Independent |
| `2024.06 → 2024.09` | Backend Developer Intern | Synclovis Systems |
| `2020 → 2024` | B.Tech, Computer Science | K.S.R.M College / JNTU Anantapur · CGPA 8.14 |

How I think


| Principle | In practice |
| --- | --- |
| Tradeoffs over tools | Pick by constraint, not hype. Postgres + pgvector beats a managed vector DB until it doesn't. |
| Evals before scale | If you cannot measure it, you cannot improve it. A bad eval beats no eval. |
| Data quality over model swapping | A new model rarely fixes bad inputs. Retrieval and prompt structure compound. |
| Infrastructure is the product | Latency, cost, reliability are features users feel. The model is one component. |
| Ship narrow, then expand | One user, one workflow, working end-to-end. Tiny systems that ship beat grand systems that demo. |
| From-scratch when it teaches | Reach for the abstraction once, then go a layer deeper. The best engineers can drop a layer. |

Reach out

Open to collaboration on production LLM systems, RAG pipelines, evals, ML systems / GPU performance, and applied AI infrastructure.

umarfarook-ai.vercel.app · LinkedIn · umarfarook0yt@gmail.com

built quietly · shipping noisily
