Real numbers. Real workloads. Real engineering. Lessons from running 312 GPUs and 142 production LLMs across India.
A line-by-line breakdown of an 8× A100 cluster — compute, networking, egress, support — comparing AWS Mumbai to a Glixy rack two streets away. The numbers will surprise you.
LLM: From "we want our own AI" to "production traffic on Llama-3 70B" — the exact playbook we've used for 28 customers. Hardware, fine-tuning, RAG, eval, deploy.
Architecture: Hybrid retrieval, re-ranking, chunking strategies, eval metrics that actually matter. The mistakes we made in our first 5 RAG deployments — so you don't have to.
DevOps: Why default K8s scheduling will starve your GPU jobs. How to use Kueue + NVIDIA device plugin for proper gang scheduling and fractional GPU sharing.
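To give a flavour of the setup that article walks through, here is a minimal Kueue sketch: a ResourceFlavor, a ClusterQueue with an 8-GPU quota, and a LocalQueue that teams submit against. All names (`a100`, `gpu-queue`, `team-llm`) and the quota figures are illustrative assumptions, not the production config.

```yaml
# Hypothetical flavor for the A100 nodes; labels/taints omitted for brevity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100
---
# Cluster-wide queue: Kueue admits workloads all-or-nothing against this
# quota, which is what gives you gang-scheduling semantics for GPU jobs.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}   # accept workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8       # illustrative: one 8× A100 box
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
---
# Namespaced entry point; Jobs opt in with the
# kueue.x-k8s.io/queue-name label and start suspended.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-llm
  namespace: default
spec:
  clusterQueue: gpu-queue
```

Fractional sharing is a separate knob: the NVIDIA device plugin's time-slicing config advertises each physical GPU as several schedulable `nvidia.com/gpu` units, and the quota above then counts those slices.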
Strategy: Why every Indian AI startup we know is overpaying for compute, and what changes when domestic capacity comes online. Market analysis with real customer data.
Subscribe: No marketing fluff. Real engineering, real numbers, sometimes a CUDA war story.