Real numbers. Real workloads. Real engineering. Lessons from running 312 GPUs and 142 production LLMs across India.
A line-by-line breakdown of an 8× A100 cluster — compute, networking, egress, support — comparing AWS Mumbai to a Glixy rack two streets away. The numbers will surprise you.
LLM: From "we want our own AI" to "production traffic on Llama-3 70B" — the exact playbook we've used for 28 customers. Hardware, fine-tuning, RAG, eval, deploy.
Architecture: Hybrid retrieval, re-ranking, chunking strategies, eval metrics that actually matter. The mistakes we made in our first 5 RAG deployments — so you don't have to.
DevOps: Why default K8s scheduling will starve your GPU jobs. How to use Kueue + NVIDIA device plugin for proper gang scheduling and fractional GPU sharing.
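To give a flavour of the setup that article walks through, here is a minimal Kueue sketch: a ResourceFlavor, a ClusterQueue with an 8-GPU quota, and a LocalQueue that teams submit against. All names (`a100`, `gpu-queue`, `team-llm`) and the quota figures are illustrative assumptions, not the production config.

```yaml
# Hypothetical flavor for the A100 nodes; labels/taints omitted for brevity.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100
---
# Cluster-wide queue: Kueue admits workloads all-or-nothing against this
# quota, which is what gives you gang-scheduling semantics for GPU jobs.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}   # accept workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8       # illustrative: one 8× A100 box
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
---
# Namespaced entry point; Jobs opt in with the
# kueue.x-k8s.io/queue-name label and start suspended.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-llm
  namespace: default
spec:
  clusterQueue: gpu-queue
```

Fractional sharing is a separate knob: the NVIDIA device plugin's time-slicing config advertises each physical GPU as several schedulable `nvidia.com/gpu` units, and the quota above then counts those slices.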
Strategy: Why every Indian AI startup we know is overpaying for compute, and what changes when domestic capacity comes online. Market analysis with real customer data.
Subscribe: No marketing fluff. Real engineering, real numbers, sometimes a CUDA war story.