Aksel with AI

https://www.akselwithai.xyz/
Author: Aksel Aghajanyan. Last updated: 2026-04-17T12:00:00.000Z

GPU Memory Bottlenecks in Large Language Model Inference: Understanding the Real Limits of Real-Time AI
https://www.akselwithai.xyz/#gpu-memory
Published: 2026-04-17T12:00:00.000Z. Updated: 2026-04-17T12:00:00.000Z

Covers the KV cache, memory bandwidth versus memory capacity, and mitigation strategies for real-time LLM serving. The full text follows, and the PDF is still available.

Serving LLMs in Production: Latency, Batching, and a Few Lines of Python
https://www.akselwithai.xyz/#serving-notes
Published: 2026-04-10T12:00:00.000Z. Updated: 2026-04-10T12:00:00.000Z

Part 2 of the inference mini-series: practical knobs for batching and streaming, and how to measure the latency users actually experience.