vLLM and LMCache: Notes on Optimizing LLM Inference

My notes on using vLLM and LMCache to optimize memory usage and throughput when serving LLMs on A100 80GB GPUs.
Categories: vLLM, LMCache, LLM Inference, KV Cache

Author: Bhabishya Neupane

Published: January 3, 2026