
Paper of the Week #2

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

https://arxiv.org/abs/2407.00079

Mooncake improves LLM inference at scale by disaggregating prefill and decode and by pooling underutilized CPU, DRAM, and SSD resources into a distributed KVCache, offloading cache storage and transfer off the GPU; the paper reports higher throughput than the vLLM inference framework.
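To make the core idea concrete, here is a minimal, illustrative sketch of a tiered KVCache pool that spills least-recently-used cache blocks from a fast tier (standing in for DRAM) to a slower tier (standing in for SSD). This is not Mooncake's actual API; the class and method names are made up for illustration.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

import numpy as np


class TieredKVCache:
    """Toy two-tier KV cache: a bounded in-memory dict stands in for DRAM,
    and least-recently-used blocks are spilled to files standing in for SSD.
    Illustrative only -- not Mooncake's real API."""

    def __init__(self, dram_capacity_blocks: int, ssd_dir: str):
        self.dram_capacity = dram_capacity_blocks
        self.dram: OrderedDict[str, np.ndarray] = OrderedDict()
        self.ssd_dir = ssd_dir

    def put(self, block_id: str, kv_block: np.ndarray) -> None:
        # Insert into the fast tier; evict the LRU block to "SSD" when full.
        self.dram[block_id] = kv_block
        self.dram.move_to_end(block_id)
        while len(self.dram) > self.dram_capacity:
            victim_id, victim = self.dram.popitem(last=False)
            with open(os.path.join(self.ssd_dir, victim_id), "wb") as f:
                pickle.dump(victim, f)

    def get(self, block_id: str) -> np.ndarray:
        # Hit in the fast tier: refresh recency and return.
        if block_id in self.dram:
            self.dram.move_to_end(block_id)
            return self.dram[block_id]
        # Miss: reload from the slow tier and promote back to the fast tier.
        with open(os.path.join(self.ssd_dir, block_id), "rb") as f:
            kv_block = pickle.load(f)
        self.put(block_id, kv_block)
        return kv_block


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ssd_dir:
        cache = TieredKVCache(dram_capacity_blocks=2, ssd_dir=ssd_dir)
        for i in range(4):  # 4 blocks, but only 2 fit in the fast tier
            cache.put(f"block-{i}", np.zeros((8, 64), dtype=np.float16))
        # block-0 was evicted to the slow tier; get() transparently reloads it.
        print(cache.get("block-0").shape)
```

The point of the sketch is only the access pattern: prefix KV blocks survive eviction from fast memory and can be reused later instead of being recomputed, which is the reuse Mooncake's KVCache-centric design is built around.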

At dwani, we use vLLM to process text, image, and document requests.
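For context, a minimal vLLM offline-inference example for a text request looks roughly like this; the model name is a placeholder, not what we necessarily deploy.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint; swap in whichever model is actually served.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Summarize the Mooncake paper in one sentence."]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Each output carries the prompt plus one or more generated completions.
    print(output.outputs[0].text)
```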

Reference: https://github.com/kvcache-ai/Mooncake