Paper of the Week #2
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
https://arxiv.org/abs/2407.00079
Mooncake improves LLM inference at scale with a KVCache-centric disaggregated architecture: it offloads KVCache storage and transfer to underutilized DRAM, SSD, and CPU resources, and reports higher throughput than the vLLM inference framework.
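The core idea can be illustrated with a small sketch. This is not Mooncake's actual API; the `KVCachePool` class, its `put`/`get` methods, and the two-tier DRAM/SSD layout are hypothetical, shown only to convey how hot KV blocks can stay in DRAM while cold ones spill to SSD, so a matching prefix can be reused instead of recomputed during prefill.

```python
# Hypothetical illustration of a KVCache-centric store (not Mooncake's API):
# DRAM holds recently used KV blocks; least-recently-used blocks spill to SSD.
import os
from collections import OrderedDict
from typing import Optional


class KVCachePool:
    """Two-tier KV block pool: DRAM (dict) first, SSD (files) as overflow."""

    def __init__(self, dram_capacity: int, ssd_dir: str):
        self.dram_capacity = dram_capacity   # max number of blocks kept in DRAM
        self.dram = OrderedDict()            # block_id -> serialized KV block
        self.ssd_dir = ssd_dir
        os.makedirs(ssd_dir, exist_ok=True)

    def put(self, block_id: str, kv_block: bytes) -> None:
        self.dram[block_id] = kv_block
        self.dram.move_to_end(block_id)
        if len(self.dram) > self.dram_capacity:
            # Evict the least-recently-used block from DRAM to the SSD tier.
            old_id, old_block = self.dram.popitem(last=False)
            with open(os.path.join(self.ssd_dir, old_id), "wb") as f:
                f.write(old_block)

    def get(self, block_id: str) -> Optional[bytes]:
        if block_id in self.dram:            # DRAM hit: reuse cached prefix
            self.dram.move_to_end(block_id)
            return self.dram[block_id]
        path = os.path.join(self.ssd_dir, block_id)
        if os.path.exists(path):             # DRAM miss: try the SSD tier
            with open(path, "rb") as f:
                block = f.read()
            self.put(block_id, block)        # promote back to DRAM
            return block
        return None                          # full miss: prefill must recompute
```

The design choice this sketches is the paper's central one: spending cheap DRAM/SSD capacity to cache and reuse prefill results is often better than spending scarce GPU time recomputing them.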
At dwani, we use vLLM to serve requests for text, image, and document processing.
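For context, a minimal sketch of serving a text request with vLLM's offline Python API; the model name and prompt below are placeholders.

```python
# Minimal vLLM example: load a model and generate a completion for one prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Summarize the Mooncake paper in two sentences."]
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.outputs[0].text)
```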
- Reference: https://github.com/kvcache-ai/Mooncake