KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
