Deploying Large Language Models: vLLM and Quantization
Serving ML Models as an API: Sharing Our Experience