Commit

Add feature description for heterogeneous gpu inference feature (#707)
Signed-off-by: Varun Gupta <[email protected]>
nwangfw authored and varungup90 committed Feb 20, 2025
1 parent 09d6142 commit a612d96
Showing 1 changed file with 1 addition and 0 deletions.
README.md

@@ -12,6 +12,7 @@ The initial release includes the following key features:
 - **LLM App-Tailored Autoscaler**: Dynamically scale inference resources based on real-time demand.
 - **Unified AI Runtime**: A versatile sidecar enabling metric standardization, model downloading, and management.
 - **Distributed KV Cache**: Enables high-capacity, cross-engine KV reuse.
+- **Cost-efficient Heterogeneous Serving**: Enables mixed GPU inference to reduce costs with SLO guarantees.
 - **GPU Hardware Failure Detection (TBD)**: Proactive detection of GPU hardware issues.
 
 ## Architecture
