Commit

Add feature description for heterogeneous gpu inference feature (#707)
Signed-off-by: Varun Gupta <[email protected]>
nwangfw authored and varungup90 committed Feb 20, 2025
1 parent 09d6142 commit a612d96
Showing 1 changed file with 1 addition and 0 deletions.
README.md

@@ -12,6 +12,7 @@ The initial release includes the following key features:
 - **LLM App-Tailored Autoscaler**: Dynamically scale inference resources based on real-time demand.
 - **Unified AI Runtime**: A versatile sidecar enabling metric standardization, model downloading, and management.
 - **Distributed KV Cache**: Enables high-capacity, cross-engine KV reuse.
+- **Cost-efficient Heterogeneous Serving**: Enables mixed GPU inference to reduce costs with SLO guarantees.
 - **GPU Hardware Failure Detection (TBD)**: Proactive detection of GPU hardware issues.
 
 ## Architecture
