Piggybacking more information in response header #795

gangmuk · 2025-03-05T06:31:25Z

🚀 Feature Description and Motivation

I suggest piggybacking more information on the response header. For example, the gateway currently returns target-pod-ip in the response header. I suggest including more state information in the same way.
This would be useful because we could get a snapshot of the state at the per-request level, taken at the moment the request is scheduled. Request-granularity information would be very useful for post-analysis and more.

Candidate state information would be queue size, GPU memory utilization, KV cache hit ratio, RPS for each GPU, TPS for each GPU, etc. The exact list should be discussed. The requirement is that none of them should introduce overhead on the request critical path.

The downsides of including more information in the response header would be extra work in the gateway and a larger response. Neither should be significant.
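
To get a rough feel for the header-size overhead mentioned above, here is a minimal Python sketch. It assumes the snapshot has already been collected off the critical path and is serialized as compact JSON into a single header. The header name `x-state-snapshot`, the field names, and all values are hypothetical placeholders, not existing AIBrix headers or measurements.

```python
import json
from typing import Dict, Tuple

# Hypothetical header name for illustration; not an existing AIBrix header.
SNAPSHOT_HEADER = "x-state-snapshot"


def build_snapshot_header(snapshot: Dict[str, float]) -> Tuple[str, str]:
    """Serialize an already-collected snapshot into one compact header value."""
    # Compact separators drop the spaces json.dumps adds by default.
    value = json.dumps(snapshot, separators=(",", ":"), sort_keys=True)
    return SNAPSHOT_HEADER, value


# Placeholder values; a real gateway would read these from cached state.
snapshot = {
    "queue_size": 4,
    "gpu_mem_util": 0.82,
    "kv_cache_hit_ratio": 0.37,
    "rps": 12.5,
    "tps": 950.0,
}

name, value = build_snapshot_header(snapshot)
headers = {"target-pod-ip": "10.0.0.12", name: value}

# Bytes added to an HTTP/1.1 response: "name: value\r\n".
added_bytes = len(name) + len(": ") + len(value) + len("\r\n")
print(added_bytes)
```

With five small fields the added header stays in the range of a hundred bytes or so; the cost grows linearly with the number of fields, which is one reason the exact list matters.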

Use Case

post-analysis

Proposed Solution

piggybacking more information on the response header

gangmuk · 2025-03-05T06:31:45Z

@Jeffwan @varungup90 WDYT?

varungup90 · 2025-03-05T06:53:18Z

Only per-request information should be returned in response headers. The information listed in the issue is already captured in metrics, which are shown in the dashboard and can also be queried by the client.

gangmuk · 2025-03-05T07:17:35Z

Yeah, that's true. But if we want to map the state to a particular request at the time it was scheduled, a snapshot is needed. I wonder what the downside would be. Any thoughts?

varungup90 · 2025-03-05T18:29:20Z

Request and response headers must be lightweight. You can dump the state in logs on a per-request basis.
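
As a rough illustration of the logging alternative, the sketch below writes one JSON log line per request so the state can later be joined to a request by its ID. The logger name, the `request_id` field, and the snapshot fields are assumptions made for the example; the actual gateway may structure its logs differently.

```python
import io
import json
import logging
from typing import Dict


def log_request_snapshot(
    logger: logging.Logger,
    request_id: str,
    target_pod_ip: str,
    snapshot: Dict[str, float],
) -> str:
    """Emit one JSON line that ties a state snapshot to a single request."""
    record = {"request_id": request_id, "target_pod_ip": target_pod_ip, **snapshot}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


# Log into an in-memory stream so the example is self-contained.
stream = io.StringIO()
logger = logging.getLogger("gateway.snapshot")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

line = log_request_snapshot(logger, "req-123", "10.0.0.12", {"queue_size": 4})
print(stream.getvalue(), end="")
```

For post-analysis, the client only needs a request identifier to look up the matching log line, so the response headers themselves stay unchanged.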

gangmuk · 2025-03-06T01:41:51Z

@varungup90 @Jeffwan
Not sure if you've heard of it, but there was a similar discussion in Envoy in the past. They proposed ORCA, a proposal for an open standard for request cost aggregation. I think it was officially integrated into Envoy. We don't need to follow the exact format, but we could think about an AIBrix equivalent of ORCA for AI-specific metrics and support it in AIBrix.

ORCA issue in Envoy
ORCA design doc

gangmuk added area/gateway area/benchmark labels Mar 5, 2025
