V1 integration #6

joerunde · 2025-02-28T23:15:09Z

@tjohnson31415 and I spent a while hacking through a new V1-compatible worker, runner, and scheduler. This works on an AIU chip! 🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉

The main blockers here are:

The scheduler is not yet pluggable (Woosuk is working on it now https://github.com/vllm-project/vllm/pull/12544/files)
It doesn't look like there's support for ignoring requests when they first come into the scheduler. (The V1 engine always tries to run them anyway)

We'll push up a draft PR with some temporary vLLM changes that work around the above blockers, so we can at least move development forward here on V1.

Signed-off-by: Joe Runde <[email protected]>

github-actions · 2025-02-28T23:15:22Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

sducouedic · 2025-03-03T09:46:08Z

I think Dhruval already observed the error that you report on AIU with V1 when using the image built on Friday 28.02, meaning it also shows up with V0.

joerunde · 2025-03-03T22:56:22Z

Interesting. I can confirm, though, that with this code and vllm@cd1d3c3d the v0 engine works and serves requests using the sendnn_decoder backend.

joerunde · 2025-03-04T16:37:10Z

We found the problem, and it was us!

An erroneous @torch.inference_mode was causing extra compilation at runtime (instead of just during warmup) which was failing. Without that, V1 is working on the AIU chip!

sducouedic · 2025-03-04T16:48:41Z

@joerunde nice! Where was this @torch.inference_mode set?

joerunde · 2025-03-04T17:14:22Z

@sducouedic we had copied over code from the v1 gpu worker here in this PR for the new execute_model interface, and accidentally included the annotation along with it 🤦

This bit here

sducouedic · 2025-03-04T17:16:25Z

thanks @joerunde !

Signed-off-by: Joe Runde <[email protected]>

joerunde · 2025-03-05T23:26:47Z

Alright, the v0 tests are all passing and the linter's happy. I removed the [Do not merge] tag; I think we can consider merging this, since it won't break v0 support and we can iterate on v1 in our development environments.

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 · 2025-03-06T13:46:10Z

vllm_spyre/v1/worker/spyre_model_runner.py

+        dummy_tensors = lambda v: torch.full(
+            (num_reqs, ), v, device=self.device)
+        dummy_metadata = SamplingMetadata(
+            temperature=dummy_tensors(0.0),


Updated the hard-coded sampling temperature to 0.0 to do greedy sampling, as in most of our test cases.
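
For readers less familiar with why a temperature of 0.0 means greedy decoding, here is a minimal sketch. It assumes the common convention that temperature 0.0 is special-cased to an argmax instead of dividing the logits by zero. The `sample_token` helper is hypothetical and uses plain Python lists; it is not the sampler used by vLLM or by this PR.

```python
import math
import random


def sample_token(logits, temperature, rng=None):
    """Pick a token index from raw logits (hypothetical helper).

    A temperature of 0.0 is treated as greedy decoding (argmax). This is an
    illustrative stand-in, not the vLLM sampler.
    """
    if temperature == 0.0:
        # Greedy: always the highest-scoring token, so the result is
        # deterministic and independent of any random state.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Greedy decoding picks index 1 (the largest logit) every time.
greedy_choice = sample_token([0.1, 2.5, -1.0], temperature=0.0)
```

Deterministic output is what makes greedy sampling convenient for tests that compare generated tokens against a reference.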

yannicks1 · 2025-03-06T13:48:07Z

vllm_spyre/v1/worker/spyre_model_runner.py

+            # seq_lens = []
+            num_reqs = len(scheduler_output.scheduled_cached_reqs)
+
+        # TODO: Cache the sampling params for the current batch and build this


I believe there is some upstream implementation still missing for V1 to correctly fill in the sampling metadata here. So I would say that's okay for now.

yannicks1 · 2025-03-06T14:20:42Z

vllm_spyre/v1/worker/spyre_model_runner.py

+            sampled_token_ids=output.sampled_token_ids.tolist(),
+            spec_token_ids=None,
+            logprobs=
+            None,  # TODO: add logprobs, needs to be converted from tensor here


Is there something hindering us from providing the correct logprobs here? Asking because this is needed to run the tests in vllm-spyre/tests...
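
As a rough illustration of what "converting logprobs" could involve, the sketch below computes the log-probability of each sampled token from raw logits and returns plain Python values. Everything here is an assumption for illustration: nested lists stand in for tensors, and `log_softmax` and `sampled_logprobs` are hypothetical helpers. The output structure that vLLM V1 actually expects for `logprobs` is not shown in this thread and is not reproduced here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def sampled_logprobs(batch_logits, sampled_token_ids):
    """Return one {token_id: logprob} dict per request (hypothetical format)."""
    results = []
    for logits, token_id in zip(batch_logits, sampled_token_ids):
        results.append({token_id: log_softmax(logits)[token_id]})
    return results


# Two requests over a vocabulary of three tokens; tokens 2 and 0 were sampled.
example = sampled_logprobs([[0.0, 1.0, 2.0], [3.0, 0.0, 0.0]], [2, 0])
```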

yannicks1 · 2025-03-06T14:24:02Z

vllm_spyre/v1/worker/spyre_model_runner.py

+        print("\n\n\n FINISHED ITERATION \n\n\n")
+        print(self._req_ids2idx)
+        print(output.sampled_token_ids)
+        print("\n\n")


I believe that was for debugging purposes and can be removed.

yannicks1 · 2025-03-06T14:26:35Z

vllm_spyre/v1/worker/spyre_worker.py

+        # TODO See if we can use `self.execute_model` instead for the warmup
+        # It's slightly risky to implement different forward pass logic here,
+        # which can go out of sync with the real forward pass and cause problems
+        # for torch.compile


good point!
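
To make the risk in that TODO concrete, here is a toy sketch of warming up through the same entry point that serves real requests. It assumes a simplified model in which a graph is compiled once per input shape; real `torch.compile` caching is more involved. `ToyWorker` and its attributes are hypothetical and are not the Spyre worker from this PR.

```python
class ToyWorker:
    """Toy stand-in for a worker whose forward pass is compiled per input shape."""

    def __init__(self):
        self.compiled_shapes = set()  # shapes that already have a compiled graph
        self.runtime_compiles = 0     # compilations triggered after warmup
        self.warmed_up = False

    def execute_model(self, batch):
        shape = (len(batch), len(batch[0]))
        if shape not in self.compiled_shapes:
            # First time this shape is seen: pretend to compile a graph.
            if self.warmed_up:
                self.runtime_compiles += 1
            self.compiled_shapes.add(shape)
        # Placeholder "forward pass": sum each row.
        return [sum(row) for row in batch]

    def warmup(self, shapes):
        # Warm up by calling the real entry point with dummy inputs, so the
        # warmup and serving paths cannot drift apart.
        for batch_size, seq_len in shapes:
            dummy = [[0] * seq_len for _ in range(batch_size)]
            self.execute_model(dummy)
        self.warmed_up = True


worker = ToyWorker()
worker.warmup([(1, 4), (2, 4)])
result = worker.execute_model([[1, 2, 3, 4]])  # shape (1, 4) was warmed up
```

In this toy model, a request whose shape was covered by the warmup never compiles at runtime, while an unseen shape increments `runtime_compiles`. That mirrors the failure mode described earlier in the thread, where extra compilation happened at runtime instead of only during warmup.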

joerunde added 2 commits February 26, 2025 17:14

🐛 handle rename to LoRANotSupportedWorkerBase

8e93bf6

Signed-off-by: Joe Runde <[email protected]>

⚗️ Add V1 draft

e22f45d

Signed-off-by: Joe Runde <[email protected]>

joerunde added 9 commits March 4, 2025 11:14

🎨 some v1 cleanup

89f6ac2

Signed-off-by: Joe Runde <[email protected]>

🎨 slap some TODOs in the worker

094e92a

Signed-off-by: Joe Runde <[email protected]>

✨ implement slim v1 scheduler

c34e438

Signed-off-by: Joe Runde <[email protected]>

Merge branch 'main' into v1-draft

734a88e

🎨 fmt

ea867a5

Signed-off-by: Joe Runde <[email protected]>

🎨 undo yapf upgrade

9516296

Signed-off-by: Joe Runde <[email protected]>

🎨 more fmt

8aca7b3

Signed-off-by: Joe Runde <[email protected]>

Merge branch 'main' into v1-draft

6bf8ede

🎨 disable yapf where it conflicts with isort

4aadd07

Signed-off-by: Joe Runde <[email protected]>

joerunde changed the title ~~[Do not merge] V1 integration~~ V1 integration Mar 5, 2025

yannicks1 self-requested a review March 6, 2025 09:48

set (hard coded) sampling temperature to 0.0 for greedy decoding

7b95256

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 reviewed Mar 6, 2025
