[Feature Request] Support input embedding in LLM.generate() #416
Comments
Awesome work!! And I have the same need.
Has there been any progress on this? I am looking to achieve something very similar: essentially I need to be able to pass in a previously calculated embedding so as not to have to recalculate it as part of a common prompt. I have somewhere between 4-12k tokens that are currently being reprocessed many times for a single request due to my use case.
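For the shared-prefix part of this use case, one existing workaround that does not require embedding input is vLLM's automatic prefix caching, which reuses the KV cache computed for a common prompt prefix across requests. A minimal sketch, assuming a recent vLLM version; the model name and prompts are placeholders:

```python
from vllm import LLM, SamplingParams

# With prefix caching enabled, the KV cache computed for the shared
# prefix is reused, so a long common context (e.g. 4-12k tokens) is
# not recomputed for every request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)

common_prefix = "<the long shared context goes here> "
questions = ["First question?", "Second question?"]

outputs = llm.generate(
    [common_prefix + q for q in questions],
    SamplingParams(max_tokens=128),
)
```

This only helps when the repeated content is a literal token prefix; it does not cover passing arbitrary pre-computed embeddings, which is what this issue asks for.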
@WoosukKwon is this on the roadmap?
What is the current status on this? =)
According to #1265 (comment), this feature was added in #3042.
@hmellor Is it possible we can add a feature like:
Yeah, same request; supporting input embeddings in the llm.generate() function seems like it would be straightforward.
FYI this is now supported for multi-modal models via #6613. Perhaps a similar idea could be used to extend this to language-only models.
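For reference, a sketch of the embedding-input path that #6613 enabled for multi-modal models; the model name, tensor source, and expected shape here are illustrative and depend on the specific model and vLLM version:

```python
import torch
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example multi-modal model

# Pre-computed image features (e.g. produced offline by the vision
# encoder); the required shape/dtype depends on the model.
image_embeds = torch.load("image_embeds.pt")

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
    "multi_modal_data": {"image": image_embeds},
})
```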
Thanks @DarkLight1337 for the updates. Can we do the following: just treat the customized VLM as a PURE language model?
I saw you also mentioned: ... Again, I really, really appreciate your help.
Hi, are there any updates? Thanks!
Please refer to that PR for more info.
Any update?
Same request.
Any update?
Is there an update on this?
Hi, I am using an LLM as part of a multimodal model, so the model needs to pass an input embedding tensor directly to generate. It also needs to access the language model's embed_tokens member to first calculate the embeddings, which are then processed and finally sent to generate, as demoed in the following code:
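(The original demo snippet is not preserved in this copy of the issue; the following is a minimal, hypothetical sketch of the flow being described. LLM.get_input_embeddings() and the inputs_embeds argument to LLM.generate() are the interfaces proposed in this issue, not existing vLLM APIs, and the model name and token IDs are placeholders.)

```python
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model

# Proposed interface 1 (hypothetical): expose the language model's
# embedding layer (embed_tokens) so the caller can compute token
# embeddings themselves.
token_ids = torch.tensor([[1, 15043, 3186]])           # example token IDs
inputs_embeds = llm.get_input_embeddings()(token_ids)  # (batch, seq_len, hidden_size)

# The multimodal wrapper would process the embeddings here, e.g. splice
# in projected image features from a separate vision encoder.

# Proposed interface 2 (hypothetical): accept pre-computed embeddings
# instead of a text prompt or token IDs.
outputs = llm.generate(
    inputs_embeds=inputs_embeds,
    sampling_params=SamplingParams(max_tokens=64),
)
```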
I read the vLLM code, and it seems that I need to add two interfaces to vLLM: one is LLM.get_input_embeddings, and the other is LLM.generate(inputs_embeds=inputs_embeds, ...).
Do you think this will work? And would you consider supporting this feature?