[Feature Request] Support input embedding in LLM.generate() #416
Comments
Awesome work!! And I have the same need.
Has there been any progress on this? I am looking to achieve something very similar: essentially I need to be able to pass in a previously calculated embedding so as not to have to recalculate it as part of a common prompt. I have somewhere between 4-12k tokens that are currently being reprocessed many times for a single request due to my use case.
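For the shared-prefix part of this use case, one existing workaround that does not require embedding input is vLLM's automatic prefix caching, which reuses the KV cache computed for a common prompt prefix across requests. A minimal sketch, assuming a recent vLLM version; the model name and prompts are placeholders:

```python
from vllm import LLM, SamplingParams

# With prefix caching enabled, the KV cache computed for the shared
# prefix is reused, so a long common context (e.g. 4-12k tokens) is
# not recomputed for every request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)

common_prefix = "<the long shared context goes here> "
questions = ["First question?", "Second question?"]

outputs = llm.generate(
    [common_prefix + q for q in questions],
    SamplingParams(max_tokens=128),
)
```

This only helps when the repeated content is a literal token prefix; it does not cover passing arbitrary pre-computed embeddings, which is what this issue asks for.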
@WoosukKwon is this on the roadmap?
What is the current status on this? =)
According to #1265 (comment), this feature was added in #3042.
@hmellor Is it possible we can add a feature like:
Yeah, same request; supporting input embeddings in the llm.generate() function seems like it would be straightforward.
FYI this is now supported for multi-modal models via #6613. Perhaps a similar idea could be used to extend this to language-only models.
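For reference, a sketch of the embedding-input path that #6613 enabled for multi-modal models; the model name, tensor source, and expected shape here are illustrative and depend on the specific model and vLLM version:

```python
import torch
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example multi-modal model

# Pre-computed image features (e.g. produced offline by the vision
# encoder); the required shape/dtype depends on the model.
image_embeds = torch.load("image_embeds.pt")

outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
    "multi_modal_data": {"image": image_embeds},
})
```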
Thanks @DarkLight1337 for the updates. Can we do the following: just treat the customized VLM as a PURE language model?
I saw you also mentioned: ... Again, I really, really appreciate your help.
Hi, are there any updates? Thanks!
Please refer to that PR for more info.
Any update?
Same request.
Any update?
Is there an update on this?
Hi, I am using an LLM as part of a multimodal model, so the model needs to pass an input embedding tensor directly to generate. It also needs to access the language model's embed_tokens member to first calculate the embeddings, which are then processed and finally sent to generate, as demoed in the following code:
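(The original demo snippet is not preserved in this copy of the issue; the following is a minimal, hypothetical sketch of the flow being described. LLM.get_input_embeddings() and the inputs_embeds argument to LLM.generate() are the interfaces proposed in this issue, not existing vLLM APIs, and the model name and token IDs are placeholders.)

```python
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model

# Proposed interface 1 (hypothetical): expose the language model's
# embedding layer (embed_tokens) so the caller can compute token
# embeddings themselves.
token_ids = torch.tensor([[1, 15043, 3186]])           # example token IDs
inputs_embeds = llm.get_input_embeddings()(token_ids)  # (batch, seq_len, hidden_size)

# The multimodal wrapper would process the embeddings here, e.g. splice
# in projected image features from a separate vision encoder.

# Proposed interface 2 (hypothetical): accept pre-computed embeddings
# instead of a text prompt or token IDs.
outputs = llm.generate(
    inputs_embeds=inputs_embeds,
    sampling_params=SamplingParams(max_tokens=64),
)
```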
I read the vLLM code, and it seems that I need to add two interfaces to vLLM: one is LLM.get_input_embeddings, and the other is LLM.generate(inputs_embeds=inputs_embeds, ...).
Do you think this will work? And would you consider supporting this feature?