[Core] Support pooling #229
Conversation
Test
Seems we need to add support for encoder self-attention and encoder/decoder cross-attention.
Force-pushed from e50e772 to e682021
Signed-off-by: wangxiyuan <[email protected]>
This PR adds pooling support for vllm-ascend.

Tested with `bge-base-en-v1.5` by encode:

```
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)

outputs = model.encode(prompts)
for output in outputs:
    print(output.outputs.embedding)  # list of embedding floats
```

Tested by embedding:

```
from vllm import LLM

llm = LLM(model="./bge-base-en-v1.5", task="embed")

(output,) = llm.embed("Hello, my name is")

embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```

Related: #200 #235

Known issue

The accuracy is not correct yet, since this feature relies on `enc-dec` support; that will be added in a follow-up PR by @MengqingCao.
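The pooled vectors returned by `llm.embed` can be compared directly once they are computed; the sketch below runs a cosine-similarity smoke test of the pooling path. This is a minimal illustration only: the prompts and the `cosine` helper are hypothetical additions (not part of vLLM or this PR), and it assumes the same local `./bge-base-en-v1.5` checkpoint used above.

```
import math

from vllm import LLM


def cosine(a, b):
    # Cosine similarity between two embedding vectors (plain Python, no numpy).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


llm = LLM(model="./bge-base-en-v1.5", task="embed", enforce_eager=True)

# Same single-prompt embed pattern as in the description above.
(out_a,) = llm.embed("The capital of France is")
(out_b,) = llm.embed("Paris is the capital of France")

score = cosine(out_a.outputs.embedding, out_b.outputs.embedding)
print(f"cosine similarity: {score:.4f}")
```

Given the known accuracy issue above (pending `enc-dec` support), treat the similarity value as a functional check of the pooling path rather than a quality measurement.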