feat: Support tool calling for non-streaming chat completion in remote vLLM provider #1034
Conversation
Will need to parse the response for compatibility, since there seems to be an issue parsing the completion result when it includes tool calls:

Error: …
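For context, the conversion in question maps the tool calls embedded in vLLM's OpenAI-compatible response onto the provider's internal representation. A minimal sketch of that step, assuming the OpenAI-style response schema; the function name and output dict shape are illustrative, not the PR's actual code:

```
import json


def convert_tool_calls(response: dict) -> list[dict]:
    """Extract tool calls from an OpenAI-style non-streaming chat completion response."""
    message = response["choices"][0]["message"]
    converted = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        converted.append(
            {
                "call_id": call["id"],
                "tool_name": fn["name"],
                # Arguments arrive as a JSON-encoded string and must be
                # decoded before they can be validated or executed.
                "arguments": json.loads(fn["arguments"]),
            }
        )
    return converted
```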
Verified that all tests pass, including …
@terrytangyuan It seems like …
I tested this PR from a container build that @terrytangyuan provided and can confirm that the tools portion of the payload is now passed to vLLM where it was not previously:

```
{
"messages": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are a helpful assistant with access to the following\nfunction calls. Your task is to produce a list of function calls\nnecessary to generate response to the user utterance. Use the following\nfunction calls as required."
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What pods are in the namespace openshift-lightspeed?"
}
]
}
],
"model": "meta-llama/Llama-3.2-1B-Instruct",
"max_tokens": 4096,
"stream": true,
"temperature": 0.0,
"tools": [
{
"type": "function",
"function": {
"name": "get_object_namespace_list",
"description": "Get the list of all objects in a namespace",
"parameters": {
"type": "object",
"properties": {
"kind": {
"type": "str",
"description": "the type of object"
},
"namespace": {
"type": "str",
"description": "the name of the namespace"
}
},
"required": [
"kind",
"namespace"
]
}
}
}
]
}
```
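For reference, the nested `tools` entry above can be assembled from a flat tool definition along these lines. This is an illustrative sketch of the kind of restructuring the PR describes; the helper name and input shape are assumptions, not the provider's actual code:

```
# Hypothetical helper: wrap a flat tool definition into the nested
# OpenAI-style "tools" entry that vLLM expects.
def to_openai_tool(name, description, properties, required):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Reproduces the "tools" entry from the payload above.
tools = [
    to_openai_tool(
        name="get_object_namespace_list",
        description="Get the list of all objects in a namespace",
        properties={
            "kind": {"type": "str", "description": "the type of object"},
            "namespace": {"type": "str", "description": "the name of the namespace"},
        },
        required=["kind", "namespace"],
    )
]
```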
I tried but ran into another rabbit hole. Since the current fix works, should we merge it for now and investigate the remaining issues separately later?
Looks good, let's get this in.
…e vLLM provider (meta-llama#1034)

# What does this PR do?

This PR adds support for tool calling for non-streaming chat completion. Prior to this, tool calls were not passed to chat completion requests, and the tools object needed to be restructured to be compatible with the vLLM provider.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 12 items

tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 8%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 16%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 41%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%]
```

Signed-off-by: Yuan Tang <[email protected]>
What does this PR do?
This PR adds support for tool calling for non-streaming chat completion. Prior to this, tool calls were not passed to chat completion requests, and the tools object needed to be restructured to be compatible with the vLLM provider.
Test Plan
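The following command reproduces the test plan from the merged commit message above, run against a locally served stack:

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
```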