feat: Support tool calling for non-streaming chat completion in remote vLLM provider #1034

Merged: 4 commits into meta-llama:main on Feb 12, 2025

Conversation

@terrytangyuan (Collaborator) commented on Feb 10, 2025

What does this PR do?

This PR adds support for tool calling for non-streaming chat completion. Prior to this, tool calls were not passed in chat completion requests, and the tools object needed to be restructured to be compatible with the vLLM provider.
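
As an illustration of that restructuring, here is a minimal sketch that turns a Llama Stack tool definition into the OpenAI-style `tools` entry that vLLM accepts; the field names (`tool_name`, `parameters`, `param_type`, `required`) are assumptions about the tool-definition data model, not necessarily the PR's exact code:

```python
# Hedged sketch (not the PR's exact code): restructure a Llama Stack tool
# definition into the OpenAI-style "tools" entry that vLLM accepts. The
# field names used on `tool` are assumptions about the data model.
def convert_tooldef_for_vllm(tool) -> dict:
    properties = {}
    required = []
    for param_name, param in (tool.parameters or {}).items():
        properties[param_name] = {
            "type": param.param_type,
            "description": param.description,
        }
        if param.required:
            required.append(param_name)
    return {
        "type": "function",
        "function": {
            "name": tool.tool_name,
            "description": tool.description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```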

Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 12 items                                                                                                                                    

tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                  [  8%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                      [ 16%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED              [ 41%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED         [100%]
```

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 10, 2025
@terrytangyuan (Collaborator, Author) commented on Feb 10, 2025

Will need to parse the response for compatibility, since there seems to be an issue parsing a completion result that includes tool calls:

```
Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-b1343807797b4b6aa972b186faa09620', function=Function(arguments='{"location": "San Francisco, CA"}', name='get_weather'), type='function')], reasoning_content=None), stop_reason=128008)
```

Error:

```
Traceback (most recent call last):
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 182, in endpoint
    return await maybe_await(value)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 148, in maybe_await
    return await value
  File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 91, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/routers/routers.py", line 169, in chat_completion
    return await provider.chat_completion(**params)
  File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 91, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 138, in chat_completion
    return await self._nonstream_chat_completion(request, self.client)
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 145, in _nonstream_chat_completion
    return process_chat_completion_response(r, self.formatter)
  File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 178, in process_chat_completion_response
    raw_message = formatter.decode_assistant_message_from_content(
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/llama_models/llama3/api/chat_format.py", line 170, in decode_assistant_message_from_content
    content = content.strip(" ")
AttributeError: 'NoneType' object has no attribute 'strip'
```
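
The root cause is visible in the repr above: when vLLM returns tool calls, `message.content` is `None`, and the formatter's text-decoding path calls `.strip()` on it. One way around this, sketched below with illustrative names (not the PR's exact code), is to build the tool calls directly from the structured `tool_calls` field rather than decoding them from message text:

```python
import json

# Hedged sketch (illustrative names, not the PR's exact code): when the
# OpenAI-compatible response carries structured tool_calls and content is
# None, construct the tool calls from the structured field instead of
# asking the formatter to decode them from raw text.
def extract_tool_calls(choice) -> list[dict]:
    message = choice.message
    if not message.tool_calls:
        return []
    return [
        {
            "call_id": tool_call.id,
            "tool_name": tool_call.function.name,
            "arguments": json.loads(tool_call.function.arguments),
        }
        for tool_call in message.tool_calls
    ]
```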

@terrytangyuan changed the title from "fix: Handle tool calling in remote vLLM provider" to "feat: Support tool calling for non-streaming chat completion in remote vLLM provider" on Feb 11, 2025
@terrytangyuan (Collaborator, Author) commented on Feb 11, 2025

Verified that all tests pass, including test_text_chat_completion_with_tool_calling_and_non_streaming. test_text_chat_completion_with_tool_calling_and_streaming does not pass yet; it will need to be worked on separately (created #1046 to track it, in case anyone else is interested in working on it).

@yanxi0830 (Contributor) commented

> Will need to parse the response for compatibility, since there seems to be an issue parsing a completion result that includes tool calls: […]

@terrytangyuan It seems like `content=None` is the culprit. What happens if you set `content=""`?
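
For reference, that suggestion amounts to a one-line guard before the formatter call; a sketch assuming the call site in process_chat_completion_response, with the second argument to decode_assistant_message_from_content inferred from the surrounding code:

```python
# Sketch of the suggested workaround (assumed call site, not exact code):
# coerce a None content to "" so the formatter's .strip() does not fail.
def decode_message_safely(formatter, choice, stop_reason):
    content = choice.message.content or ""  # guard the .strip() on None
    return formatter.decode_assistant_message_from_content(content, stop_reason)
```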

@thoraxe commented on Feb 11, 2025

I tested this PR from a container build that @terrytangyuan provided and can confirm that the tools portion of the payload is now passed to vLLM, where it was not previously:

```
{
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a helpful assistant with access to the following\nfunction calls. Your task is to produce a list of function calls\nnecessary to generate response to the user utterance. Use the following\nfunction calls as required."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What pods are in the namespace openshift-lightspeed?"
                }
            ]
        }
    ],
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "max_tokens": 4096,
    "stream": true,
    "temperature": 0.0,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_object_namespace_list",
                "description": "Get the list of all objects in a namespace",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "kind": {
                            "type": "str",
                            "description": "the type of object"
                        },
                        "namespace": {
                            "type": "str",
                            "description": "the name of the namespace"
                        }
                    },
                    "required": [
                        "kind",
                        "namespace"
                    ]
                }
            }
        }
    ]
}
```
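
To reproduce a request like this outside the stack, here is a minimal sketch using the `openai` client against vLLM's OpenAI-compatible endpoint; the base URL is an assumption for a locally running server, and streaming is disabled since this PR targets the non-streaming path:

```python
from openai import OpenAI

# Assumed local vLLM endpoint; adjust base_url for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    max_tokens=4096,
    temperature=0.0,
    stream=False,  # non-streaming path, the case this PR fixes
    messages=[
        {
            "role": "user",
            "content": "What pods are in the namespace openshift-lightspeed?",
        },
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_object_namespace_list",
                "description": "Get the list of all objects in a namespace",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "kind": {"type": "str", "description": "the type of object"},
                        "namespace": {"type": "str", "description": "the name of the namespace"},
                    },
                    "required": ["kind", "namespace"],
                },
            },
        }
    ],
)
print(response.choices[0].message.tool_calls)
```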

@terrytangyuan (Collaborator, Author) commented on Feb 11, 2025

> @terrytangyuan It seems like `content=None` is the culprit. What happens if you set `content=""`?

I tried that but ran into another rabbit hole. Since the current fix works, should we merge it for now and investigate the remaining issues separately?

@hardikjshah (Contributor) left a review

Looks good, let's get this in.

@terrytangyuan merged commit dd37e58 into meta-llama:main on Feb 12, 2025
3 checks passed
@terrytangyuan deleted the fix-tool-calling-vllm branch on February 12, 2025 at 02:08
srikanthbachala20 pushed a commit to srikanthbachala20/llama-stack that referenced this pull request on Feb 27, 2025: feat: Support tool calling for non-streaming chat completion in remote vLLM provider (meta-llama#1034)