
bug: File Access Error with vllm using runai_streamer on OCP #193

Open
TamKez opened this issue Feb 27, 2025 · 5 comments
Labels
bug Something isn't working

Comments


TamKez commented Feb 27, 2025

Describe the bug

I am attempting to run the vLLM production stack on OpenShift (OCP) while fetching the model from Dell ECS S3-compatible storage using runai_streamer. However, I consistently encounter the following error:
Could not send runai_request to libstreamer due to: b'file access error'
All necessary AWS credential and configuration environment variables are set, along with all recommended RUNAI_STREAMER environment variables. Even with debugging enabled, this is the only error message I receive.
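
For reference, a minimal sketch of the kind of in-pod check that confirms the variables are actually visible to the vLLM process (AWS_ENDPOINT_URL is an assumed name for the variable carrying the Dell ECS endpoint; adjust to your setup):

import os

# Credentials and endpoint the AWS SDK / runai streamer are expected to read.
required = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ENDPOINT_URL",  # Dell ECS S3-compatible endpoint (assumed name)
]
for name in required:
    print(name, "set" if os.environ.get(name) else "MISSING")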

To Reproduce

Deploy vLLM (image version 0.7.3 or 0.6.6) with the production stack on OCP, serving a single model defined in the modelspec.
Configure the model URL to point at the bucket path in the Dell ECS S3-compatible storage, s3://bucket-name (the error occurred both with and without a trailing / in the URL).
Using an added external secret, define all the env vars in a Vault service and mount them into the pod.
Start the process and observe the error (see the sketch below).
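
A minimal sketch of the equivalent direct load via vLLM's Python API, assuming the bucket name is a placeholder (the actual deployment goes through the production stack's modelspec rather than this call):

from vllm import LLM

# Placeholder bucket; the real deployment points at the Dell ECS endpoint.
llm = LLM(model="s3://bucket-name/", load_format="runai_streamer")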

Expected behavior

The model should load successfully from the storage without file access errors, since the credentials are correct and all the env vars are defined.

Additional context

I also get the following logs:

Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
Using XFormers backend.
Starting to load model /tmp/tmobytw07_d


TamKez added the bug label Feb 27, 2025

noa-neria commented Feb 27, 2025

In 0.6.6 the only path pattern that works is s3://bucket/dir/, so please use 0.7.3.

The runai streamer uses the AWS C++ SDK, which does not automatically fetch credentials from AWS Vault.

What you can do is use aws-vault to obtain temporary credentials, as follows:
aws-vault exec my-profile --json > creds.json
Then pass the results as the environment variables
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.
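
A small sketch of that hand-off, assuming creds.json holds aws-vault's credential_process-style JSON (AccessKeyId, SecretAccessKey, and SessionToken are the field names in that format):

import json
import os

# Map aws-vault's JSON output onto the variables the AWS C++ SDK reads.
with open("creds.json") as f:
    creds = json.load(f)

os.environ["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
os.environ["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
os.environ["AWS_SESSION_TOKEN"] = creds["SessionToken"]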

We are currently working to make the authentication similar to boto3, so in the future this will be handled by the streamer.


TamKez commented Mar 2, 2025

I also tried version 0.7.3, with similar results.
I have also exported all the env vars you mentioned, as well as those mentioned in the documentation for loading from S3-compatible storage with runai_streamer.


noa-neria commented Mar 2, 2025

Checking the AWS logs should help in understanding the authentication problem.
Add the environment variable RUNAI_STREAMER_S3_TRACE=1 as explained here.
Trace logs are written to a file in the location of the executable.

In addition, there are the streamer's internal logs: RUNAI_STREAMER_LOG_TO_STDERR=1 RUNAI_STREAMER_LOG_LEVEL=DEBUG
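
In the pod these would be plain env entries; a minimal Python equivalent, set before vLLM starts so the native streamer library picks them up (a sketch, not the only way to set them):

import os

# AWS SDK trace log (written next to the executable) plus the streamer's own debug logs.
os.environ["RUNAI_STREAMER_S3_TRACE"] = "1"
os.environ["RUNAI_STREAMER_LOG_TO_STDERR"] = "1"
os.environ["RUNAI_STREAMER_LOG_LEVEL"] = "DEBUG"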


TamKez commented Mar 4, 2025

After adding those env vars, I get the following log:
ValueError: No supported config format found in /tmp/tmpcn6hkvza.


noa-neria commented Mar 4, 2025

This error probably occurs because the configuration files were not downloaded. Config files are downloaded in vLLM using boto3, so please verify that you can download a file from your bucket with boto3.
For example, this could be a signature version issue, as explained here. If your Dell ECS S3 version supports only S3 signature version 2, this will not work, since it is deprecated in the AWS SDKs. The suggested workaround may help; see the boto3 check below.
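
A minimal boto3 check against an S3-compatible endpoint, with the signature version pinned explicitly (the endpoint, bucket, and key are placeholders; in botocore, "s3v4" selects signature version 4, while the legacy "s3" value selects the old S3 signing):

import boto3
from botocore.config import Config

# Placeholders: point these at the Dell ECS endpoint and a real object in the bucket.
s3 = boto3.client(
    "s3",
    endpoint_url="https://ecs.example.com",
    config=Config(signature_version="s3v4"),  # legacy signing: "s3"
)
s3.download_file("bucket-name", "config.json", "/tmp/config.json")
print("download ok")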
