NIF Panic when reading parquet files from S3 #1054
Comments
Can you execute any other operation? If nothing works, then incompatible gcc/musl versions are the most likely cause. You can check the README information on precompilation: https://github.com/elixir-explorer/explorer?tab=readme-ov-file#precompilation |
I tested two different operations, with one succeeding and one resulting in the same NIF panic. The first test was the example used in #1011:
Mix.install([{:explorer, "~> 0.10.0"}])
name_dtype = {"names",
  {:list,
    {:struct,
      [
        {"language", :string},
        {"name", :string},
        {"transliteration", :category},
        {"type", :category}
      ]}}}
[
  %{names: []},
  %{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
]
|> Explorer.DataFrame.new(dtypes: [name_dtype])
|> dbg
Which resulted in a NIF panic:
The second operation I tried was creating a simple dataframe, and that succeeded:
df = Explorer.DataFrame.new(%{
  "id" => ["a", "b", "c"],
  "type" => ["x", "y", "z"]
})
Output:
I'm deploying using Mix releases and Docker with the base image being
I also verified that during the build process I'm correctly downloading the precompiled NIF:
|
@billylanchantin No problem! My use case is that I'm currently trying to read a parquet file straight from s3 that has 4 columns
I see that Explorer is able to pull it down and see the different columns but doesn't make it past that. I can send over a sample file if that's helpful! |
Yeah that'd be great, thanks! |
@billylanchantin GitHub won't let me upload parquet files here. Is it cool if I DM you on the Elixir Slack? |
Just sent it via Slack. Let me know if you prefer something else and I can upload it to Google Drive! |
Ok I got the file and some more info off slack.
So they're technically unblocked right now since they can use
As a sanity check, I ran our
@tag :cloud_integration
test "reads rohfosho's parquet file from S3" do
  config = %FSS.S3.Config{
    access_key_id: "test",
    secret_access_key: "test",
    endpoint: "http://localhost:4566",
    region: "us-east-1"
  }
  # No trailing comma after the last argument: Elixir rejects it in a call.
  assert {:ok, df} =
           DF.from_parquet("s3://test-bucket/rohfosho.parquet",
             config: config
           )
  df |> DF.print()
end
which passed. Must be something more specific, IDK yet. @josevalim any ideas? |
Hey, I suspect that this may be some library missing inside the container. Can you print the result of the following command?
Where this path is printed right after you install Explorer - there is a small bug though: the path should not end with |
@philss sure! here you go
|
@rohfosho thank you for the info! Based on what you sent, I think there's nothing wrong there. Would you mind sharing the Dockerfile, or the full base image tag that you are using? It may be easier to reproduce. |
@philss for sure, here you go:
# syntax = docker/dockerfile:1.2
# Use the official Elixir image as the base image
FROM elixir:1.18.1 AS builder
# Set the working directory inside the container
WORKDIR /app
# Install required system dependencies
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    build-essential \
    git \
    nodejs \
    npm \
    postgresql-client \
    python3
RUN mix local.hex --force && \
    mix local.rebar --force
# Copy the mix files first for better docker build caching
COPY mix.exs ./
COPY mix.lock ./
RUN mix deps.get
# Compile the dependencies, set MIX_ENV beforehand
ARG MIX_ENV=prod
ENV MIX_ENV=${MIX_ENV}
RUN mix deps.compile
# Now copy the whole project to avoid rebuilding the deps when the source code changes
COPY . .
# Set permissions for the release script
RUN chmod +x release.sh
# Source Code Compilation Stage
FROM builder AS compiler
# Set the working directory
WORKDIR /app
# Execute release script
RUN --mount=type=secret,id=_env,dst=/etc/secrets/.env ./release.sh
# New stage for the runtime image to reduce the final size
FROM elixir:1.18.1
# Set the working directory
WORKDIR /app
# Install minimal dependencies
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    nodejs \
    npm \
    postgresql-client \
    python3
COPY --from=compiler /app/ .
# Expose the port the application will run on
EXPOSE 4000
# Define the entrypoint for the application
ENTRYPOINT ["/app/_build/${MIX_ENV}/rel/my_app/bin/my_app"]
# Start the Phoenix application
CMD ["start"] |
@rohfosho sorry for the delay. I built a container image and ran the code, but I couldn't reproduce the problem. I'm running in a Linux environment (Fedora 41 with Podman). If you don't mind, can you run your code with the
Another option would be to compile from source, from our main branch, and see if the problem persists. We updated Polars recently, so it may be working. |
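For anyone following along, the compile-from-source suggestion might look roughly like the standalone script below. This is only a sketch under a few assumptions: a Rust toolchain is available in the environment, the `EXPLORER_BUILD` variable behaves as described in the README's precompilation section, and the `:rustler` entry is needed to build the native code. None of these were confirmed in this thread.

```elixir
# Sketch only: pull Explorer from the main branch and build the NIF locally
# instead of downloading a precompiled artifact.
Mix.install(
  [
    {:explorer, github: "elixir-explorer/explorer", branch: "main"},
    # Assumption: Rustler is required to compile the native code.
    {:rustler, ">= 0.0.0"}
  ],
  # Assumption: EXPLORER_BUILD requests a source build, per the README.
  system_env: %{"EXPLORER_BUILD" => "1"}
)

# Re-run an operation from this thread to see whether the panic persists.
df = Explorer.DataFrame.new(%{"id" => ["a", "b", "c"], "type" => ["x", "y", "z"]})
IO.inspect(df)
```

In a release-based project, the equivalent change would go in the `deps` list of `mix.exs`, with the environment variable set in the Docker build stage.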
Code
Expected
A working DataFrame
Actual
Note: When I try to load the parquet file lazily, I get a more detailed stacktrace:
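The reporter's lazy-loading code is not shown in the thread. A minimal sketch of what such a call could look like follows, reusing the LocalStack-style config and test bucket path from the maintainer's test above. The credentials and endpoint are placeholders, and the `lazy: true` option is assumed to be supported for S3 sources in the Explorer version in use.

```elixir
alias Explorer.DataFrame, as: DF

# Placeholder credentials and endpoint, copied from the test in this thread.
config = %FSS.S3.Config{
  access_key_id: "test",
  secret_access_key: "test",
  endpoint: "http://localhost:4566",
  region: "us-east-1"
}

# With `lazy: true`, the file is scanned and the data is only read on collect.
{:ok, ldf} =
  DF.from_parquet("s3://test-bucket/rohfosho.parquet",
    config: config,
    lazy: true
  )

# Materialize the lazy frame and print it.
ldf
|> DF.collect()
|> DF.print()
```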
Context
This only happens when I deploy to staging or prod (using Docker with the elixir-1.18.1 base image and Mix releases). It works perfectly when I'm developing locally (on macOS).