NIF Panic when reading parquet files from S3 #1054

Open
rohfosho opened this issue Jan 14, 2025 · 13 comments
Comments

@rohfosho

Code

Explorer.DataFrame.from_parquet("s3://path/to/file.parquet", config: %{FSS.S3.config_from_system_env() | region: "us-west-2"})

Expected

A working DataFrame

Actual

** (ErlangError) Erlang error: :nif_panicked
   (explorer 0.10.1) Explorer.PolarsBackend.Native.lf_compute(%Explorer.PolarsBackend.LazyFrame{resource: #Reference<0.1199058728.3890348049.170513>})
   (explorer 0.10.1) lib/explorer/polars_backend/data_frame.ex:286: Explorer.PolarsBackend.DataFrame.from_parquet/4
   iex:1: (file)

Note: When I try to load the parquet file lazily, I get a more detailed stacktrace:

Explorer.DataFrame.from_parquet("s3://path/to/file.parquet", config: %{FSS.S3.config_from_system_env() | region: "us-west-2"}, lazy: true)
#Inspect.Error<
  got ErlangError with message:

      """
      Erlang error: :nif_panicked
      """

  while inspecting:

      %{
        data: %Explorer.PolarsBackend.LazyFrame{
          resource: #Reference<0.3078565455.1145176087.107417>
        },
        remote: nil,
        names: ["id", "point", "rarity", "type"],
        __struct__: Explorer.DataFrame,
        groups: [],
        dtypes: %{
          "id" => :string,
          "point" => :string,
          "rarity" => :string,
          "type" => :string
        }
      }

  Stacktrace:

    (explorer 0.10.1) Explorer.PolarsBackend.Native.lf_fetch(%Explorer.PolarsBackend.LazyFrame{resource: #Reference<0.3078565455.1145176087.107417>}, 50)
    (explorer 0.10.1) lib/explorer/polars_backend/lazy_frame.ex:74: Explorer.PolarsBackend.LazyFrame.inspect/2
    (explorer 0.10.1) lib/explorer/data_frame.ex:6379: Inspect.Explorer.DataFrame.inspect/2
    (elixir 1.16.1) lib/inspect/algebra.ex:347: Inspect.Algebra.to_doc/2
    (elixir 1.16.1) lib/kernel.ex:2351: Kernel.inspect/2
    (iex 1.16.1) lib/iex/evaluator.ex:376: IEx.Evaluator.io_inspect/1
    (iex 1.16.1) lib/iex/evaluator.ex:335: IEx.Evaluator.eval_and_inspect/3
    (iex 1.16.1) lib/iex/evaluator.ex:306: IEx.Evaluator.eval_and_inspect_parsed/3

>

Context

This only happens when I deploy to staging or prod (using Docker, with the elixir-1.18.1 image as the base, and Mix releases). It works perfectly when I'm developing locally (on macOS).

@josevalim
Member

Can you execute any other operation? If nothing works, then it is most likely an incompatibility between gcc/musl versions. You can check the README information on precompilation: https://github.com/elixir-explorer/explorer?tab=readme-ov-file#precompilation

@rohfosho
Author

I tested two different operations, with one succeeding and one resulting in the same NIF panic:

The first test was the example used in #1011:

Mix.install([{:explorer, "~> 0.10.0"}])

name_dtype = {"names",
{:list,
 {:struct,
  [
    {"language", :string},
    {"name", :string},
    {"transliteration", :category},
    {"type", :category}
  ]}}}

[
  %{names: []},
  %{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
]
|> Explorer.DataFrame.new(dtypes: [name_dtype])
|> dbg

Which resulted in a NIF panic:

[iex:6: (file)]
[
  %{names: []},
  %{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
] #=> [
  %{names: []},
  %{
    names: [
      %{name: "CABK", type: "acronym", language: nil, transliteration: "none"}
    ]
  }
]
|> Explorer.DataFrame.new(dtypes: [name_dtype]) #=> #Inspect.Error<
  got ErlangError with message:

      """
      Erlang error: :nif_panicked
      """

  while inspecting:

      %{
        data: %Explorer.PolarsBackend.DataFrame{
          resource: #Reference<0.3723385053.1587150849.9786>
        },
        remote: nil,
        names: ["names"],
        __struct__: Explorer.DataFrame,
        groups: [],
        dtypes: %{
          "names" => {:list,
           {:struct,
            [
              {"language", :string},
              {"name", :string},
              {"transliteration", :category},
              {"type", :category}
            ]}}
        }
      }

  Stacktrace:

    (explorer 0.10.1) Explorer.PolarsBackend.Native.s_to_list(#Explorer.PolarsBackend.Series<
  #Reference<0.3723385053.1586364425.234579>
>)
    (explorer 0.10.1) lib/explorer/polars_backend/shared.ex:24: Explorer.PolarsBackend.Shared.apply_series/3
    (explorer 0.10.1) lib/explorer/backend/data_frame.ex:324: anonymous fn/3 in Explorer.Backend.DataFrame.build_cols_algebra/3
    (elixir 1.18.1) lib/enum.ex:1714: Enum."-map/2-lists^map/1-1-"/2
    (explorer 0.10.1) lib/explorer/backend/data_frame.ex:283: Explorer.Backend.DataFrame.inspect/5
    (explorer 0.10.1) lib/explorer/data_frame.ex:6379: Inspect.Explorer.DataFrame.inspect/2
    (elixir 1.18.1) lib/inspect/algebra.ex:348: Inspect.Algebra.to_doc/2
    (elixir 1.18.1) lib/kernel.ex:2376: Kernel.inspect/2

>

The second operation I tried was creating a simple dataframe and that succeeded:

df = Explorer.DataFrame.new(%{
  "id" => ["a", "b", "c"],
  "type" => ["x", "y", "z"]
})

Output:

#Explorer.DataFrame<
  Polars[3 x 2]
  id string ["a", "b", "c"]
  type string ["x", "y", "z"]
>

I'm deploying using Mix releases and Docker, with the base image being elixir-1.18.1.

I also verified that during the build process I'm correctly downloading the precompiled NIF:

[debug] Downloading NIF from https://github.com/elixir-nx/explorer/releases/download/v0.10.1/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
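
One way to double-check that the downloaded artifact fits the runtime is to compare the NIF version in its file name with what the VM reports. This is a minimal sketch using only `:erlang.system_info/1`; the `NifCheck` module and its functions are hypothetical helpers for illustration, not part of Explorer.

```elixir
defmodule NifCheck do
  # Hypothetical helper, not part of Explorer.

  @doc "Returns the NIF version and architecture reported by the running VM."
  def runtime_info do
    %{
      nif_version: :erlang.system_info(:nif_version) |> List.to_string(),
      architecture: :erlang.system_info(:system_architecture) |> List.to_string()
    }
  end

  @doc "Extracts the NIF version embedded in a precompiled artifact file name."
  def artifact_nif_version(file_name) do
    case Regex.run(~r/nif-(\d+\.\d+)/, file_name) do
      [_, version] -> version
      nil -> nil
    end
  end
end

artifact = "libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so"

IO.inspect(NifCheck.runtime_info(), label: "runtime")
IO.inspect(NifCheck.artifact_nif_version(artifact), label: "artifact NIF version")
```

A runtime NIF version at or above the artifact's (same major version) is the normal situation, so a mismatch there would be more suspicious than a match is reassuring.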

@billylanchantin
Member

@rohfosho Thank you for the additional info. Can you possibly share a dataframe which exhibits the panic you originally saw? #1011 is still an open issue, so that panic is expected.

@rohfosho
Author

@billylanchantin No problem! My use case is that I'm currently trying to read a parquet file straight from S3. It has 4 columns:

%{
  "id" => :string,
  "point" => :string,
  "rarity" => :string,
  "type" => :string
}

I see that Explorer is able to pull it down and see the different columns but doesn't make it past that. I can send over a sample file if that's helpful!

@billylanchantin
Member

Yeah that'd be great, thanks!

@rohfosho
Author

@billylanchantin GitHub won't let me upload parquet files here. Cool if I DM you on the Elixir Slack?

@rohfosho
Author

Just sent it via Slack. Let me know if you prefer something else and I can upload it to Google Drive!

@billylanchantin
Member

billylanchantin commented Jan 14, 2025

OK, I got the file and some more info from Slack.

  • from_parquet: works on their local machine but not in prod
  • load_parquet: works on their local machine and in prod

So they're technically unblocked right now since they can use load_parquet instead. But the bug is still there.
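
For anyone hitting the same panic, the workaround amounts to getting the object's bytes yourself and handing them to `load_parquet/2`, which parses Parquet from an in-memory binary. This is a minimal sketch assuming Explorer 0.10; the bytes here come from `dump_parquet/2` as a stand-in for whatever S3 or HTTP client downloads the real file, which is not shown.

```elixir
Mix.install([{:explorer, "~> 0.10.0"}])

alias Explorer.DataFrame, as: DF

# Stand-in for the S3 object. In a real deployment these bytes would be
# downloaded by a client of your choice (an assumption, not shown here).
source =
  DF.new(%{
    "id" => ["a", "b", "c"],
    "type" => ["x", "y", "z"]
  })

{:ok, parquet_bytes} = DF.dump_parquet(source)

# load_parquet/2 parses the binary directly, so the s3:// reading path
# taken by from_parquet/2 is not exercised.
{:ok, df} = DF.load_parquet(parquet_bytes)

IO.inspect(DF.names(df), label: "columns")
IO.inspect(DF.n_rows(df), label: "rows")
```

The trade-off is that the whole file is held in memory before parsing, which is fine for small files like the one in this report but may not suit large datasets.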

As a sanity check, I ran our setup-localstack.sh and uploaded the file to a local amazon-ec2-metadata-mock container (like we do with our wine dataset). I ran a modified version of our S3 test:

@tag :cloud_integration
test "reads rohfosho's parquet file from S3" do
  config = %FSS.S3.Config{
    access_key_id: "test",
    secret_access_key: "test",
    endpoint: "http://localhost:4566",
    region: "us-east-1"
  }

  assert {:ok, df} =
           DF.from_parquet("s3://test-bucket/rohfosho.parquet",
             config: config
           )

  df |> DF.print()
end

which passed. Must be something more specific, IDK yet. @josevalim any ideas?

@philss

philss commented Jan 14, 2025

Hey, I suspect that this may be some library missing inside the container. Can you print the result of the following command?

ldd -v /path/to/the/extracted/lib.so

The path is printed right after you install Explorer. There is a small bug, though: the printed path ends with tar.gz, so just omit that suffix and it will work fine.

@rohfosho
Author

@philss sure! here you go

ldd -v _build/prod/rel/oracle/lib/explorer-0.10.1/priv/native/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so
        linux-vdso.so.1 (0x00007ffd65942000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007baebffe2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007baebffdd000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007baebfefe000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007baebfef9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007baebfd18000)
        /lib64/ld-linux-x86-64.so.2 (0x00007baec3f42000)

        Version information:
        _build/prod/rel/oracle/lib/explorer-0.10.1/priv/native/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so:
                libgcc_s.so.1 (GCC_3.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
                libgcc_s.so.1 (GCC_3.3) => /lib/x86_64-linux-gnu/libgcc_s.so.1
                libgcc_s.so.1 (GCC_4.2.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
                libpthread.so.0 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libpthread.so.0
                libpthread.so.0 (GLIBC_2.12) => /lib/x86_64-linux-gnu/libpthread.so.0
                libm.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libm.so.6
                libm.so.6 (GLIBC_2.27) => /lib/x86_64-linux-gnu/libm.so.6
                libm.so.6 (GLIBC_2.29) => /lib/x86_64-linux-gnu/libm.so.6
                libdl.so.2 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libdl.so.2
                libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.3.2) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.6) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.7) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.9) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.17) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.18) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.25) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.28) => /lib/x86_64-linux-gnu/libc.so.6
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        /lib/x86_64-linux-gnu/libgcc_s.so.1:
                libc.so.6 (GLIBC_2.35) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.34) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        /lib/x86_64-linux-gnu/libpthread.so.0:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        /lib/x86_64-linux-gnu/libm.so.6:
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_PRIVATE) => /lib/x86_64-linux-gnu/libc.so.6
        /lib/x86_64-linux-gnu/libdl.so.2:
                libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
        /lib/x86_64-linux-gnu/libc.so.6:
                ld-linux-x86-64.so.2 (GLIBC_2.35) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
                ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2

@philss

philss commented Jan 17, 2025

@rohfosho thank you for the info! I think there's nothing wrong there, based on what you sent. Would you mind sharing the Dockerfile, or the full base image tag that you are using? It may make the problem easier to reproduce.

@rohfosho
Author

@philss for sure, here you go:

# syntax = docker/dockerfile:1.2

# Use the official Elixir image as the base image
FROM elixir:1.18.1 AS builder

# Set the working directory inside the container
WORKDIR /app

# Install required system dependencies
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    build-essential \
    git \
    nodejs \
    npm \
    postgresql-client \
    python3

RUN mix local.hex --force && \
    mix local.rebar --force

# Copy the mix files first for better docker build caching
COPY mix.exs ./
COPY mix.lock ./

RUN mix deps.get

# Compile the dependencies, set MIX_ENV beforehand
ARG MIX_ENV=prod
ENV MIX_ENV=${MIX_ENV}
RUN mix deps.compile

# Now copy the whole project to avoid rebuilding the deps when the source code changes
COPY . . 

# Set permissions for the release script
RUN chmod +x release.sh

# Source Code Compilation Stage
FROM builder AS compiler

# Set the working directory
WORKDIR /app

# Execute release script
RUN --mount=type=secret,id=_env,dst=/etc/secrets/.env ./release.sh

# New stage for the runtime image to reduce the final size
FROM elixir:1.18.1

# Set the working directory
WORKDIR /app

# Install minimal dependencies
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
    nodejs \
    npm \
    postgresql-client \
    python3

COPY --from=compiler /app/ .

# Expose the port the application will run on
EXPOSE 4000

# Define the entrypoint for the application
ENTRYPOINT ["/app/_build/${MIX_ENV}/rel/my_app/bin/my_app"]

# Start the Phoenix application
CMD ["start"]

@philss

philss commented Jan 31, 2025

@rohfosho sorry for the delay. I built a container image and ran the code, but I couldn't reproduce the problem. I'm running in a Linux environment (Fedora 41 - w/ Podman). If you don't mind, can you run your code with the EXPLORER_USE_LEGACY_ARTIFACTS env var configured to "true"? This might be something related to legacy CPUs.

Another option would be to compile from source, from our main branch, and see if the problem persists. We updated Polars recently, so it may already be working there.
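
To test the legacy-CPU theory before rebuilding anything, it may help to see which SIMD flags the production host actually exposes. This is a minimal sketch that parses `/proc/cpuinfo`-style text; the `CpuFlags` module is a hypothetical helper, and the flags checked (`avx`, `avx2`, `fma`) are an assumed list for illustration, not the exact set the non-legacy artifacts require.

```elixir
defmodule CpuFlags do
  # Hypothetical helper, not part of Explorer.

  @doc "Returns the set of CPU flags listed in /proc/cpuinfo-style text."
  def parse(cpuinfo) do
    cpuinfo
    |> String.split("\n")
    |> Enum.filter(&String.starts_with?(&1, "flags"))
    |> Enum.flat_map(fn line ->
      case String.split(line, ":", parts: 2) do
        [_key, flags] -> String.split(flags)
        _ -> []
      end
    end)
    |> MapSet.new()
  end

  @doc "Returns the entries of `wanted` that are absent from the parsed flags."
  def missing(cpuinfo, wanted) do
    flags = parse(cpuinfo)
    Enum.reject(wanted, &MapSet.member?(flags, &1))
  end
end

# Assumed list, for illustration only.
wanted = ["avx", "avx2", "fma"]

case File.read("/proc/cpuinfo") do
  {:ok, text} -> IO.inspect(CpuFlags.missing(text, wanted), label: "missing flags")
  {:error, reason} -> IO.puts("could not read /proc/cpuinfo: #{inspect(reason)}")
end
```

If the production host reports missing flags that the local machine has, that would be consistent with the legacy-CPU explanation and would make the `EXPLORER_USE_LEGACY_ARTIFACTS` experiment worth running first.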
