[Bug]: podman-compose show etcd Unhealthy #40372

Open
1 task done
liangbug opened this issue Mar 5, 2025 · 4 comments
Labels: component/etcd, kind/bug, triage/needs-information

liangbug commented Mar 5, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.5
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): RHEL9
- CPU/Memory: 32 cores/128GB
- GPU: 
- Others:

Current Behavior

This is similar to #39417, except that podman ps reports the etcd container as unhealthy.
However, running etcdctl endpoint health via exec in the container reports it as healthy.
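
To narrow down why the two results disagree, it can help to run the same check both ways: once through Podman's healthcheck machinery and once by hand. The following is a minimal sketch, not a confirmed diagnosis. It assumes the container name `milvus-etcd` from the compose file in this issue, and the function name `compare_etcd_health` is hypothetical.

```shell
# Hypothetical helper: compare Podman's healthcheck result with a manual check.
compare_etcd_health() {
  local name="${1:-milvus-etcd}"   # container name taken from the compose file

  # Status column as shown by `podman ps` (contains the health state).
  podman ps --filter "name=${name}" --format '{{.Names}}: {{.Status}}'

  # Run the configured healthcheck once, on demand. Exit code 0 means healthy.
  podman healthcheck run "${name}"
  echo "podman healthcheck run exit code: $?"

  # Run the same etcdctl command by hand inside the container.
  podman exec "${name}" etcdctl endpoint health
  echo "manual etcdctl exit code: $?"
}
```

Calling `compare_etcd_health milvus-etcd` prints both results side by side. If the manual command succeeds while `podman healthcheck run` fails, the problem is more likely in how the healthcheck is defined or executed than in etcd itself.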

Below are my Podman installation steps:

sudo dnf install podman
sudo dnf install podman-docker
sudo dnf install python3.12 python3.12-pip.noarch 
pip3.12 install podman-compose

Below is my podman-compose file, which is the same as
https://github.com/milvus-io/milvus/releases/download/v2.5.5/milvus-standalone-docker-compose.yml:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.5.5
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

networks:
  default:
    name: milvus

Expected Behavior

No response

Steps To Reproduce

Milvus Log

No response

Anything else?

No response

@liangbug liangbug added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 5, 2025

dosubot bot commented Mar 5, 2025

Hey @liangbug! I'm here to help you with any bugs and questions you have. Let's work together to solve this issue.

The issue you're experiencing, where the milvus-etcd container is marked "unhealthy" in podman ps while etcdctl endpoint health reports "healthy", is likely due to the -advertise-client-urls setting in your podman-compose file.

To resolve this discrepancy, change the -advertise-client-urls value in the etcd service command from:

command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

to:

command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

This change ensures that the etcd service is correctly advertised to other components in the Milvus deployment, which should resolve the health status discrepancy [1].


liangbug commented Mar 6, 2025

Hi @dosu
I changed advertise-client-urls to http://etcd:2379, but the etcd container is still unhealthy.

yanliang567 commented

@liangbug could you please share the etcd logs?
/assign @liangbug
/unassign
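
One possible way to collect the requested etcd logs is sketched below. It assumes the container name `milvus-etcd` from the compose file. The function name `collect_etcd_logs`, the default output file `etcd.log`, and the 500-line limit are hypothetical choices, not something the maintainers specified.

```shell
# Hypothetical helper: save recent etcd container logs to a file for sharing.
collect_etcd_logs() {
  local name="${1:-milvus-etcd}"   # container name taken from the compose file
  local out="${2:-etcd.log}"       # output file (assumed name)

  # Capture both stdout and stderr of `podman logs` into the file.
  podman logs --tail 500 "${name}" > "${out}" 2>&1
  echo "wrote ${out}"
}
```

Running `collect_etcd_logs milvus-etcd etcd.log` writes the last 500 log lines to `etcd.log`, which can then be attached to the issue.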

@sre-ci-robot sre-ci-robot assigned liangbug and unassigned yanliang567 Mar 6, 2025
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 6, 2025

dosubot bot commented Mar 6, 2025

Ensure that the health check command in your podman-compose file is targeting the correct endpoint. Modify the health check command to explicitly specify the endpoint as follows:

healthcheck:
  test: ["CMD", "etcdctl", "--endpoints=http://localhost:2379", "endpoint", "health"]
  interval: 30s
  timeout: 20s
  retries: 3

This change ensures that the health check command is using the correct endpoint within the container [1].
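
Before or after changing the healthcheck, it may also be worth looking at what Podman actually stored for the container, since podman-compose translates the compose `healthcheck` section into Podman options and the result may differ from the YAML. This is a rough sketch, assuming the container name `milvus-etcd`. The function name `show_healthcheck_details` is hypothetical, and the `.State.Health` field name is an assumption that holds for recent Podman releases (older ones are believed to have used `.State.Healthcheck`).

```shell
# Hypothetical helper: print the stored healthcheck definition and its run log.
show_healthcheck_details() {
  local name="${1:-milvus-etcd}"   # container name taken from the compose file

  # The healthcheck definition Podman recorded for the container.
  podman inspect --format '{{json .Config.Healthcheck}}' "${name}"

  # Recent healthcheck runs, including exit codes and captured output.
  # Assumption: the field is .State.Health on the installed Podman version.
  podman inspect --format '{{json .State.Health}}' "${name}"
}
```

The output of the failed runs in the health log usually says more than the bare "unhealthy" status, for example whether the command could not be found or whether it timed out.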
