DVC get/import failed when Git behind a proxy #10563
@luhuiguo could you please run it with is it the same error if you run it w/o setting up the proxy in Git? |
Unset the Git proxy
$ git config unset --global http.proxy
$ git config unset --global https.proxy
$ git config list
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
Git clone failed
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
fatal: unable to access '<GIT_REPOSITORY_URL>': Could not resolve host: <GITLAB_HOST>
DVC get failed
$ dvc get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:45:53,687 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-23 10:45:53,687 DEBUG: command: get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:45:53,946 DEBUG: Creating external repo <GIT_REPOSITORY_URL>@None
2024-09-23 10:45:53,946 DEBUG: erepo: git clone '<GIT_REPOSITORY_URL>' to a temporary dir
2024-09-23 10:46:18,913 ERROR: failed to get '<PATH>' - SCM error: Failed to clone repo '<GIT_REPOSITORY_URL>' to '/tmp/tmpibarss8odvc-clone': HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
Traceback (most recent call last):
File "urllib3/connection.py", line 196, in _new_conn
File "urllib3/util/connection.py", line 60, in create_connection
File "socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 615, in connect
File "urllib3/connection.py", line 203, in _new_conn
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 136, in request
File "urllib3/_request_methods.py", line 183, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "scmrepo/git/backend/dulwich/__init__.py", line 260, in clone
File "dulwich/porcelain.py", line 546, in clone
File "dulwich/client.py", line 752, in clone
File "dulwich/client.py", line 840, in fetch
File "dulwich/client.py", line 2157, in fetch_pack
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f0967118070>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/scm.py", line 150, in clone
File "scmrepo/git/__init__.py", line 154, in clone
File "scmrepo/git/backend/dulwich/__init__.py", line 268, in clone
scmrepo.exceptions.CloneError: Failed to clone repo '<GIT_REPOSITORY_URL>' to '/tmp/tmpibarss8odvc-clone'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 155, in clone
dvc.scm.CloneError: SCM error
2024-09-23 10:46:18,950 DEBUG: Analytics is enabled.
2024-09-23 10:46:18,952 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpggwvzttp', '-v']
2024-09-23 10:46:18,962 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpggwvzttp', '-v'] with pid 201
Configure Git to use a proxy
$ git config --global http.proxy http://10.3.12.8:3128
$ git config --global https.proxy http://10.3.12.8:3128
$ git config list
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
Git clone successful
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 166 (delta 42), reused 0 (delta 0), pack-reused 33
Receiving objects: 100% (166/166), 11.08 MiB | 874.00 KiB/s, done.
Resolving deltas: 100% (48/48), done.
DVC get failed
$ dvc get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:40:27,336 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-23 10:40:27,336 DEBUG: command: get -v <GIT_REPOSITORY_URL> <PATH>
2024-09-23 10:40:27,486 DEBUG: Creating external repo <GIT_REPOSITORY_URL>@None
2024-09-23 10:40:27,486 DEBUG: erepo: git clone '<GIT_REPOSITORY_URL>' to a temporary dir
2024-09-23 10:41:06,026 ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
Traceback (most recent call last):
File "urllib3/connection.py", line 196, in _new_conn
File "urllib3/util/connection.py", line 60, in create_connection
File "socket.py", line 955, in getaddrinfo
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 615, in connect
File "urllib3/connection.py", line 203, in _new_conn
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 136, in request
File "urllib3/_request_methods.py", line 183, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOST>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7ff140e4c040>: Failed to resolve '<GITLAB_HOST>' ([Errno -2] Name or service not known)"))
2024-09-23 10:41:06,128 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.6.1),
hdfs (fsspec = 2024.6.1, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.6.1, boto3 = 1.35.7),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.6.1)
Config:
Global: /home/luhg/.config/dvc
System: /etc/xdg/dvc
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-23 10:41:06,145 DEBUG: Analytics is enabled.
2024-09-23 10:41:06,147 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp8y57xigq', '-v']
2024-09-23 10:41:06,155 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp8y57xigq', '-v'] with pid 174 |
Okay, probably it should be fixed on the https://github.com/jelmer/dulwich side. Is there a way for you to run with HTTP_PROXY and HTTPS_PROXY env vars set? I think dulwich supports those. Probably you can create an alias for now for |
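As a side note on the HTTP_PROXY / HTTPS_PROXY suggestion above, here is a minimal sketch of how Python's standard library interprets those two variables. It only shows what `urllib.request` derives from the environment; whether dulwich or DVC consult the same variables in the same way is not confirmed in this thread. The helper name `proxies_seen_by_stdlib` is made up for illustration, and the proxy address is the one used elsewhere in this issue.

```python
import os
import urllib.request
from unittest import mock


def proxies_seen_by_stdlib(env):
    """Return the scheme -> proxy URL mapping the standard library derives from `env`.

    The real environment is swapped out temporarily, so the result depends
    only on the variables passed in.
    """
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies_environment()


# Same variables and proxy address as exported in this thread.
proxies = proxies_seen_by_stdlib(
    {
        "HTTP_PROXY": "http://10.3.12.8:3128",
        "HTTPS_PROXY": "http://10.3.12.8:3128",
    }
)
print(proxies)
```

If a tool ignores the exported variables, comparing this mapping with what the tool actually does can help narrow down where the proxy setting is lost.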
I've created an issue upstream jelmer/dulwich#1368 |
Okay, seems it (the proxy via global Git config) should be supported. I've tried to do this:
after running:
So, it's trying to connect to the proxy (and fails). We need a simpler way to reproduce this in order to investigate it, e.g. some way to run a local proxy for experiments. |
Is it a problem related to domain name resolution?
My error message:
And your message:
We deploy a self-managed GitLab instance on the company intranet and use the company's intranet domain name resolution. The GitLab hostname is not resolvable outside our company intranet.
On my PC:
$ ping <GITLAB_HOSTNAME>
ping: <GITLAB_HOSTNAME>: Name or service not known
On the proxy server:
$ ping <GITLAB_HOSTNAME>
PING <GITLAB_HOSTNAME> (192.168.57.131) 56(84) bytes of data.
64 bytes from <GITLAB_HOSTNAME> (192.168.57.131): icmp_seq=1 ttl=59 time=3.08 ms
Change the domain name in the Git URL to the IP address:
It seems that I can connect to the GitLab server, but the certificate does not match the IP address. |
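The ping checks above can be mirrored from Python, which is closer to what the failing code path does: the innermost exception in the tracebacks is a `socket.gaierror` raised by `socket.getaddrinfo`. The following is a minimal sketch; the function name `resolves_locally` and the example hostname are hypothetical and stand in for the redacted GitLab host.

```python
import socket


def resolves_locally(hostname, port=443):
    """Return True if this machine can resolve `hostname`, False otherwise.

    This uses the same resolver call (socket.getaddrinfo) that appears at the
    bottom of the tracebacks in this issue.
    """
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical intranet-only hostname; replace with the real GitLab host.
    print(resolves_locally("gitlab.intranet.example"))
```

A `False` result on the client combined with a successful lookup on the proxy server matches the situation described here: the name can only be resolved on the proxy side.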
Yes, it seems so, but it's hard to tell why is it trying to resolve it on the machine outside proxy. You can try to add hostname to the Otherwise we need a simple setup (some local) proxy to reproduce this. |
I still think there are some special cases where the proxy doesn't work.
I run DVC with Docker.
$ docker run -it --rm -v ${PWD}:/workspace <DVC_IMAGE> bash
Dockerfile
FROM ubuntu:24.04
RUN apt update && apt install -y gpg curl wget software-properties-common iputils-ping
RUN add-apt-repository -y ppa:git-core/ppa && apt update && apt install -y git
RUN git config --global user.email "<USER_EMAIL>" && git config --global user.name "<USER_NAME>"
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN apt update && apt install -y git-lfs && git lfs install
RUN wget https://dvc.org/deb/dvc.list -O /etc/apt/sources.list.d/dvc.list && \
wget -qO - https://dvc.org/deb/iterative.asc | gpg --dearmor > packages.iterative.gpg && \
install -o root -g root -m 644 packages.iterative.gpg /etc/apt/trusted.gpg.d/ && \
rm -f packages.iterative.gpg
RUN apt update && apt install -y dvc
RUN mkdir -p /workspace
WORKDIR /workspace
Everything works fine on other computers.
At first, the GitLab hostname is not resolvable and the GitLab host is unreachable
$ ping <GITLAB_HOSTNAME>
ping: <GITLAB_HOSTNAME>: Name or service not known
$ ping 192.168.57.131
PING 192.168.57.131 (192.168.57.131) 56(84) bytes of data.
--- 192.168.57.131 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
$ curl <GITLAB_REPOSITORY_URL>
curl: (6) Could not resolve host: <GITLAB_HOSTNAME>
$ git clone <GIT_REPOSITORY_URL>
Cloning into '<DIRECTORY>'...
fatal: unable to access '<GIT_REPOSITORY_URL>': Could not resolve host: <GITLAB_HOSTNAME>
Configure Git to use a proxy
$ git config --global http.proxy http://10.3.12.8:3128
$ git config --global https.proxy http://10.3.12.8:3128
Git can clone the repository
$ git clone <GITLAB_REPOSITORY_URL>
Cloning into '<REPOSITORY>'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (133/133), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 166 (delta 42), reused 0 (delta 0), pack-reused 33
Receiving objects: 100% (166/166), 11.08 MiB | 871.00 KiB/s, done.
Resolving deltas: 100% (48/48), done.
Configure proxy environment variables
$ curl <GITLAB_REPOSITORY_URL>
curl: (6) Could not resolve host: <GITLAB_HOSTNAME>
$ export HTTP_PROXY=http://10.3.12.8:3128
$ export HTTPS_PROXY=http://10.3.12.8:3128
curl can access <GITLAB_REPOSITORY_URL>
$ curl <GITLAB_REPOSITORY_URL>
<html><body>You are being <a href="https://<GITLAB_HOSTNAME>/users/sign_in">redirected</a>.</body></html>
But DVC cannot get the tracked file
$ dvc get <GITLAB_REPOSITORY_URL> <PATH>
ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /JYAI/data-registry/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known)")): HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known)")): <urllib3.connection.HTTPSConnection object at 0x7f85816334c0>: Failed to resolve '<GITLAB_HOSTNAME>' ([Errno -2] Name or service not known): [Errno -2] Name or service not known
Add the hostname to /etc/hosts
$ echo "192.168.57.131 <GITLAB_HOSTNAME>" >> /etc/hosts
$ ping <GITLAB_HOSTNAME>
PING <GITLAB_HOSTNAME> (192.168.57.131) 56(84) bytes of data.
Still not working
$ dvc get -v <GITLAB_REPOSITORY_URL> <PATH>
2024-09-25 12:52:05,038 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
2024-09-25 12:52:05,038 DEBUG: command: get -v <GITLAB_REPOSITORY_URL> <PATH>
2024-09-25 12:52:05,187 DEBUG: Creating external repo <GITLAB_REPOSITORY_URL>@None
2024-09-25 12:52:05,187 DEBUG: erepo: git clone '<GITLAB_REPOSITORY_URL>' to a temporary dir
Cloning data-registry.git|█████████████████████████████████████████████████████████████████████████████████████████| Compressing |119/119 [00:00, 3.01obj/s]2024-09-25 13:00:47,786 ERROR: unexpected error - HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')): HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')): (<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'): [Errno 110] Connection timed out
Traceback (most recent call last):
File "urllib3/connection.py", line 199, in _new_conn
File "urllib3/util/connection.py", line 85, in create_connection
File "urllib3/util/connection.py", line 73, in create_connection
TimeoutError: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 693, in connect
File "urllib3/connection.py", line 208, in _new_conn
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 135, in request
File "urllib3/_request_methods.py", line 182, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='<GITLAB_HOSTNAME>', port=443): Max retries exceeded with url: /<REPOSITORY>/info/refs?service=git-upload-pack (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fc791d8f5b0>, 'Connection to <GITLAB_HOSTNAME> timed out. (connect timeout=None)'))
2024-09-25 13:00:47,888 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.23),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc |
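The tracebacks in this thread are long exception chains, but the useful signal is the innermost exception: `socket.gaierror` in the earlier runs and `TimeoutError` in the run above. A minimal sketch for pulling that root cause out of a chained exception is shown below. The names `root_cause` and `classify` are hypothetical helpers and not part of DVC, dulwich or urllib3, and the category labels are only illustrative.

```python
import socket


def root_cause(exc):
    """Follow __cause__ / __context__ links down to the innermost exception."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        inner = exc.__cause__ or exc.__context__
        if inner is None:
            break
        exc = inner
    return exc


def classify(exc):
    """Map the innermost exception of a chain to a short, human-readable label."""
    cause = root_cause(exc)
    if isinstance(cause, socket.gaierror):
        return "name resolution failed on this machine"
    if isinstance(cause, ConnectionRefusedError):
        return "connection refused"
    if isinstance(cause, TimeoutError):
        return "connection timed out"
    return "other: " + type(cause).__name__


if __name__ == "__main__":
    try:
        try:
            raise socket.gaierror(-2, "Name or service not known")
        except socket.gaierror as err:
            raise RuntimeError("Failed to clone repo") from err
    except RuntimeError as outer:
        print(classify(outer))
```

Under the assumption that a correctly proxied request would only need to reach the proxy address, a local name-resolution failure or a timeout against the target host suggests the client tried to connect directly.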
Yes, right. Or it still picks it up, but for whatever reason it is trying to resolve the hostname itself, while that should be done on the proxy machine (?).
🤔 I really need a way to reproduce this locally. Then I'm pretty sure I can find the reason faster. If you have some idea how to run a proxy on my machine to experiment with it - that would help a lot. |
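One possible way to experiment locally without a full proxy is a tiny "recording" endpoint that only logs the first request line it receives and then refuses the request. It does not forward any traffic, so it cannot make a clone succeed; it only answers the question "did the client contact the proxy at all?". This is a minimal standard-library sketch, and the names `RecordingProxyHandler`, `start_recording_proxy` and `seen_requests` are made up for illustration.

```python
import socketserver
import threading

# Request lines received so far, e.g. "CONNECT github.com:443 HTTP/1.1".
seen_requests = []


class RecordingProxyHandler(socketserver.StreamRequestHandler):
    """Record the first request line of each connection, then answer 502."""

    def handle(self):
        request_line = self.rfile.readline().decode("latin-1").strip()
        seen_requests.append(request_line)
        # This sketch never tunnels traffic; it always refuses the request.
        self.wfile.write(
            b"HTTP/1.1 502 Bad Gateway\r\n"
            b"Content-Length: 0\r\n"
            b"Connection: close\r\n\r\n"
        )


def start_recording_proxy(host="127.0.0.1", port=0):
    """Start the recorder in a background thread and return the server object.

    With port=0 the operating system picks a free port; the chosen address is
    available as `server.server_address`.
    """
    server = socketserver.ThreadingTCPServer((host, port), RecordingProxyHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `start_recording_proxy()`, the proxy setting under test can be pointed at `http://127.0.0.1:<port>`. If `seen_requests` stays empty after the failing command, the client never tried the proxy; if a `CONNECT` line shows up, the proxy setting was picked up and the problem lies elsewhere.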
Reproduce
Start a proxy server
$ docker run -d --name squid-container -e TZ=UTC -p 3128:3128 ubuntu/squid
Start a DVC container
$ docker run -it --rm luhuiguo/dvc bash
DVC get succeeds
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:25:27,589 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:25:27,589 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:25:27,697 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:25:27,697 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:25:49,445 DEBUG: Analytics is enabled.
2024-09-26 12:25:49,446 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpl2zikt7c', '-v']
2024-09-26 12:25:49,452 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpl2zikt7c', '-v'] with pid 234
2024-09-26 12:25:49,454 DEBUG: Removing '/tmp/tmpb241nh80dvc-clone'
2024-09-26 12:25:49,457 DEBUG: Removing '/tmp/tmpu6ugkwyrdvc-cache'
root@bd4ec17f398c:/workspace# rm -rf data
Block the github.com hostname
root@bd4ec17f398c:/workspace# echo "127.0.0.1 github.com" >> /etc/hosts
root@bd4ec17f398c:/workspace# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 bd4ec17f398c
127.0.0.1 github.com
Git clone and DVC get failed
root@bd4ec17f398c:/workspace# git clone -v https://github.com/iterative/dataset-registry
Cloning into 'dataset-registry'...
fatal: unable to access 'https://github.com/iterative/dataset-registry/': Failed to connect to github.com port 443 after 0 ms: Couldn't connect to server
root@bd4ec17f398c:/workspace# dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
ERROR: failed to get 'get-started/data.xml' - SCM error: Failed to clone repo 'https://github.com/iterative/dataset-registry' to '/tmp/tmpiagkrm4pdvc-clone': HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused')): HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused')): <urllib3.connection.HTTPSConnection object at 0x7f23ea1fc400>: Failed to establish a new connection: [Errno 111] Connection refused: [Errno 111] Connection refused
Configure Git to use a proxy
root@bd4ec17f398c:/workspace# git config --global http.proxy http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# git config --global https.proxy http://10.3.12.8:3128
root@bd4ec17f398c:/workspace# git config list
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
user.email=<USER_EMAIL>
user.name=luhuiguo
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
Git clone succeeds
root@bd4ec17f398c:/workspace# git clone -v https://github.com/iterative/dataset-registry
Cloning into 'dataset-registry'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 1202 to 636 bytes)
remote: Enumerating objects: 328, done.
remote: Counting objects: 100% (123/123), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 328 (delta 53), reused 61 (delta 38), pack-reused 205 (from 1)
Receiving objects: 100% (328/328), 50.37 KiB | 606.00 KiB/s, done.
Resolving deltas: 100% (85/85), done.
DVC get failed
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:39:41,732 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:39:41,732 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:39:41,837 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:39:41,837 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:39:43,436 ERROR: unexpected error - HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused')): HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused')): <urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused: [Errno 111] Connection refused
Traceback (most recent call last):
File "urllib3/connection.py", line 199, in _new_conn
File "urllib3/util/connection.py", line 85, in create_connection
File "urllib3/util/connection.py", line 73, in create_connection
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 789, in urlopen
File "urllib3/connectionpool.py", line 490, in _make_request
File "urllib3/connectionpool.py", line 466, in _make_request
File "urllib3/connectionpool.py", line 1095, in _validate_conn
File "urllib3/connection.py", line 693, in connect
File "urllib3/connection.py", line 214, in _new_conn
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dulwich/client.py", line 2290, in _http_request
File "urllib3/_request_methods.py", line 135, in request
File "urllib3/_request_methods.py", line 182, in request_encode_url
File "urllib3/poolmanager.py", line 443, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 873, in urlopen
File "urllib3/connectionpool.py", line 843, in urlopen
File "urllib3/util/retry.py", line 519, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "dvc/cli/__init__.py", line 211, in main
File "dvc/cli/command.py", line 41, in do_run
File "dvc/commands/get.py", line 30, in run
File "dvc/commands/get.py", line 37, in _get_file_from_repo
File "dvc/repo/get.py", line 45, in get
File "dvc/repo/__init__.py", line 302, in open
File "dvc/repo/open_repo.py", line 60, in open_repo
File "contextlib.py", line 79, in inner
File "dvc/repo/open_repo.py", line 23, in _external_repo
File "dvc/repo/open_repo.py", line 134, in _cached_clone
File "funcy/decorators.py", line 47, in wrapper
File "funcy/flow.py", line 246, in wrap_with
File "funcy/decorators.py", line 68, in __call__
File "dvc/repo/open_repo.py", line 198, in _clone_default_branch
File "dvc/scm.py", line 152, in clone
File "dvc/repo/experiments/utils.py", line 275, in fetch_all_exps
File "dvc/repo/experiments/utils.py", line 275, in <listcomp>
File "dvc/repo/experiments/utils.py", line 119, in iter_remote_refs
File "scmrepo/git/backend/dulwich/__init__.py", line 590, in iter_remote_refs
File "dulwich/client.py", line 2208, in get_refs
File "dulwich/client.py", line 2013, in _discover_references
File "scmrepo/git/backend/dulwich/client.py", line 50, in _http_request
File "dulwich/client.py", line 2298, in _http_request
dulwich.errors.GitProtocolError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /iterative/dataset-registry/info/refs?service=git-upload-pack (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f86763ab4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
2024-09-26 12:39:43,464 DEBUG: Version info for developers:
DVC version: 3.55.2 (deb)
-------------------------
Platform: Python 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.18.0),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.23),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-09-26 12:39:43,469 DEBUG: Analytics is enabled.
2024-09-26 12:39:43,470 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpk4n384y7', '-v']
2024-09-26 12:39:43,475 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpk4n384y7', '-v'] with pid 342
Unblock github.com
root@bd4ec17f398c:/workspace# echo "$(sed '/github.com/d' /etc/hosts)" > /etc/hosts
root@bd4ec17f398c:/workspace# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.3 bd4ec17f398c
root@bd4ec17f398c:/workspace# git config list
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
filter.lfs.clean=git-lfs clean -- %f
user.email=[email protected]
user.name=luhuiguo
filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
http.proxy=http://10.3.12.8:3128
https.proxy=http://10.3.12.8:3128
DVC get succeeded
root@bd4ec17f398c:/workspace# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:47:29,915 DEBUG: v3.55.2 (deb), CPython 3.10.8 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-09-26 12:47:29,915 DEBUG: command: get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-26 12:47:30,025 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-26 12:47:30,025 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-26 12:50:50,305 DEBUG: Analytics is enabled.
2024-09-26 12:50:50,305 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp29_f6y8i', '-v']
2024-09-26 12:50:50,311 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp29_f6y8i', '-v'] with pid 516
2024-09-26 12:50:50,312 DEBUG: Removing '/tmp/tmp_0xv8ud8dvc-clone'
2024-09-26 12:50:50,314 DEBUG: Removing '/tmp/tmp47ujcmg0dvc-cache' |
@luhuiguo could you try installing scmrepo from this branch (iterative/scmrepo#378) and run some experiments? Thanks for the reproducible env! |
It works
Install scmrepo from branch fix-fetch-exps-under-proxy
$ docker run -it --rm python bash
root@fcf01db756c8:/# pip install dvc
$ pip install git+https://github.com/iterative/scmrepo.git@fix-fetch-exps-under-proxy
Collecting git+https://github.com/iterative/scmrepo.git@fix-fetch-exps-under-proxy
Cloning https://github.com/iterative/scmrepo.git (to revision fix-fetch-exps-under-proxy) to /tmp/pip-req-build-v2fbye8o
.......
Successfully built scmrepo
Installing collected packages: scmrepo
Attempting uninstall: scmrepo
Found existing installation: scmrepo 3.3.7
Uninstalling scmrepo-3.3.7:
Successfully uninstalled scmrepo-3.3.7
Successfully installed scmrepo-3.3.8.dev4+gf2e18e2
Block github.com and use git proxy config
root@fcf01db756c8:/# echo "127.0.0.1 github.com">> /etc/hosts
root@fcf01db756c8:/# git config --global http.proxy http://10.3.12.8:3128
root@fcf01db756c8:/# git config --global https.proxy http://10.3.12.8:3128
root@fcf01db756c8:/# dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-27 01:48:59,097 DEBUG: v3.55.2 (pip), CPython 3.12.6 on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
2024-09-27 01:48:59,097 DEBUG: command: /usr/local/bin/dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2024-09-27 01:48:59,285 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2024-09-27 01:48:59,285 DEBUG: erepo: git clone 'https://github.com/iterative/dataset-registry' to a temporary dir
2024-09-27 01:49:09,292 DEBUG: Analytics is enabled.
2024-09-27 01:49:09,323 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmp5dtejb1u', '-v']
2024-09-27 01:49:09,328 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmp5dtejb1u', '-v'] with pid 219
2024-09-27 01:49:09,330 DEBUG: Removing '/tmp/tmpbjw1dcxjdvc-clone'
2024-09-27 01:49:09,333 DEBUG: Removing '/tmp/tmp6mr_z8rfdvc-cache' |
Okay, good. I'll try to get to it to add tests and release asap. Thanks for your help reproducing this. |
Bug Report
get/import : Name or service not known
Description
I have a situation where my computer is behind a proxy, and needs to access a Git repository outside of the proxy network. When running dvc get/import behind my proxy, my file is not downloaded and I get the following error: [Errno -2] Name or service not known.
Configure Git to use a proxy
git clone, dvc pull, and so on all work fine.
However, the error appears when I try to download a file tracked by DVC into another workspace.
Reproduce
dvc get GIT_URL_BEHIND_A_PROXY PATH
Expected
dvc get/import should use the Git proxy config
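To make the expected behavior concrete, here is a minimal sketch of how a client might pick up the proxy from Git's configuration before opening an HTTPS connection. It is not the scmrepo fix from this thread. The helper names `parse_git_config_list` and `resolve_proxy` are hypothetical, and the input is assumed to be `key=value` text like the `git config list` output shown in the comments. It reads only `http.proxy`, which is the key Git documents for HTTP(S) transports, and falls back to the usual proxy environment variables.

```python
import os


def parse_git_config_list(text):
    """Parse `git config list` style output (one key=value per line) into a dict.

    Later entries override earlier ones, so a repeated key keeps its last value.
    """
    config = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            config[key] = value
    return config


def resolve_proxy(config, environ=None):
    """Return a proxy URL for an HTTPS Git remote, or None if none is set.

    Simplification (assumption): an empty http.proxy is treated as "not set"
    and per-URL overrides such as http.<url>.proxy are ignored.
    """
    environ = os.environ if environ is None else environ
    proxy = config.get("http.proxy")
    if proxy:
        return proxy
    # Fall back to the conventional environment variables.
    for name in ("https_proxy", "HTTPS_PROXY", "all_proxy", "ALL_PROXY"):
        if environ.get(name):
            return environ[name]
    return None


# Sample input modeled on the config output posted in this thread.
sample = """\
filter.lfs.required=true
user.name=luhuiguo
http.proxy=http://10.3.12.8:3128
"""
proxy_url = resolve_proxy(parse_git_config_list(sample), environ={})
print(proxy_url)
```

The resolved URL could then be handed to an HTTP client, for example `urllib3.ProxyManager(proxy_url)`. Whether the actual fix is wired up this way is not shown in this thread.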
Environment information
Output of dvc doctor:
Additional Information (if any):