Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update a Github dependency with --precise <short_id> fails #13188

Closed
JEnoch opened this issue Dec 21, 2023 · 12 comments · Fixed by #13250
Closed

Update a Github dependency with --precise <short_id> fails #13188

JEnoch opened this issue Dec 21, 2023 · 12 comments · Fixed by #13250
Labels
A-git Area: anything dealing with git C-bug Category: bug Command-update S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review

Comments

@JEnoch
Copy link

JEnoch commented Dec 21, 2023

Problem

Updating a Github dependency with a short commit id fails with such log:

error: Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
  object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)

Steps

cargo new test-cargo-precise 
cd test-cargo-precise 
cargo add --git https://github.com/rust-lang/regex regex 
CARGO_LOG=trace cargo update -p regex --precise 837fd85

Resulting cargo update log:

   0.000038417s TRACE cargo: start="2023-12-21T07:57:53.908041000Z"
   0.000538542s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_ALIAS_UPDATE", parts: [("alias", 5), ("update", 11)] }
   0.000570375s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_TERM", parts: [("term", 5)] }
   0.000589250s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_NET", parts: [("net", 5)] }
   0.001122417s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_HTTP", parts: [("http", 5)] }
   0.001523083s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_BUILD", parts: [("build", 5)] }
   0.001558208s TRACE cargo::util::toml: read_manifest; path=/Users/julienenoch/tmp/test-cargo-precise/Cargo.toml; source-id=/Users/julienenoch/tmp/test-cargo-precise
   0.001857042s DEBUG cargo::core::workspace: find_members - only me as a member
   0.002212917s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_PATCH", parts: [("patch", 5)] }
   0.002268583s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRY_INDEX", parts: [("registry", 5), ("index", 14)] }
   0.002286292s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRIES_CRATES_IO_PROTOCOL", parts: [("registries", 5), ("crates-io", 16), ("protocol", 26)] }
   0.002298250s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRIES_CRATES_IO_PROTOCOL", parts: [("registries", 5), ("crates-io", 16), ("protocol", 26)] }
   0.002305875s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_REGISTRY_INDEX", parts: [("registry", 5), ("index", 14)] }
   0.002316875s TRACE cargo::util::config: get cv ConfigKey { env: "CARGO_SOURCE", parts: [("source", 5)] }
   0.002328917s DEBUG cargo::core::registry: load/missing  https://github.com/rust-lang/regex#837fd85
   0.002347417s DEBUG cargo::core::registry: loading source https://github.com/rust-lang/regex#837fd85
   0.002353417s DEBUG cargo::sources::config: loading: https://github.com/rust-lang/regex#837fd85
   0.002360000s TRACE cargo::core::source_id: loading SourceId; https://github.com/rust-lang/regex#837fd85
    Updating git repository `https://github.com/rust-lang/regex`
   0.009446625s TRACE cargo::sources::git::source: updating git source `GitRemote { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/rust-lang/regex", query: None, fragment: None } }`
   0.010665208s DEBUG cargo::sources::git::utils: attempting GitHub fast path for https://api.github.com/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000
   0.225184458s DEBUG cargo::sources::git::utils: skipping gc as there's only 2 pack files
   0.225689625s DEBUG cargo::sources::git::utils: doing a fetch for https://github.com/rust-lang/regex
   0.226630292s DEBUG cargo::sources::git::utils: initiating fetch of ["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"] from https://github.com/rust-lang/regex
   0.716648583s DEBUG cargo::sources::git::utils: attempting GitHub fast path for https://api.github.com/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000
   0.739785417s DEBUG cargo::sources::git::utils: skipping gc as there's only 0 pack files
   0.739872500s DEBUG cargo::sources::git::utils: doing a fetch for https://github.com/rust-lang/regex
   0.740022625s DEBUG cargo::sources::git::utils: initiating fetch of ["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"] from https://github.com/rust-lang/regex
   2.332392833s DEBUG cargo: exit_with_error; err=CliError { error: Some(Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
    object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)), exit_code: 101 }
   2.332410958s DEBUG cargo: display_error; err=Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
    object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)
error: Unable to update https://github.com/rust-lang/regex#837fd85

Caused by:
  object not found - no match for id (837fd85000000000000000000000000000000000); class=Odb (9); code=NotFound (-3)

Possible Solution(s)

No response

Notes

This Github API URL used by cargo returns an error:
https://api.github.com/repos/rust-lang/regex/commits/837fd85000000000000000000000000000000000

While the same but with short commit id succeeds:
https://api.github.com/repos/rust-lang/regex/commits/837fd85

What's strange is that I successfully used such cargo update -p <dep> --precise <short_id> few weeks ago.
Could it be that Github changed its API ?

Version

cargo 1.74.1 (ecb9851af 2023-10-18)
release: 1.74.1
commit-hash: ecb9851afd3095e988daaa35a48bc7f3cb748e04
commit-date: 2023-10-18
host: aarch64-apple-darwin
libgit2: 1.7.1 (sys:0.18.0 vendored)
libcurl: 8.1.2 (sys:0.4.68+curl-8.4.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1u  30 May 2023
os: Mac OS 14.1.2 [64-bit]

--
Also same issue with:

cargo 1.71.0 (cfd3bbd8f 2023-06-08)
release: 1.71.0
commit-hash: cfd3bbd8fe4fd92074dfad04b7eb9a923646839f
commit-date: 2023-06-08
host: aarch64-apple-darwin
libgit2: 1.6.4 (sys:0.17.1 vendored)
libcurl: 8.1.2 (sys:0.4.61+curl-8.0.1 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1t  7 Feb 2023
os: Mac OS 14.1.2 [64-bit]
@JEnoch JEnoch added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Dec 21, 2023
@weihanglo
Copy link
Member

Got an impression that @Eh2406 just talked about this somewhere. @Eh2406 do you remember anything?

@JEnoch
Copy link
Author

JEnoch commented Dec 22, 2023

Digging into the code, I see this completion with 0s of short commit id comes from usage of git2::Oid::from_str(s) in SourceId::precise_git_oid().
It somehow ends up to be used in github_fast_path(), probably after conversion from git2::Oid to GitReference::Rev(String). Possibly this one ?

Could the regression come from #12849 ?
But that wouldn't explain why the issue also occurs with cargo 1.71.0 that was release before this PR.

@weihanglo
Copy link
Member

Good finding! Thank you.

I've tested several toolchains. Starting from 1.46.0 Cargo contains this bug, which contains the PR #8363 that you observed the potential bad commit. The Cargo started using Oid::from_str to parse precise field since then:

Some(s) => Some(git2::Oid::from_str(s).with_context(|| {
format!("precise value for git is not a git revision: {}", s)
})?),

The zero-padding behavior is a documented behavior in libgit2 like from the beginning, so not a dependency issue.

It might or might not be GitHub has changed the API. I cannot tell, as Cargo had no such a test exercising --precise <SHA> until recently. And that doesn't even test against short ID.

Feel like SourceId::from_url has an assumption that precise for Git source is always a valid long-enough Git SHA (the format is opaque to cargo users btw). One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not. This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

@weihanglo weihanglo added A-git Area: anything dealing with git Command-update S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review and removed S-triage Status: This issue is waiting on initial triage. labels Dec 22, 2023
@linyihai
Copy link
Contributor

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

I will provide a PR to fix it based on this

This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

This is worth a look, I will check the relevant code by the way, once there is any feasible plan I will make a corresponding plan

@rustbot claim

@JEnoch
Copy link
Author

JEnoch commented Dec 27, 2023

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

If possible I would like to still have the possibility to use --precise with short SHA.

As a use case in Zenoh: there are some plugins (shared libraries) for which the dependency to the zenoh crate must be the exact same commit than the hosting binary (to avoid ABI incompatibility). The short commit id can be retrieved via --version that return the result of git describe.
Using this short commit id in --precise would avoid the user to search for the long commit id.

I think a solution could be the following:

I tested such a patch here, and it seems to work.
What do you think?

@weihanglo
Copy link
Member

Thanks for looking into the issue! The effort is much appreciated :)

However, there are probably two things we need to take care of:

  • The patch seems for GitHub only. I would like to see a general fix, as this issue is not GitHub-specific. GitLab or even a local git repository has the bug.
  • Changing locked_rev is not desired, as the new rev might not be the same as the old one, which we lose the meaning of "locked". Git short hashes can also be generated deliberately, and it doesn't seem too safe to trim and query over network.

That said, it doesn't mean we cannot support short hashes. We need to find a reliable way to pass down the value from --precise, and get it resolved to a full SHA. If the solution is not yet ready, for now we can give a nice message of "short hash not supported".

@JEnoch
Copy link
Author

JEnoch commented Dec 28, 2023

Understood, that makes sense.

I think the root of the issue is that the conversion from short SHA to git2::Oid leads to a 20 bytes SHA completed with 0s that cannot be found in the git2::Repository via call to revparse_single().

I made some tests with git2:
revparse_single() accepts short SHA and returns the full SHA. It detects short SHA collisions, returning such this error message: "ambiguous OID prefix - found multiple offsets for pack entry". So I think it's safe to use it to resolve a short SHA into a full SHA before storing it as locked_rev.
Actually, GitDatabase::resolve(&self, r: &GitReference) uses it and will return a git2::Oid containing the full SHA.

This call to GitDatabase::resolve() for full SHA retrieval could be done in GitSource::new() as a replacement to source_id.precise_git_oid() call. But this requires to create the GitDatabase here, while it's only created in GitSource::block_until_ready() so far. Not sure if it's a problem.

An alternative is to retrieve the full SHA in GitSource::block_until_ready(), just after GitDatabase creation. No need to replace self.locked_rev, but just to use the retrieved full SHA instead in the following match (self.locked_rev, db)

@weihanglo
Copy link
Member

I think the root of the issue is that the conversion from short SHA to git2::Oid leads to a 20 bytes SHA completed with 0s that cannot be found in the git2::Repository via call to revparse_single().

Yeah. Totally agree on the root cause.

There seems to be an “implicit assumption” that a git SourceId has a precise field of a full SHA. cargo update is an exception that makes use of --precise to have a transient precise value presenting a arbitrary git revision. It would eventually become a full SHA when the revision is resolved.

The proposed solutions aren't wrong, though I would like to see locked_rev to be a type that represent a "revision", so we could have either full SHA (oid), or an arbitrary git revision. That would also make the behavior aligned to the doc:

--precise precise
When used with spec, allows you to specify a specific version number to set the package to. If the package comes from a git repository, this can be a git revision (such as a SHA hash or tag).

BTW, I accidentally have a working patch of this during code exploration. I may post a PR later but feel free to propose more better solutions :)

@linyihai
Copy link
Contributor

linyihai commented Jan 5, 2024

One fix I can think of is comparing if the value of --precise is a hex string of 40/64 characters here, and give a nice error if not.

I will provide a PR to fix it based on this

This is only a fix on the surface, if we find more commands have this kind of bug, we might want to push the fix down to lower-level.

This is worth a look, I will check the relevant code by the way, once there is any feasible plan I will make a corresponding plan

@rustbot claim

Sorry, I haven't started this work yet. It's nice to see a better solution being proposed. 👍

@linyihai linyihai removed their assignment Jan 5, 2024
@weihanglo
Copy link
Member

Oops. I didn't realize this is claimed. Sorry for that, @linyihai 😔

Actually I was trying to provide guidance but ended up writing a fix

@linyihai
Copy link
Contributor

linyihai commented Jan 5, 2024

Oops. I didn't realize this is claimed. Sorry for that, @linyihai 😔

Actually I was trying to provide guidance but ended up writing a fix

The content to be changed is already very different from what I have been exposed to before, So feel free to continue it. Your PR is also a good guidance for me.

@JEnoch
Copy link
Author

JEnoch commented Jan 5, 2024

BTW, I accidentally have a working patch of this during code exploration. I may post a PR later but feel free to propose more better solutions :)

You PR is more advanced than my proposal. I agree that having a Revision enum that can be a GitReference to be resolved when required is more accurately reflecting reality. Nice job! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git C-bug Category: bug Command-update S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants