Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved upload/download methods for python client #1205

Merged
merged 24 commits into from
May 8, 2024

Conversation

bw27492
Copy link
Member

@bw27492 bw27492 commented Apr 6, 2024

  • Progress bars for uploads and downloading to disk
  • Method to download all files from a release
  • Ability to filter files based on regex (e.g. "*.json")

@bw27492 bw27492 marked this pull request as draft April 6, 2024 15:30
@bw27492 bw27492 changed the title Improved download methods for python client Improved upload/download methods for python client Apr 6, 2024
Comment on lines 15 to 18
try:
no_color = int(os.environ["NO_COLOR"])
except KeyError:
no_color = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be implemented somewhere else probably the top level initialiser. So we can call on it in other parts of the package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 15 to 18
try:
no_color = int(os.environ["NO_COLOR"])
except KeyError:
no_color = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at these docs and the example implementation. Something along the lines of:

NO_COLOR = "NO_COLOR" in os.environ

The docs suggest that as long as the environment variable exists or it's not a null terminating string which you don't need to worry about in python, we shouldn't worry about the value of the NO_COLOR environment variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 33 to 42
for filename in ["test.json", "test2.txt", "to_exclude.txt"]:
example_release.upload(filename, file)
file.seek(0)

example_release.download_all(path=downloads_path, include=["*.txt"], exclude=["to_exclude.txt"])

assert os.listdir(downloads_path) == ["test2.txt"]

with open(str(downloads_path.join("test2.txt")), "rb") as f:
assert f.read() == byte_obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is a bit specific. I would split this test instead, and I would avoid downloading any files into a directory, as this could some side effect's in pipelines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (split tests). Note - using the tmpdir fixture so writing to files isn't an issue

Comment on lines +210 to +213
os.makedirs(path, exist_ok=True)
for file in file_names:
file_path = os.path.join(path, file)
self.download(filename=file, path=file_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want some better error handling here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary - os.makedirs provides permission denied error

with open(path, "wb") as f:
f.write(res.content)
total_size = int(res.headers.get("content-length", 0))
block_size = 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a constant at the file, as it seems to be also used implicitly in the upload method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

:raises BailoException: If the release has no files assigned to it
..note:: Regex statements must be complete to work as intended (uses re.match not re.search)
"""
files_metadata = self.client.get_release(self.model_id, self.version)["release"]["files"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to cast version to a string in this instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 248 to 252
with tqdm(
total=size, unit="B", unit_scale=True, unit_divisor=1024, postfix=f"uploading {name}", colour=colour
) as t:
wrapped_buffer = CallbackIOWrapper(t.update, data, "read")
res = self.client.simple_upload(self.model_id, name, wrapped_buffer).json()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The progress bar seems to complete even if the request fails. I tested this with uploading two of the same file name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


return res

def download_all(self, path: str = os.getcwd(), include: Union[list, str] = None, exclude: Union[list, str] = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old an new type annotations are being used here instead we should use list | str | None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@GB27247 GB27247 marked this pull request as ready for review April 15, 2024 12:35
@@ -171,12 +227,26 @@ def upload(self, path: str, data: BytesIO | None = None) -> str:
name = path

with open(path, "rb") as f:
res = self.client.simple_upload(self.model_id, name, f).json()
data = BytesIO(f.read())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with large files, it loads the entire file contents into memory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added new integration test to prevent this in future

@GB27247 GB27247 self-requested a review April 23, 2024 18:11
@bw27492 bw27492 merged commit 126de95 into main May 8, 2024
14 checks passed
@bw27492 bw27492 deleted the BAI-1143-download-patterns branch May 8, 2024 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants