-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved upload/download methods for python client #1205
Conversation
bw27492
commented
Apr 6, 2024
- Progress bars for uploads and downloading to disk
- Method to download all files from a release
- Ability to filter files based on regex (e.g. "*.json")
try: | ||
no_color = int(os.environ["NO_COLOR"]) | ||
except KeyError: | ||
no_color = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be implemented somewhere else probably the top level initialiser. So we can call on it in other parts of the package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
try: | ||
no_color = int(os.environ["NO_COLOR"]) | ||
except KeyError: | ||
no_color = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at these docs and the example implementation. Something along the lines of:
NO_COLOR = "NO_COLOR" in os.environ
The docs suggest that as long as the environment variable exists or it's not a null terminating string which you don't need to worry about in python, we shouldn't worry about the value of the NO_COLOR
environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/python/tests/test_files.py
Outdated
for filename in ["test.json", "test2.txt", "to_exclude.txt"]: | ||
example_release.upload(filename, file) | ||
file.seek(0) | ||
|
||
example_release.download_all(path=downloads_path, include=["*.txt"], exclude=["to_exclude.txt"]) | ||
|
||
assert os.listdir(downloads_path) == ["test2.txt"] | ||
|
||
with open(str(downloads_path.join("test2.txt")), "rb") as f: | ||
assert f.read() == byte_obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is a bit specific. I would split this test instead, and I would avoid downloading any files into a directory, as this could some side effect's in pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (split tests). Note - using the tmpdir fixture so writing to files isn't an issue
os.makedirs(path, exist_ok=True) | ||
for file in file_names: | ||
file_path = os.path.join(path, file) | ||
self.download(filename=file, path=file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want some better error handling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary - os.makedirs
provides permission denied error
with open(path, "wb") as f: | ||
f.write(res.content) | ||
total_size = int(res.headers.get("content-length", 0)) | ||
block_size = 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a constant at the file, as it seems to be also used implicitly in the upload method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
:raises BailoException: If the release has no files assigned to it | ||
..note:: Regex statements must be complete to work as intended (uses re.match not re.search) | ||
""" | ||
files_metadata = self.client.get_release(self.model_id, self.version)["release"]["files"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to cast version to a string in this instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
with tqdm( | ||
total=size, unit="B", unit_scale=True, unit_divisor=1024, postfix=f"uploading {name}", colour=colour | ||
) as t: | ||
wrapped_buffer = CallbackIOWrapper(t.update, data, "read") | ||
res = self.client.simple_upload(self.model_id, name, wrapped_buffer).json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The progress bar seems to complete even if the request fails. I tested this with uploading two of the same file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
return res | ||
|
||
def download_all(self, path: str = os.getcwd(), include: Union[list, str] = None, exclude: Union[list, str] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old an new type annotations are being used here instead we should use list | str | None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -171,12 +227,26 @@ def upload(self, path: str, data: BytesIO | None = None) -> str: | |||
name = path | |||
|
|||
with open(path, "rb") as f: | |||
res = self.client.simple_upload(self.model_id, name, f).json() | |||
data = BytesIO(f.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work with large files, it loads the entire file contents into memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added new integration test to prevent this in future