Task creation with cloud storage, frame filter (optionally honeypots) and static cache fails #9021

Open
2 tasks done
zhiltsov-max opened this issue Jan 30, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@zhiltsov-max
Contributor

zhiltsov-max commented Jan 30, 2025

Actions before raising this issue

  • I searched the existing issues and did not find anything similar.
  • I read/searched the docs

Steps to Reproduce

Related #9010

from cvat_sdk import make_client, models
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client("http://localhost", port=8080, credentials=("user", "password")) as client:
    task = client.tasks.create_from_data(
        spec=models.TaskWriteRequest(
            name="task with cs images frame filter and honeypots",
            labels=[{"name": "cat"}],
            segment_size=3,
        ),
        resources=[
            "test/frame_1.jpg",
            "test/frame_2.jpg",
            "test/frame_3.jpg",
            "test/frame_4.jpg",
            "test/frame_5.jpg",
            "test/frame_6.jpg",
            "test/frame_7.jpg",
            "test/frame_8.jpg",
            "test/frame_9.jpg",
            "test/frame_10.jpg",
            "test/frame_11.jpg",
            "test/frame_12.jpg",
            "test/frame_13.jpg",
            "test/frame_14.jpg",
        ],
        resource_type=ResourceType.SHARE,
        data_params=dict(
            cloud_storage_id=157,
            image_quality=70,
            sorting_method="random",
            start_frame=2,
            stop_frame=14,
            frame_step=2,
            validation_params={
                "mode": "gt_pool",
                "frame_selection_method": "random_uniform",
                "frame_count": 3,
                "frames_per_job_count": 2,
            },
            use_cache=False, # ensure static cache
        ),
    )

$ CVAT_ALLOW_STATIC_CACHE=yes SMOKESCREEN_OPTS="--allow-address=172.22.0.1" docker compose -f docker-compose.yml up -d

$ python samples/create_task.py
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/django/data/data/1901/raw/test/frame_xxx.jpg'

Expected Behavior

No response

Possible Solution

No response

Context

  1. Here
    filtered_data = []
    for files in (i for i in media.values() if i):
        filtered_data.extend(files)
    media_to_download = filtered_data
    if media['image']:
        start_frame = db_data.start_frame
        stop_frame = len(filtered_data) - 1
        if data['stop_frame'] is not None:
            stop_frame = min(stop_frame, data['stop_frame'])
        step = db_data.get_frame_step()
        if start_frame or step != 1 or stop_frame != len(filtered_data) - 1:
            media_to_download = filtered_data[start_frame : stop_frame + 1: step]
    _download_data_from_cloud_storage(db_data.cloud_storage, media_to_download, upload_dir)
    At this point frame sorting has not been applied yet, so the input files are filtered in the order they appear in the input list.
  2. Next, a media extractor is created
    details = {
        'source_path': source_paths,
        'step': db_data.get_frame_step(),
        'start': db_data.start_frame,
        'stop': data['stop_frame'],
    }
    if media_type in {'archive', 'zip', 'pdf'} and db_data.storage == models.StorageChoice.SHARE:
        details['extract_dir'] = db_data.get_upload_dirname()
        upload_dir = db_data.get_upload_dirname()
        db_data.storage = models.StorageChoice.LOCAL
    if media_type != 'video':
        details['sorting_method'] = data['sorting_method'] if not is_media_sorted else models.SortingMethod.PREDEFINED
    extractor = MEDIA_TYPES[media_type]['extractor'](**details)
    with the same filtering and sorting params
  3. Next, a manifest creation starts here

    cvat/cvat/apps/engine/task.py

    Lines 1105 to 1114 in 3b5202e

    manifest.link(
        sources=extractor.absolute_source_paths,
        meta={
            k: {'related_images': related_images[k] }
            for k in related_images
        },
        data_dir=upload_dir,
        DIM_3D=(db_task.dimension == models.DimensionType.DIM_3D),
    )
    manifest.create()
    The manifest gets its items from the extractor, but the extractor returns the frames already sorted. If the sorting is random, their order can differ from both the input order and the order used in (1). The frame filter is then applied to this re-sorted list, so it can select frames that were never downloaded in (1).
  4. If honeypots are requested as well, the code fails in
    manifest.reorder([images[frame_idx_map[image.frame]].path for image in new_db_images])
    because .reorder() doesn't seem to expect frames without meta in the manifest.
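
The mismatch between steps (1) and (3) can be reproduced without CVAT. The following is a minimal sketch, not the server code: `select` is a hypothetical helper that mirrors the start/stop/step slicing from (1), and a plain list reversal stands in for one possible outcome of random sorting so that the result is deterministic.

```python
# Hypothetical stand-ins for the task parameters from the reproduction script.
inputs = [f"test/frame_{i}.jpg" for i in range(1, 15)]
start, stop, step = 2, 14, 2


def select(frames, start, stop, step):
    """Apply the start/stop/step frame filter to a list of paths."""
    stop = min(stop, len(frames) - 1)
    return frames[start : stop + 1 : step]


# Step (1): the filter runs on the list in input order, then files are downloaded.
downloaded = set(select(inputs, start, stop, step))

# Steps (2)-(3): the extractor sorts first, then filters.
# Assumption: a reversal stands in for one possible random permutation.
resorted = list(reversed(inputs))
expected = select(resorted, start, stop, step)

# Files the manifest would try to open although they were never downloaded.
missing = [path for path in expected if path not in downloaded]

print(sorted(downloaded))  # frames 3, 5, 7, 9, 11, 13
print(missing)             # frames 12, 10, 8, 6, 4, 2
```

With this particular permutation every frame selected after sorting is absent from the downloaded set, which matches the kind of FileNotFoundError shown above. Other permutations would usually leave only some of the frames missing.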

Environment

@zhiltsov-max zhiltsov-max added the bug Something isn't working label Jan 30, 2025
@shwetd19

Hey @zhiltsov-max, as far as I can see, this issue occurs because frames are downloaded from cloud storage without applying sorting/filtering upfront, but later steps (like manifest creation or honeypot reordering) expect a sorted/filtered frame list.

When sorting_method='random' or frame_step filters frames, the downloaded files don’t match, causing FileNotFoundError. A fix could involve syncing the download (_download_data_from_cloud_storage) with the filtered frame list from the extractor, or pre-fetching all required frames via task.fetch_data() in the SDK.

Would it be okay if I work on prototyping and testing this solution?

@zhiltsov-max
Contributor Author

@shwetd19, hi,

> A fix could involve syncing the download (_download_data_from_cloud_storage) with the filtered frame list from the extractor, or pre-fetching all required frames via task.fetch_data() in the SDK.

I think we need to make sure the sorting is applied only once, because otherwise random sorting will not work correctly.
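
One possible reading of "applied only once" is sketched below. It assumes a hypothetical helper, `plan_frames`, that sorts and filters the input a single time; the resulting list would then be reused both for the download and for the manifest, which would treat it as already sorted. None of these names exist in CVAT, and the sorting method names are used here only as plain strings.

```python
import random


def plan_frames(paths, start, stop, step, sorting_method="random", seed=None):
    """Sort once, then apply the frame filter; the result is reused as-is."""
    ordered = list(paths)
    if sorting_method == "random":
        random.Random(seed).shuffle(ordered)
    elif sorting_method == "lexicographical":
        ordered.sort()
    # Any other value (e.g. "predefined") keeps the input order.
    last = len(ordered) - 1
    stop = last if stop is None else min(stop, last)
    return ordered[start : stop + 1 : step]


paths = [f"test/frame_{i}.jpg" for i in range(1, 15)]
selected = plan_frames(paths, start=2, stop=14, step=2, seed=42)

to_download = selected       # what the download step would fetch
manifest_sources = selected  # what the manifest step would index, in the same order

# Nothing is re-sorted afterwards, so every manifest entry has a downloaded file.
assert set(manifest_sources) <= set(to_download)
```

Because the shuffle happens in exactly one place, a second random pass can no longer change which frames the filter selects.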

> Would it be okay if I work on prototyping and testing this solution?

Sure, don't hesitate to send us a PR.
