Task creation with cloud storage, frame filter (optionally honeypots) and static cache fails #9021

zhiltsov-max · 2025-01-30T11:16:56Z

Actions before raising this issue

I searched the existing issues and did not find anything similar.
I read/searched the docs

Steps to Reproduce

Related #9010

from cvat_sdk import make_client, models
from cvat_sdk.core.proxies.tasks import ResourceType

with make_client("http://localhost", port=8080, credentials=("user", "password")) as client:
    task = client.tasks.create_from_data(
        spec=models.TaskWriteRequest(
            name="task with cs images frame filter and honeypots",
            labels=[{"name": "cat"}],
            segment_size=3,
        ),
        resources=[
            "test/frame_1.jpg",
            "test/frame_2.jpg",
            "test/frame_3.jpg",
            "test/frame_4.jpg",
            "test/frame_5.jpg",
            "test/frame_6.jpg",
            "test/frame_7.jpg",
            "test/frame_8.jpg",
            "test/frame_9.jpg",
            "test/frame_10.jpg",
            "test/frame_11.jpg",
            "test/frame_12.jpg",
            "test/frame_13.jpg",
            "test/frame_14.jpg",
        ],
        resource_type=ResourceType.SHARE,
        data_params=dict(
            cloud_storage_id=157,
            image_quality=70,
            sorting_method="random",
            start_frame=2,
            stop_frame=14,
            frame_step=2,
            validation_params={
                "mode": "gt_pool",
                "frame_selection_method": "random_uniform",
                "frame_count": 3,
                "frames_per_job_count": 2,
            },
            use_cache=False, # ensure static cache
        ),
    )

$ CVAT_ALLOW_STATIC_CACHE=yes SMOKESCREEN_OPTS="--allow-address=172.22.0.1" docker compose -f docker-compose.yml up -d

$ python samples/create_task.py
...
`FileNotFoundError: [Errno 2] No such file or directory: '/home/django/data/data/1901/raw/test/frame_xxx.jpg'`

Expected Behavior

No response

Possible Solution

No response

Context

Here

cvat/cvat/apps/engine/task.py

Lines 748 to 763 in 3b5202e

    filtered_data = []
    for files in (i for i in media.values() if i):
        filtered_data.extend(files)

    media_to_download = filtered_data

    if media['image']:
        start_frame = db_data.start_frame
        stop_frame = len(filtered_data) - 1
        if data['stop_frame'] is not None:
            stop_frame = min(stop_frame, data['stop_frame'])

        step = db_data.get_frame_step()
        if start_frame or step != 1 or stop_frame != len(filtered_data) - 1:
            media_to_download = filtered_data[start_frame : stop_frame + 1: step]

    _download_data_from_cloud_storage(db_data.cloud_storage, media_to_download, upload_dir)

At this point (1), frame sorting is not applied yet, so the input files are filtered in the order they appear in the input list.
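The download-side selection can be modeled in a few lines of Python. This is a simplified sketch, not CVAT code: `select_for_download` is a hypothetical helper, and `db_data.start_frame`, `db_data.get_frame_step()` and `data['stop_frame']` are replaced with plain arguments. It applies the frame filter from the reproduction script to the input list in its original, unsorted order.

```python
from typing import List, Optional


def select_for_download(
    files: List[str], start_frame: int, stop_frame: Optional[int], step: int
) -> List[str]:
    """Hypothetical helper mirroring the download-side filter quoted above.

    The slice is taken over the input list as given, i.e. before any sorting.
    """
    last = len(files) - 1
    if stop_frame is not None:
        last = min(last, stop_frame)
    if start_frame or step != 1 or last != len(files) - 1:
        return files[start_frame : last + 1 : step]
    return files


# Same input list and frame filter as in the reproduction script
files = [f"test/frame_{i}.jpg" for i in range(1, 15)]
print(select_for_download(files, start_frame=2, stop_frame=14, step=2))
# ['test/frame_3.jpg', 'test/frame_5.jpg', 'test/frame_7.jpg',
#  'test/frame_9.jpg', 'test/frame_11.jpg', 'test/frame_13.jpg']
```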

Next, a media extractor is created

cvat/cvat/apps/engine/task.py

Lines 848 to 861 in 3b5202e

    details = {
        'source_path': source_paths,
        'step': db_data.get_frame_step(),
        'start': db_data.start_frame,
        'stop': data['stop_frame'],
    }

    if media_type in {'archive', 'zip', 'pdf'} and db_data.storage == models.StorageChoice.SHARE:
        details['extract_dir'] = db_data.get_upload_dirname()
        upload_dir = db_data.get_upload_dirname()
        db_data.storage = models.StorageChoice.LOCAL
    if media_type != 'video':
        details['sorting_method'] = data['sorting_method'] if not is_media_sorted else models.SortingMethod.PREDEFINED
    extractor = MEDIA_TYPES[media_type]['extractor'](**details)

with the same filtering and sorting parameters. Unlike the download step, the extractor does apply the sorting.
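A matching sketch of the extractor side, assuming (as this issue describes) that the extractor sorts the full source list and only then applies start/stop/step. `select_in_extractor` is hypothetical, and the reversed input order stands in for one possible result of `sorting_method="random"`; CVAT's actual sorting code is not reproduced here.

```python
from typing import Callable, List, Optional


def select_in_extractor(
    files: List[str],
    start_frame: int,
    stop_frame: Optional[int],
    step: int,
    sort: Callable[[List[str]], List[str]],
) -> List[str]:
    """Hypothetical model of the extractor: sort the whole list, then apply the frame filter."""
    ordered = sort(list(files))
    last = len(ordered) - 1
    if stop_frame is not None:
        last = min(last, stop_frame)
    return ordered[start_frame : last + 1 : step]


files = [f"test/frame_{i}.jpg" for i in range(1, 15)]
downloaded = files[2:14:2]  # selection made by the download step, in input order

# Reversed order stands in for one possible outcome of random sorting
extracted = select_in_extractor(files, 2, 14, 2, sort=lambda items: items[::-1])
print(extracted)
# ['test/frame_12.jpg', 'test/frame_10.jpg', 'test/frame_8.jpg',
#  'test/frame_6.jpg', 'test/frame_4.jpg', 'test/frame_2.jpg']
print(set(extracted) & set(downloaded))  # set(): none of them were downloaded
```

With an identity sort (input order kept), the two selections coincide, which fits the explanation in this issue that the mismatch comes from reordering before filtering.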

Next, manifest creation starts here:

cvat/cvat/apps/engine/task.py

Lines 1105 to 1114 in 3b5202e

    manifest.link(
        sources=extractor.absolute_source_paths,
        meta={
            k: {'related_images': related_images[k] }
            for k in related_images
        },
        data_dir=upload_dir,
        DIM_3D=(db_task.dimension == models.DimensionType.DIM_3D),
    )
    manifest.create()

This step takes items from the extractor, but the extractor returns the frames already sorted. With random sorting, this order can differ from the input order that was used in the download step (1). The frame filter is then applied to the sorted list, so some of the selected frames were never downloaded in (1).
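To make the resulting failure explicit, here is a hypothetical pre-flight check (not part of CVAT) that lists every path the manifest would read but that is missing from the upload directory. The temporary directory and file names are illustrative stand-ins for the task's raw upload directory and the downloaded frames.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_missing_sources(upload_dir: Path, relative_paths: Iterable[str]) -> List[str]:
    """Hypothetical pre-flight check: return the paths that are not files under upload_dir."""
    return [p for p in relative_paths if not (upload_dir / p).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    upload_dir = Path(tmp)
    # Simulate the download step: only these frames were fetched
    for name in ["frame_3.jpg", "frame_5.jpg", "frame_7.jpg"]:
        target = upload_dir / "test" / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"")
    # Frames selected after sorting (illustrative values)
    requested = ["test/frame_3.jpg", "test/frame_12.jpg", "test/frame_10.jpg"]
    missing = find_missing_sources(upload_dir, requested)

print(missing)  # ['test/frame_12.jpg', 'test/frame_10.jpg']
```

Reporting all missing paths at once in this way would be more informative than a `FileNotFoundError` for the first missing file, though this is only an illustration.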

If honeypots are requested as well, the code fails in

cvat/cvat/apps/engine/task.py

Line 1337 in 3b5202e

manifest.reorder([images[frame_idx_map[image.frame]].path for image in new_db_images])

because .reorder() doesn't seem to expect frames without meta in the manifest.
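The honeypot failure can be illustrated with a toy model of a manifest reorder. This is not the real `manifest.reorder()`: `reorder_entries` is a hypothetical function that treats the manifest as a mapping from file path to per-frame metadata, and it reports up front which requested paths have no entry instead of failing somewhere inside the reordering. The width/height values are illustrative.

```python
from typing import Dict, List


def reorder_entries(entries: Dict[str, dict], order: List[str]) -> List[dict]:
    """Hypothetical manifest reorder: return entries in the requested order.

    Raises LookupError listing every requested path that has no manifest entry.
    """
    missing = [path for path in order if path not in entries]
    if missing:
        raise LookupError(f"no manifest entries for: {missing}")
    return [entries[path] for path in order]


# Illustrative manifest: only frames that were actually downloaded have metadata
entries = {
    "test/frame_3.jpg": {"width": 640, "height": 480},
    "test/frame_5.jpg": {"width": 800, "height": 600},
}

reordered = reorder_entries(entries, ["test/frame_5.jpg", "test/frame_3.jpg"])
print([e["width"] for e in reordered])  # [800, 640]
try:
    reorder_entries(entries, ["test/frame_5.jpg", "test/frame_12.jpg"])
except LookupError as error:
    print("reorder failed:", error)  # reorder failed: no manifest entries for: ['test/frame_12.jpg']
```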

Environment


shwetd19 · 2025-02-22T19:32:18Z

Hey @zhiltsov-max, as far as I can see, this issue occurs because frames are downloaded from cloud storage without applying the sorting upfront (only the frame filter is applied), but later steps (like manifest creation or honeypot reordering) expect a sorted and filtered frame list.

When sorting_method='random' is combined with a frame filter (start_frame, stop_frame, frame_step), the downloaded files don’t match the frames selected later, causing the FileNotFoundError. A fix could involve syncing the download (_download_data_from_cloud_storage) with the filtered frame list from the extractor, or pre-fetching all required frames via task.fetch_data() in the SDK.

Would it be okay if I work on prototyping and testing this solution?

zhiltsov-max · 2025-02-24T08:05:00Z

@shwetd19, hi,

> A fix could involve syncing the download (_download_data_from_cloud_storage) with the filtered frame list from the extractor, or pre-fetching all required frames via task.fetch_data() in the SDK.

I think we need to make sure the sorting is only applied once, because otherwise random sorting will not work correctly.

> Would it be okay if I work on prototyping and testing this solution?

Sure, don't hesitate to send us a PR.
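Following the suggestion to apply sorting only once, here is a minimal sketch under stated assumptions: the sorted and filtered frame list is computed in one place and then reused for both the download and the extractor/manifest steps. `plan_frames` is hypothetical, only the "random" and "lexicographical" methods are modeled, and the seeded `random.Random` shuffle is an assumption, not CVAT's implementation. The quoted extractor code already switches to `SortingMethod.PREDEFINED` when `is_media_sorted` is set, which suggests one way a precomputed order could be passed on without being sorted again.

```python
import random
from typing import List, Optional


def plan_frames(
    files: List[str],
    start_frame: int,
    stop_frame: Optional[int],
    step: int,
    sorting_method: str,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: sort exactly once, then apply start/stop/step."""
    ordered = list(files)
    if sorting_method == "random":
        random.Random(seed).shuffle(ordered)
    elif sorting_method == "lexicographical":
        ordered.sort()
    # Any other method keeps the input order in this sketch.
    last = len(ordered) - 1
    if stop_frame is not None:
        last = min(last, stop_frame)
    return ordered[start_frame : last + 1 : step]


files = [f"test/frame_{i}.jpg" for i in range(1, 15)]

# Compute the frame list once and reuse it for every later step.
selected = plan_frames(files, 2, 14, 2, sorting_method="random")
media_to_download = selected   # what would be fetched from cloud storage
extractor_sources = selected   # what the extractor/manifest would read
print(len(selected), set(extractor_sources) <= set(media_to_download))  # 6 True
```

Because the random order is drawn once and shared, the downloaded files and the files read later cannot diverge, which is the property the two-pass flow described in this issue lacks.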

zhiltsov-max added the "bug" (Something isn't working) label on Jan 30, 2025
