
Batching on .extract_faces to improve performance and utilize GPU in full #1435

Open · wants to merge 43 commits into master
Conversation

galthran-wq
Contributor

Tickets

#1433
#1101
#1434

What has been done

With this PR, .extract_faces is able to accept a list of images

How to test

make lint && make test

Benchmarking on detecting 50 faces:
[image: benchmark chart]

For yolov11n, batch size 20 is 59.27% faster than batch size 1.
For yolov11s, batch size 20 is 29.00% faster than batch size 1.
For yolov11m, batch size 20 is 31.73% faster than batch size 1.
For yolov8, batch size 20 is 12.68% faster than batch size 1.

@skyler14

Do you have a branch in your fork that currently combines all the optimizations you've submitted? I'd like to start using them while the approval process is going on.

What's been the total speedup you've been able to see?

@galthran-wq
Contributor Author

I do. You can check
https://github.com/galthran-wq/deepface/tree/master-enhanced

It combines these two PRs with some other small modifications:

  • .represent uses batched detector inference (in Batching on .represent to improve performance and utilize GPU in full #1433 it only does batched embedding, because batched detection is not yet implemented)
  • .represent returns a list of lists of dicts if a batch of images is passed. This is necessary to recover which image each resulting face corresponds to. It might be a good idea to include this change in the PR as well. You can check the test in the fork.
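That nesting can be sketched with hypothetical data (the dict keys and values below are illustrative, not the exact .represent schema):

```python
# Hypothetical batched output: one inner list per input image, so the
# outer index tells you which image each detected face came from.
batch_output = [
    [{"embedding": [0.1, 0.2]}],                             # image 0: one face
    [{"embedding": [0.3, 0.4]}, {"embedding": [0.5, 0.6]}],  # image 1: two faces
]

# Recover the image <-> face correspondence from the outer index.
owners = [
    (img_idx, face_idx)
    for img_idx, faces in enumerate(batch_output)
    for face_idx, _ in enumerate(faces)
]
print(owners)  # [(0, 0), (1, 0), (1, 1)]
```

With a flat list of dicts this correspondence would be lost, which is why the extra level of nesting matters.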

Not all of the detectors (both in this PR and in the fork) currently implement batching; YOLO does, and I've found it to be optimal in terms of performance and inference speed. The only problem is installing both torch and tensorflow with GPU support, but I've managed to do that.

All in all, with the combination of yolov11m and Facenet, both using GPU, and batch size 100 (the largest I could fit on a 4090), I am seeing around a 15x speed boost, but that is highly dependent on the input images and the GPU (especially its memory size). I've also had a quick peek, and it seems that performance on the CPU is improved as well.

@serengil FYI I would be happy to contribute the aforementioned modifications if we have progress on the PRs.

@serengil
Owner

I will review this PR this week, I hope.

@serengil
Owner

Seems this breaks the unit tests. Must be sorted.

@galthran-wq
Contributor Author

should be good now

@serengil
Owner

Nope, still failing.

@serengil
Owner

You implemented OpenCv, Ssd, Yolo, MtCnn and RetinaFace to accept list inputs

What if I send list to YuNet, MediaPipe, FastMtCnn, Dlib or CenterFace?

I assume an exception will be thrown, but users should see a meaningful message.

@galthran-wq
Contributor Author

galthran-wq commented Feb 23, 2025

To summarize what's changed:

  • I've added comments and additional checks to the tests.
  • I've made batching on opencv and mtcnn optional (due to the above issue). To enforce batching a user can set ENABLE_OPENCV_BATCH_DETECTION (or ENABLE_MTCNN_BATCH_DETECTION) to true.

Unfortunately, this didn't fix the batch extraction case on opencv -- the problem is that it occasionally fails (the predictions seem to have some random behaviour, so the results can differ from run to run!). Note that this has nothing to do with batching, because it is disabled by default. We might add a separate issue and test to reproduce this.

  • I have fixed the special case of a single image in a list input (batch of size 1). It now indeed returns a list with a single element: the list of detected faces in that image.
  • Those detectors that do not implement batching all had repeated logic in detect_faces. I have moved this logic to the default implementation in Detector. Now, those detectors only need to implement _process_single_image, and batching is supported via inheritance.
  • If a detector implements batching, then it overrides detect_faces with this logic, just as before.
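A minimal self-contained sketch of the inheritance scheme described above (only detect_faces and _process_single_image come from the PR; the subclass names and fake "face" strings are illustrative stubs):

```python
from typing import Any, List, Union

class Detector:
    def detect_faces(self, img: Union[Any, List[Any]]) -> List[Any]:
        # Default path: normalize to a list, process per image, then
        # unwrap again if the caller passed a single image.
        is_batched_input = isinstance(img, list)
        imgs = img if is_batched_input else [img]
        batch_results = [self._process_single_image(i) for i in imgs]
        return batch_results if is_batched_input else batch_results[0]

    def _process_single_image(self, img: Any) -> List[Any]:
        raise NotImplementedError

class LoopedDetector(Detector):
    # A detector without native batching implements only the per-image hook.
    def _process_single_image(self, img):
        return [f"face-in-{img}"]

class BatchedDetector(Detector):
    # A detector with native batching (e.g. YOLO) overrides detect_faces.
    def detect_faces(self, img):
        imgs = img if isinstance(img, list) else [img]
        results = [[f"face-in-{i}"] for i in imgs]  # one batched model call in reality
        return results if isinstance(img, list) else results[0]

print(LoopedDetector().detect_faces(["a", "b"]))  # [['face-in-a'], ['face-in-b']]
print(BatchedDetector().detect_faces("a"))        # ['face-in-a']
```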

@serengil
Owner

If detector is opencv or mtcnn, and input is batch, then we should raise an error. We don't need ENABLE_OPENCV_BATCH_DETECTION or ENABLE_MTCNN_BATCH_DETECTION environment variables.

TBH, I don't want to make this optional for users. They raise issues when something is wrong.

@serengil
Owner

Please add unit test cases for opencv and mtcnn. When batch input is sent, then exception should be thrown.

@serengil
Owner

I don't like this approach:

those detectors that do no implement batching all had repeating logic in detect_faces. I have moved this logic to the default implementation in Detector. Now, those detectors only need to implement _process_single_image, and batching would be supported by inheritance.
If a detector implements batching, then it overrides detect_faces with this logic, just as before.

all detectors should have detect_faces instead of _process_single_image, and you should raise an error in that detector if it is not supported.

@galthran-wq
Contributor Author

Changes:

  1. reverted _process_single_image
  2. disabled option to use batch mode for mtcnn and opencv

So now they just use a for loop to process all the images if a batch is passed. The user gets a warning that no actual batching is happening.

Do you still think it is a good idea to raise an error?

@@ -1,5 +1,6 @@
# built-in dependencies
from typing import List
import logging
Owner

we are not using the logging module directly anywhere. Use this instead:

from deepface.commons.logger import Logger
logger = Logger()

@@ -8,6 +9,8 @@
# project dependencies
from deepface.models.Detector import Detector, FacialAreaRegion

logger = logging.getLogger(__name__)
Owner

logger = Logger()

@@ -1,6 +1,7 @@
# built-in dependencies
import os
from typing import Any, List
from typing import Any, List, Union
import logging
Owner

same here.

resp.append(facial_area)

return resp
is_batched_input = isinstance(img, list)
Owner

you are writing same code in every detector

        is_batched_input = isinstance(img, list)
        if not is_batched_input:
            imgs = [img]
        else:
            imgs = img

IMO, this should be done in the parent where those detectors are being called. In that way, we will not have repeated code.


batch_results.append(resp)

if not is_batched_input:
Owner

same here, we can do this in parent and avoid repeated code.

        if not is_batched_input:
            return batch_results[0]
        return batch_results
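Both suggested fragments combine naturally into one wrapper at the call site. A self-contained sketch, where run_detector and StubDetector are illustrative names, not deepface API:

```python
def run_detector(detector, img):
    # Normalize once here, in the parent that calls the detectors,
    # so individual detectors never repeat the list/single check.
    is_batched_input = isinstance(img, list)
    imgs = img if is_batched_input else [img]
    batch_results = [detector.detect_faces(single_img) for single_img in imgs]
    if not is_batched_input:
        return batch_results[0]
    return batch_results

class StubDetector:
    # Stands in for any concrete detector with a single-image detect_faces.
    def detect_faces(self, img):
        return [f"face-in-{img}"]

print(run_detector(StubDetector(), "a"))         # ['face-in-a']
print(run_detector(StubDetector(), ["a", "b"]))  # [['face-in-a'], ['face-in-b']]
```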


resp.append(facial_area)
img_rgb = [img[:, :, ::-1] for img in img]
if self.supports_batch_detection:
Owner

self.supports_batch_detection always returns false, so get rid of _supports_batch_detection.

have just this

detections = [self.model.detect_faces(single_img) for single_img in img_rgb]

@@ -29,55 +32,72 @@ def build_model(self):
detector["eye_detector"] = self.__build_cascade("haarcascade_eye")
return detector

def detect_faces(self, img: np.ndarray) -> List[FacialAreaRegion]:
def _supports_batch_detection(self) -> bool:
Owner

no need _supports_batch_detection, it is returning false always

detected_face = None
if isinstance(img, np.ndarray):
imgs = [img]
elif self.supports_batch_detection:
Owner

this condition is always false, get rid of it

@@ -79,6 +83,169 @@ def test_different_detectors():
logger.info(f"✅ extract_faces for {detector} backend test is done")


@pytest.mark.parametrize("detector_backend", [
# "opencv",
Owner

activate opencv please, it is default detector. important to have tests.

expected_num_faces = [1, 1, 1, 2]

# load images as numpy arrays
imgs_batch = np.stack(imgs, axis=0)
Owner

please add this control

assert imgs_batch.ndim == 4 and imgs_batch.shape[0] == 4

img_path=[img_path],
align=True,
)
assert len(imgs_objs_batch) == 1 and isinstance(imgs_objs_batch[0], list)
Owner

please check this

assert len(imgs_objs_batch[0]) == 2

@serengil
Owner

serengil commented Mar 9, 2025

so now they just use a for loop to process all the images, if a batch is passed. A user gets a warning that there is no actual batching happening.

for loop is okay, don't raise error

@serengil
Owner

serengil commented Mar 9, 2025

As I understand, you built a for loop in each detector. Is that really a batch?

What if we build a for loop in the parent of detectors, and don't change detectors?

@serengil
Owner

serengil commented Mar 9, 2025

I mean, the batch really only runs for yolo.

We can create a method extract_faces_from_batch in the parent class of detectors, and only customize it for yolo.

In the parent class, we basically build a for loop and call the detector.

In that way, the detectors will not change.
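A sketch of this proposal under stated assumptions: extract_faces_from_batch is the method name suggested above, while the subclass names and fake "face" strings are illustrative stubs, not deepface code.

```python
from typing import Any, List

class Detector:
    def detect_faces(self, img: Any) -> List[Any]:
        raise NotImplementedError  # each detector keeps its single-image code

    def extract_faces_from_batch(self, imgs: List[Any]) -> List[List[Any]]:
        # Parent-class default: just a for loop over detect_faces,
        # so existing detectors need no changes at all.
        return [self.detect_faces(img) for img in imgs]

class OpenCvLike(Detector):
    # Unchanged detector: inherits the looped batch method.
    def detect_faces(self, img):
        return [f"face-in-{img}"]

class YoloLike(Detector):
    def detect_faces(self, img):
        return [f"face-in-{img}"]

    def extract_faces_from_batch(self, imgs):
        # Only YOLO overrides this with truly batched inference; here a
        # list comprehension stands in for a single batched model call.
        return [[f"face-in-{img}"] for img in imgs]

print(OpenCvLike().extract_faces_from_batch(["a", "b"]))  # [['face-in-a'], ['face-in-b']]
```

Either way the caller sees the same contract; only the YOLO path actually benefits from the GPU batch.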
