
Batching on .extract_faces to improve performance and utilize GPU in full #1435

Open · wants to merge 43 commits into master
Conversation

galthran-wq
Contributor

Tickets

#1433
#1101
#1434

What has been done

With this PR, .extract_faces is able to accept a list of images

How to test

make lint && make test

Benchmarking on detecting 50 faces:
[image: benchmark chart]

For yolov11n, batch size 20 is 59.27% faster than batch size 1.
For yolov11s, batch size 20 is 29.00% faster than batch size 1.
For yolov11m, batch size 20 is 31.73% faster than batch size 1.
For yolov8, batch size 20 is 12.68% faster than batch size 1.

@skyler14

Do you have a branch in your fork that currently combines all the optimizations you've submitted? I'd like to start using them while the approval process is going on.

What's been the total speedup you've been able to see?

@galthran-wq
Contributor Author

I do. You can check
https://github.com/galthran-wq/deepface/tree/master-enhanced

It combines these two PRs with some other small modifications:

  • .represent uses batched detector inference (in Batching on .represent to improve performance and utilize GPU in full #1433 it only does batched embedding, because batched detection is not yet implemented)
  • .represent returns a list of lists of dicts if a batch of images is passed. This is necessary to recover which image each resulting face corresponds to. It might be a good idea to include this change in the PR as well. You can check the test in the fork.
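That nesting can be sketched with hypothetical data (the dict keys and values below are illustrative, not the exact .represent schema):

```python
# Hypothetical batched output: one inner list per input image, so the
# outer index tells you which image each detected face came from.
batch_output = [
    [{"embedding": [0.1, 0.2]}],                             # image 0: one face
    [{"embedding": [0.3, 0.4]}, {"embedding": [0.5, 0.6]}],  # image 1: two faces
]

# Recover the image <-> face correspondence from the outer index.
owners = [
    (img_idx, face_idx)
    for img_idx, faces in enumerate(batch_output)
    for face_idx, _ in enumerate(faces)
]
print(owners)  # [(0, 0), (1, 0), (1, 1)]
```

With a flat list of dicts this correspondence would be lost, which is why the extra level of nesting matters.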

Not all of the detectors (both in this PR and in the fork) currently implement batching; YOLO does, and I've found it to be optimal in terms of performance and inference speed. The only problem is installing both torch and tensorflow with GPU support, but I've managed to do that.

All in all, with the combination of yolov11m and Facenet, both using GPU, and batch size 100 (the largest I could fit on a 4090), I am seeing around a 15x speed boost, but that is highly dependent on the input images and the GPU (especially its memory size). I've also had a quick peek, and it seems that performance on the CPU is improved as well.

@serengil FYI I would be happy to contribute the aforementioned modifications if we have progress on the PRs.

@serengil
Owner

I will review this PR this week, I hope.

@serengil
Owner

Seems this breaks the unit tests. Must be sorted.

@galthran-wq
Contributor Author

should be good now

@serengil
Owner

Nope, still failing.

@serengil
Owner

You implemented OpenCv, Ssd, Yolo, MtCnn and RetinaFace to accept list inputs

What if I send list to YuNet, MediaPipe, FastMtCnn, Dlib or CenterFace?

I assume an exception will be thrown, but users should see a meaningful message.

@galthran-wq
Contributor Author

galthran-wq commented Feb 23, 2025

To summarize what's changed:

  • I've added comments and additional checks to the tests.
  • I've made batching on opencv and mtcnn optional (due to the above issue). To enforce batching a user can set ENABLE_OPENCV_BATCH_DETECTION (or ENABLE_MTCNN_BATCH_DETECTION) to true.

Unfortunately, this didn't fix the batch extraction case on opencv -- the problem is that it occasionally fails (the predictions seem to have some random behaviour, so the results can differ from run to run!). Note that this has nothing to do with batching, because it is disabled by default. We might add a separate issue and test to reproduce this.

  • I have fixed the special case of a single image in a list input (batch of size 1). It now indeed returns a list with a single element: the list of detected faces in that image.
  • Those detectors that do not implement batching all had repeated logic in detect_faces. I have moved this logic to the default implementation in Detector. Now, those detectors only need to implement _process_single_image, and batching is supported via inheritance.
  • If a detector implements batching, then it overrides detect_faces with this logic, just as before.
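A minimal self-contained sketch of the inheritance scheme described above (only detect_faces and _process_single_image come from the PR; the subclass names and fake "face" strings are illustrative stubs):

```python
from typing import Any, List, Union

class Detector:
    def detect_faces(self, img: Union[Any, List[Any]]) -> List[Any]:
        # Default path: normalize to a list, process per image, then
        # unwrap again if the caller passed a single image.
        is_batched_input = isinstance(img, list)
        imgs = img if is_batched_input else [img]
        batch_results = [self._process_single_image(i) for i in imgs]
        return batch_results if is_batched_input else batch_results[0]

    def _process_single_image(self, img: Any) -> List[Any]:
        raise NotImplementedError

class LoopedDetector(Detector):
    # A detector without native batching implements only the per-image hook.
    def _process_single_image(self, img):
        return [f"face-in-{img}"]

class BatchedDetector(Detector):
    # A detector with native batching (e.g. YOLO) overrides detect_faces.
    def detect_faces(self, img):
        imgs = img if isinstance(img, list) else [img]
        results = [[f"face-in-{i}"] for i in imgs]  # one batched model call in reality
        return results if isinstance(img, list) else results[0]

print(LoopedDetector().detect_faces(["a", "b"]))  # [['face-in-a'], ['face-in-b']]
print(BatchedDetector().detect_faces("a"))        # ['face-in-a']
```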

@serengil
Owner

If detector is opencv or mtcnn, and input is batch, then we should raise an error. We don't need ENABLE_OPENCV_BATCH_DETECTION or ENABLE_MTCNN_BATCH_DETECTION environment variables.

TBH, I don't want to make this optional for users. They raise issues when something is wrong.

@serengil
Owner

Please add unit test cases for opencv and mtcnn. When batch input is sent, then exception should be thrown.

@serengil
Owner

I don't like this approach:

those detectors that do no implement batching all had repeating logic in detect_faces. I have moved this logic to the default implementation in Detector. Now, those detectors only need to implement _process_single_image, and batching would be supported by inheritance.
If a detector implements batching, then it overrides detect_faces with this logic, just as before.

all detectors should have detect_faces instead of _process_single_image, and you should raise an error in that detector if it is not supported.

@galthran-wq
Contributor Author

Changes:

  1. reverted _process_single_image
  2. disabled option to use batch mode for mtcnn and opencv

So now they just use a for loop to process all the images if a batch is passed. The user gets a warning that no actual batching is happening.

Do you still think it is a good idea to raise an error?

@@ -1,5 +1,6 @@
# built-in dependencies
from typing import List
import logging
Owner

we are not using the logging module directly anywhere. Use this instead:

from deepface.commons.logger import Logger
logger = Logger()

@@ -8,6 +9,8 @@
# project dependencies
from deepface.models.Detector import Detector, FacialAreaRegion

logger = logging.getLogger(__name__)
Owner

logger = Logger()

@@ -1,6 +1,7 @@
# built-in dependencies
import os
from typing import Any, List
from typing import Any, List, Union
import logging
Owner

same here.

resp.append(facial_area)

return resp
is_batched_input = isinstance(img, list)
Owner

you are writing same code in every detector

        is_batched_input = isinstance(img, list)
        if not is_batched_input:
            imgs = [img]
        else:
            imgs = img

IMO, this should be done in the parent where those detectors are being called. In that way, we will not have repeated code.


batch_results.append(resp)

if not is_batched_input:
Owner

same here, we can do this in parent and avoid repeated code.

        if not is_batched_input:
            return batch_results[0]
        return batch_results
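Both suggested fragments combine naturally into one wrapper at the call site. A self-contained sketch, where run_detector and StubDetector are illustrative names, not deepface API:

```python
def run_detector(detector, img):
    # Normalize once here, in the parent that calls the detectors,
    # so individual detectors never repeat the list/single check.
    is_batched_input = isinstance(img, list)
    imgs = img if is_batched_input else [img]
    batch_results = [detector.detect_faces(single_img) for single_img in imgs]
    if not is_batched_input:
        return batch_results[0]
    return batch_results

class StubDetector:
    # Stands in for any concrete detector with a single-image detect_faces.
    def detect_faces(self, img):
        return [f"face-in-{img}"]

print(run_detector(StubDetector(), "a"))         # ['face-in-a']
print(run_detector(StubDetector(), ["a", "b"]))  # [['face-in-a'], ['face-in-b']]
```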


resp.append(facial_area)
img_rgb = [img[:, :, ::-1] for img in img]
if self.supports_batch_detection:
Owner

self.supports_batch_detection always returns false, so get rid of _supports_batch_detection.

have just this

detections = [self.model.detect_faces(single_img) for single_img in img_rgb]

@@ -29,55 +32,72 @@ def build_model(self):
detector["eye_detector"] = self.__build_cascade("haarcascade_eye")
return detector

def detect_faces(self, img: np.ndarray) -> List[FacialAreaRegion]:
def _supports_batch_detection(self) -> bool:
Owner

no need _supports_batch_detection, it is returning false always

detected_face = None
if isinstance(img, np.ndarray):
imgs = [img]
elif self.supports_batch_detection:
Owner

this condition is always false, get rid of it

@@ -79,6 +83,169 @@ def test_different_detectors():
logger.info(f"✅ extract_faces for {detector} backend test is done")


@pytest.mark.parametrize("detector_backend", [
# "opencv",
Owner

activate opencv please, it is default detector. important to have tests.

expected_num_faces = [1, 1, 1, 2]

# load images as numpy arrays
imgs_batch = np.stack(imgs, axis=0)
Owner

please add this control

assert imgs_batch.ndim == 4 and imgs_batch.shape[0] == 4

img_path=[img_path],
align=True,
)
assert len(imgs_objs_batch) == 1 and isinstance(imgs_objs_batch[0], list)
Owner

please check this

assert len(imgs_objs_batch[0]) == 2

@serengil
Owner

serengil commented Mar 9, 2025

so now they just use a for loop to process all the images, if a batch is passed. A user gets a warning that there is no actual batching happening.

for loop is okay, don't raise error

@serengil
Owner

serengil commented Mar 9, 2025

As I understand, you built a for loop in each detector. Is that really a batch?

What if we build a for loop in the parent of detectors, and don't change detectors?

@serengil
Owner

serengil commented Mar 9, 2025

I mean, the batch really only runs for yolo.

We can create a method extract_faces_from_batch in the parent class of detectors, and only customize it for yolo.

In the parent class, we basically build a for loop and call the detector.

In that way, the detectors will not change.
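A sketch of this proposal under stated assumptions: extract_faces_from_batch is the method name suggested above, while the subclass names and fake "face" strings are illustrative stubs, not deepface code.

```python
from typing import Any, List

class Detector:
    def detect_faces(self, img: Any) -> List[Any]:
        raise NotImplementedError  # each detector keeps its single-image code

    def extract_faces_from_batch(self, imgs: List[Any]) -> List[List[Any]]:
        # Parent-class default: just a for loop over detect_faces,
        # so existing detectors need no changes at all.
        return [self.detect_faces(img) for img in imgs]

class OpenCvLike(Detector):
    # Unchanged detector: inherits the looped batch method.
    def detect_faces(self, img):
        return [f"face-in-{img}"]

class YoloLike(Detector):
    def detect_faces(self, img):
        return [f"face-in-{img}"]

    def extract_faces_from_batch(self, imgs):
        # Only YOLO overrides this with truly batched inference; here a
        # list comprehension stands in for a single batched model call.
        return [[f"face-in-{img}"] for img in imgs]

print(OpenCvLike().extract_faces_from_batch(["a", "b"]))  # [['face-in-a'], ['face-in-b']]
```

Either way the caller sees the same contract; only the YOLO path actually benefits from the GPU batch.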
