
Diarization performance with embedding_batch_size #688

Open
metheofanis opened this issue Feb 4, 2024 · 7 comments

metheofanis commented Feb 4, 2024

Running diarization is extremely slow.
I have an NVIDIA 3060 with 12 GB of VRAM.

It looks like it is using the pyannote default embedding_batch_size: 32.

If I run it locally, offline, where I can edit the SpeakerDiarization.yaml file and set embedding_batch_size: 8, diarization runs more than 37x faster.

Is there any way to pass embedding_batch_size as a parameter to the DiarizationPipeline?
If not, I suggest allowing it!
I'm not expert enough to make a PR.
Am I missing something?
Thanks.
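
For reference, a minimal sketch of how diarization is typically invoked through whisperx (HF_TOKEN is a placeholder, and the import path may differ by version); note that there is no batch-size parameter to pass:

import whisperx

device = "cuda"
audio = whisperx.load_audio("meeting.wav")

# DiarizationPipeline accepts a model name, an auth token, and a device,
# but exposes no embedding_batch_size parameter.
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)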


raulpetru commented Mar 31, 2024

Yes, you can, but you have to modify the pipeline.py file (located at whisperx\Lib\site-packages\pyannote\audio\core).

One way to overwrite the default embedding_batch_size value:

params = config["pipeline"].get("params", {})
params.setdefault("use_auth_token", use_auth_token)
# Overwrite embedding_batch_size
params["embedding_batch_size"] = 8
pipeline = Klass(**params)
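
An alternative that avoids editing installed library code: a sketch, assuming whisperx's DiarizationPipeline keeps the loaded pyannote pipeline in its model attribute and that pyannote reads embedding_batch_size when the pipeline is applied (both hold in recent versions, but verify against your install):

import whisperx

diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device="cuda")
# Override the batch size on the loaded pipeline object instead of
# patching pipeline.py in site-packages.
diarize_model.model.embedding_batch_size = 8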

@SeeknnDestroy

@raulpetru why is this the case? Is lowering embedding_batch_size better for performance? Will it degrade the quality, though?


raulpetru commented May 3, 2024

@SeeknnDestroy might be a bug? It isn't clear, read here.
For me, lowering embedding_batch_size to 8 increased diarization performance significantly.
Actually, thanks to @metheofanis opening this issue, I found out about this performance fix.

I haven't tested the accuracy, but I believe there is no quality degradation.
If you do test, please let me know!
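
One way to check: a sketch at the pyannote level that treats the batch-size-32 output as the reference and measures how far the batch-size-8 output diverges from it (DiarizationErrorRate is pyannote's standard metric; the model name and HF_TOKEN are placeholders):

from pyannote.audio import Pipeline
from pyannote.metrics.diarization import DiarizationErrorRate

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="HF_TOKEN")

pipeline.embedding_batch_size = 32
reference = pipeline("meeting.wav")

pipeline.embedding_batch_size = 8
hypothesis = pipeline("meeting.wav")

# A DER near zero means the two settings produce near-identical diarizations.
print(DiarizationErrorRate()(reference, hypothesis))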


techjp commented Jul 22, 2024

I have an RTX 3080 10GB card, and thought I was going insane trying to get the diarization to work.

I have a 1-hour-45-minute meeting recording that I am trying to get transcribed. The original transcription takes about 84 seconds, alignment about 44 seconds. Then diarization would run forever. I let it run for over an hour with no results. I tried splitting the file into chunks; it still never finished, even with a 20-minute chunk.

Most of the GPU memory was being used so I suspect there was some sort of crazy memory swapping going on, but I'm not sure.

After making the change suggested by @raulpetru above, creation of the diarize segments finishes in 95 seconds (this ran for an hour before without finishing!), and assigning speaker IDs took 12 seconds. Even better, GPU memory use was around 3 GB instead of the 9.5 to 9.7 GB it was before.

I'm not sure if this setting impacts diarization quality, but wow, for someone with a smaller amount of GPU memory, it allows the system to actually work!!

I hope this setting can be integrated into a future release of whisperX; I'm sure there are many people out there with 10 GB, 12 GB (or smaller!) GPUs who are having the same problem.

Thank you to @metheofanis for creating the issue & suggesting the fix, and to @raulpetru for explaining how to change the setting in whisperX!

Edit: And for anyone using miniconda like me, the pipeline.py file is here (assuming your environment is named whisperx and you are using Python 3.10, of course):
~/miniconda3/envs/whisperx/lib/python3.10/site-packages/pyannote/audio/core
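
Whatever the environment layout, the installed pipeline.py can also be located from Python itself (a small sketch):

import pyannote.audio.core.pipeline

# Prints the path of the pipeline.py that this environment actually loads.
print(pyannote.audio.core.pipeline.__file__)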


raulpetru commented Jan 18, 2025

I just updated to the latest whisperx version and I also had to change the segmentation_batch_size to 8 (default is 32) in order to obtain a good diarization time.

params = config["pipeline"].get("params", {})
params.setdefault("use_auth_token", use_auth_token)
# Overwrite embedding_batch_size and segmentation_batch_size
params["embedding_batch_size"] = 8
params["segmentation_batch_size"] = 8
pipeline = Klass(**params)
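
The attribute override sketched earlier should extend to the new parameter as well, again assuming recent whisperx keeps the pyannote pipeline in .model:

# Both batch sizes can be overridden on the loaded pipeline object,
# with no edits to pyannote's pipeline.py in site-packages.
diarize_model.model.embedding_batch_size = 8
diarize_model.model.segmentation_batch_size = 8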


dinopio commented Mar 7, 2025

> I just updated to the latest whisperx version and I also had to change the segmentation_batch_size to 8 (default is 32) in order to obtain a good diarization time. [...]

Is it running on the GPU though, or on the CPU?
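
One quick check, as a sketch (HF_TOKEN is a placeholder): pass the device explicitly when loading the pipeline, then watch nvidia-smi during a run to confirm the GPU is actually being used:

import torch
import whisperx

# whisperx moves the pyannote pipeline onto the device you pass in.
device = "cuda" if torch.cuda.is_available() else "cpu"
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
print(device)  # "cuda" here, plus GPU activity in nvidia-smi, means GPU execution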


zckrs commented Mar 7, 2025

Related / duplicate: #274 (comment)
