Diarization performance with embedding_batch_size #688
Comments
Yes, you can, but you have to modify the defaults. A way to overwrite the `embedding_batch_size` default value:
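The code block from this comment did not survive extraction. As a minimal sketch, assuming whisperx exposes the wrapped pyannote pipeline as `.model` and that pyannote stores `embedding_batch_size` as a plain attribute (both assumptions; verify against your installed versions), the override would look like the commented lines below. The runnable stand-in only demonstrates the pattern: assigning the attribute after the pipeline is loaded is all that is needed.

```python
from types import SimpleNamespace

# Real-world usage (assumption: whisperx's DiarizationPipeline exposes the
# wrapped pyannote pipeline as `.model`):
#
#   import whisperx
#   diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN,
#                                                device="cuda")
#   diarize_model.model.embedding_batch_size = 8  # pyannote default is 32
#
# Stand-in: pyannote keeps the batch sizes as plain attributes, so assigning
# them after loading takes effect on the next call.
pipeline = SimpleNamespace(embedding_batch_size=32, segmentation_batch_size=32)
pipeline.embedding_batch_size = 8
print(pipeline.embedding_batch_size)  # 8
```

The same pattern applies to `segmentation_batch_size`, which a later comment in this thread also suggests lowering.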
@raulpetru why is this the case? lowering embedding_batch_size is better for performance? will it degrade the quality tho? |
@SeeknnDestroy it might be a bug? It isn't clear, read here. I haven't tested the accuracy, but I believe there is no quality degradation.
I have an RTX 3080 10GB card, and I thought I was going insane trying to get the diarization to work. I have a 1-hour-45-minute meeting recording that I am trying to get transcribed. The original transcription takes about 84 seconds and alignment about 44 seconds; then diarization would run forever. I let it run for over an hour with no results. I tried splitting the file into chunks, and it still never finished, even with a 20-minute chunk. Most of the GPU memory was being used, so I suspect there was some sort of crazy memory swapping going on, but I'm not sure.

After making the change suggested by @raulpetru above, creation of the diarize segments finishes in 95 seconds (this ran for an hour before without finishing!), and assigning speaker IDs took 12 seconds. Even better, GPU memory use was around 3GB instead of the 9.5GB to 9.7GB before. I'm not sure if this setting impacts diarization quality, but wow, for someone with a smaller amount of GPU memory, it allows the system to actually work!

I hope this setting can be integrated into a future release of whisperX; I'm sure there are many people out there with 10GB, 12GB (or smaller!) GPUs who are having the same problem. Thank you to @metheofanis for creating the issue and suggesting the fix, and to @raulpetru for explaining how to change the setting in whisperX!

Edit: And for anyone using miniconda like me, the
I just updated to the latest whisperx version, and I also had to change `segmentation_batch_size` to 8 (the default is 32) in order to obtain a good diarization time.
Is it running on the GPU, though, or the CPU?
Related / Duplicate
Running diarization is extremely slow. I have an NVIDIA 3060 with 12GB VRAM.

It looks like it is using the pyannote default `embedding_batch_size: 32`. If I run it locally, offline, where I can edit the `SpeakerDiarization.yaml` file and set `embedding_batch_size: 8`, the performance is more than 37X better.

Is there any way to pass `embedding_batch_size` as a parameter to the `DiarizationPipeline`? If not, I suggest allowing this! I'm not expert enough to make a PR.

Am I missing something?
Thanks.
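For reference, the offline edit described above targets the `params` block of the pyannote pipeline config. A rough sketch of what that section looks like in a pyannote 3.x `SpeakerDiarization.yaml` (model names and other values are illustrative and may differ in your checkout; only `embedding_batch_size` is the point here):

```yaml
# SpeakerDiarization.yaml (sketch; values other than embedding_batch_size
# are illustrative)
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 8      # lowered from the default 32
    segmentation: pyannote/segmentation-3.0
    segmentation_batch_size: 32  # a later comment suggests lowering this too
```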