Real-time voice conversion output is nothing but choppy, muffled audio #125

Open
360PB opened this issue Feb 10, 2025 · 3 comments

Comments

@360PB

360PB commented Feb 10, 2025

Image

Using the default configuration: python real-time-gui.py

Using fp16: True
Warning: Skipped loading some keys due to shape mismatch: {'estimator.input_pos'}
cfm loaded
length_regulator loaded
funasr version: 1.2.3.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
You are using the latest version of funasr-1.2.3
2025-02-10 20:53:31,567 - modelscope - INFO - Use user-specified model revision: v2.0.4
Input device: 1:麦克风 (Realtek(R) Audio)
Output device: 20:Voicemeeter Input (VB-Audio Voi
cuda_is_available: True
(3969, 2)
rtf_avg: 0.293: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 23.44it/s]
[]
Time taken for VAD: 43.56163024902344ms
preprocess time: 0.09
Setting ce_dit_difference to 0.0 seconds.
Setting max prompt length to 3.0 seconds.
max value is tensor(1.0173, device='cuda:0')
Time taken for semantic_fn: 88.05068969726562ms
target_lengths: tensor([237], device='cuda:0')
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 42.99it/s]
vc_target.shape: torch.Size([1, 80, 237])
Time taken for VC: 864.5071411132812ms
sola_offset = 0
Infer time: 0.96
(3969, 2)

@Plachtaa
Owner

Hi, thanks for the report. I have confirmed this problem from multiple reports and will come up with a fix soon.

@Plachtaa
Owner

Hi there,
I noticed your infer time is displayed as:

Time taken for VC: 864.5071411132812ms
sola_offset = 0
Infer time: 0.96
(3969, 2)

This is far beyond your block time. Each block has to be processed before the next one arrives, so please increase the block time to match your device's performance until the infer time stays below it.

Thanks
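
For anyone checking their own console output against this advice, here is a minimal sketch of the comparison. It is not part of the project: `infer_times` and `keeps_up` are hypothetical helper names, and the block time values in the loop are assumed examples, since the block time actually set in the GUI does not appear in the text log above.

```python
import re

# Hypothetical helper, not from the project: matches console lines such as
# "Infer time: 0.96" (value in seconds) in the log quoted above.
INFER_RE = re.compile(r"^Infer time:\s*([0-9.]+)\s*$", re.MULTILINE)


def infer_times(log_text: str) -> list[float]:
    """Return every 'Infer time' value found in the log, in seconds."""
    return [float(value) for value in INFER_RE.findall(log_text)]


def keeps_up(infer_time_s: float, block_time_s: float) -> bool:
    """True when one block is processed in less time than the block lasts."""
    return infer_time_s < block_time_s


log = """Time taken for VC: 864.5071411132812ms
sola_offset = 0
Infer time: 0.96
(3969, 2)
"""

worst = max(infer_times(log))
# Assumed example block times in seconds; substitute the value set in the GUI.
for block_time in (0.25, 0.5, 1.0, 1.2):
    status = "ok" if keeps_up(worst, block_time) else "too slow for real time"
    print(f"block time {block_time:.2f}s, infer time {worst:.2f}s: {status}")
```

With the 0.96 s infer time from this log, only block times above 0.96 s pass the check. Leaving some headroom above the measured value is probably wise, because infer time varies from block to block.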

This was referenced Feb 10, 2025
@skyCloud-CN

Same problem here: the output is only a strange voice. This is my screenshot:

Image
