Add a realtime ASR demo (both server and client) for DS2 users to try with own voice. #186
Conversation
deep_speech_2/README.md
Outdated
### Playing with the ASR Demo

A real-time ASR demo (`demo_server.py` and `demo_client.py`) are prepared for users to try out the ASR model with their own voice. After a model and language model is prepared, we can first start the demo server:
is prepared -> are prepared
Done.
@@ -83,6 +83,23 @@ def __init__(self,
        self._rng = random.Random(random_seed)
        self._epoch = 0

    def process_utterance(self, filename, transcript):
        """Load, augment, featurize and normalize for speech data.
Break to a new line after `"""` before writing the comment text; same below.
The practice of

    def function():
        """This is function doc.
        """

follows the Google Coding Style for Python, and we keep this style consistent throughout the whole DS2 project.
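For readers unfamiliar with the convention under discussion, a minimal sketch of this docstring style (the function is hypothetical, not actual DS2 code):

```python
# Hypothetical function illustrating the docstring style discussed
# above: the one-line summary starts right after the opening quotes.
def scale(value, factor):
    """Multiply value by factor and return the result.

    :param value: Number to scale.
    :param factor: Scaling factor.
    :return: The scaled number.
    """
    return value * factor

print(scale(3, 2))  # -> 6
```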
@@ -0,0 +1,94 @@
"""Client-end for the ASR demo."""
Use `#` comments at the beginning here; you can refer to the comment style in layers.py. Same below.
In layers.py, `#` is only used for the copyright declaration. For a file's doc, `"""` is used instead.
Great!
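To make the agreed convention concrete, here is a small self-contained sketch; the file head below is hypothetical, modeled on the snippet under review, and `ast` is used only to show that Python recognizes the string as the module docstring:

```python
import ast

# A hypothetical file head following the convention discussed above:
# '#' comments are reserved for the copyright declaration, while the
# file's own documentation goes in a module docstring.
FILE_HEAD = (
    '# Copyright (c) 2017 Example Org. All rights reserved.\n'
    '"""Client-end for the ASR demo."""\n'
)

# Python treats the first string statement as the module docstring;
# leading '#' comments do not interfere with that.
module = ast.parse(FILE_HEAD)
print(ast.get_docstring(module))  # -> Client-end for the ASR demo.
```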
deep_speech_2/README.md
Outdated
@@ -16,6 +16,19 @@ export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/li

Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.

### Setup for Demo

Please do the following extra installation before run `demo_client.py` to try the realtime ASR demo. However there is no need to install them for the computer running the demo's server-end (`demo_server.py`). For details of running the ASR demo, please refer to the [section](#playing-with-the-asr-demo).
run --> running
I think it's better to make the ASR demo a single section that includes both the setup and the running instructions.
Moved this to the demo section.
@@ -35,6 +35,7 @@ def __init__(self, vocab_size, num_conv_layers, num_rnn_layers,
                          rnn_layer_size)
        self._create_parameters(pretrained_model_path)
        self._inferer = None
        self._loss_inferer = None
Would `self._cost_inferer` be better?
No big difference between loss and cost. I prefer loss since it is more commonly used.
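As a side note for readers, the `self._loss_inferer = None` line enables a lazy-initialization pattern: the inferer is built on the first `infer_loss_batch` call rather than in `__init__`. A minimal sketch (class name, method names, and the stub builder are illustrative, not the actual DS2 implementation):

```python
# Sketch of lazy initialization: the loss inferer is expensive to
# build, so it is created only when first needed and then reused.
class Model(object):
    def __init__(self):
        self._loss_inferer = None  # built on first use, not in __init__

    def _build_loss_inferer(self):
        # Stand-in for constructing the real inference object.
        return lambda batch: [0.0 for _ in batch]

    def infer_loss_batch(self, infer_data):
        if self._loss_inferer is None:
            self._loss_inferer = self._build_loss_inferer()
        return self._loss_inferer(infer_data)

model = Model()
print(model.infer_loss_batch([1, 2]))  # -> [0.0, 0.0]
```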
@@ -118,6 +119,24 @@ def event_handler(event):
            num_passes=num_passes,
            feeding=feeding_dict)

    def infer_loss_batch(self, infer_data):
Would `infer_batch_cost` be better?
No big difference between loss and cost. I prefer loss since it is more commonly used.
        """Model inference. Infer the ctc loss for a batch of speech
        utterances.

        :param infer_data: List of utterances to infer, with each utterance a
with each utterance --> each utterance consists of
--> with each utterance consisting of ....
deep_speech_2/README.md
Outdated
If you would like to start the server and the client in two machines. Please use `--host_ip` and `--host_port` to indicate the actual IP address and port, for both `demo_server.py` and `demo_client.py`.

Notice that `demo_client.py` should be started in your local computer with microphone hardware, while `demo_server.py` can be started in any remote server as well as the same local computer. IP address and port should be properly set for server-client communication.
Maybe we should point out that network access to the remote server needs to be ensured.
Done.
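For context, the `--host_ip` and `--host_port` options mentioned above could be declared roughly like this (defaults and help strings are illustrative, not copied from the PR):

```python
import argparse

# Sketch of how --host_ip / --host_port might be exposed by both
# demo_server.py and demo_client.py (hypothetical defaults).
parser = argparse.ArgumentParser()
parser.add_argument(
    "--host_ip",
    default="localhost",
    type=str,
    help="Server IP address for server-client communication. "
    "(default: %(default)s)")
parser.add_argument(
    "--host_port",
    default=8086,
    type=int,
    help="Server port for server-client communication. "
    "(default: %(default)s)")

# Parsing an explicit argument list, as when starting the client
# against a remote server:
args = parser.parse_args(["--host_ip", "10.0.0.5", "--host_port", "9000"])
```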
@@ -83,6 +83,23 @@ def __init__(self,
        self._rng = random.Random(random_seed)
        self._epoch = 0

    def process_utterance(self, filename, transcript):
transcript -> transcription, they have different meanings.
Both "transcription" and "transcript" can refer to the text content of speech. I use "transcription" in docs and "transcript" for code variables (due to its shorter length).
        elif len(data_list) > 0:
            # Connect to server and send data
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.connect((args.host_ip, args.host_port))
I think it's better to connect once at first and reuse the connection, rather than opening a new connection every time a message is sent.
Opening a connection only costs several milliseconds. Besides, opening an independent connection for each utterance simplifies the code (otherwise, on the server side, we would have to handle multiple utterances with a while-loop in a single handle() call).
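A runnable toy version of the trade-off described in this reply: one fresh TCP connection per utterance, so each server-side handling step processes exactly one message and no while-loop over messages is needed. The echo server and the "recognized:" reply are stand-ins for the real ASR pipeline, not the demo's actual code:

```python
import socket
import threading

def serve_one_utterance_per_connection(server_sock, n_connections):
    # Each accepted connection carries exactly one utterance: read
    # until the client closes its write side, reply, then close.
    for _ in range(n_connections):
        conn, _ = server_sock.accept()
        data = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(b"recognized: " + data)  # stand-in for ASR output
        conn.close()

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))  # let the OS pick a free port
server_sock.listen(5)
host, port = server_sock.getsockname()
thread = threading.Thread(
    target=serve_one_utterance_per_connection, args=(server_sock, 2))
thread.start()

results = []
for utterance in [b"hello", b"world"]:
    # A fresh connection per utterance, mirroring the scheme above.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    sock.sendall(utterance)
    sock.shutdown(socket.SHUT_WR)  # signal end-of-utterance
    reply = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        reply += chunk
    sock.close()
    results.append(reply)
thread.join()
```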
        type=float,
        help="The cutoff probability of pruning"
        "in beam search. (default: %(default)f)")
    args = parser.parse_args()
Too many arguments are duplicated with infer.py and evaluate.py; maybe we can refactor this part later. Marking it here.
Yes. Let's discuss it later.
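One possible shape for the refactoring suggested here: a shared helper that registers the common decoder arguments once, which each script can then call on its own parser. Helper name and defaults are illustrative, not the actual DS2 code:

```python
import argparse

def add_decoder_args(parser):
    # Register beam-search decoder arguments shared across
    # infer.py, evaluate.py and demo_server.py (hypothetical set).
    parser.add_argument(
        "--beam_size",
        default=500,
        type=int,
        help="Width for beam search decoding. (default: %(default)d)")
    parser.add_argument(
        "--cutoff_prob",
        default=1.0,
        type=float,
        help="The cutoff probability of pruning in beam search. "
        "(default: %(default)f)")
    return parser

# Each script builds its parser and pulls in the shared arguments:
parser = add_decoder_args(argparse.ArgumentParser())
args = parser.parse_args(["--beam_size", "200"])
```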
    def handle(self):
        # receive data through TCP socket
        chunk = self.request.recv(1024)
I think it's better to make `1024` an optional argument.
No need for that. Chunk size does not matter.
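To illustrate why the chunk size is not critical when data is read in a loop until the peer closes: any positive chunk size reassembles the same payload. A small sketch with a local socket pair (toy data, not the demo's audio protocol):

```python
import socket

def recv_all(sock, chunk_size):
    # Accumulate chunks until recv() returns b"" (peer closed).
    data = b""
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        data += chunk
    return data

sender, receiver = socket.socketpair()
payload = b"x" * 10000  # larger than any single chunk
sender.sendall(payload)
sender.close()  # so recv() eventually returns b""

# 1024 here could be any positive size; the reassembled payload
# is identical either way.
received = recv_all(receiver, 1024)
receiver.close()
```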
LGTM