
Add a realtime ASR demo (both server and client) for DS2 users to try with own voice. #186

Merged
merged 7 commits into PaddlePaddle:develop
Aug 7, 2017

Conversation

xinghai-sun
Contributor

No description provided.

@xinghai-sun xinghai-sun requested review from pkuyym and luotao1 August 3, 2017 04:40
@xinghai-sun xinghai-sun mentioned this pull request Aug 3, 2017

### Playing with the ASR Demo

A real-time ASR demo (`demo_server.py` and `demo_client.py`) is prepared for users to try out the ASR model with their own voice. After a model and language model is prepared, we can first start the demo server:
Contributor

is prepared -> are prepared

Contributor Author

Done.

@@ -83,6 +83,23 @@ def __init__(self,
self._rng = random.Random(random_seed)
self._epoch = 0

def process_utterance(self, filename, transcript):
"""Load, augment, featurize and normalize for speech data.
Contributor

"""换一行再写注释,下同

Contributor Author

The practice of

def function():
"""This is function doc.
"""

follows the Google Python Style Guide, and we keep this style consistent throughout the whole DS2 project.
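As a concrete sketch of that convention (the summary line starts right after the opening quotes; the names and body here are illustrative, not the PR's actual code):

```python
def process_utterance(filename, transcript):
    """Load, augment, featurize and normalize for speech data.

    :param filename: Path to the audio file.
    :param transcript: Transcription text of the utterance.
    """
    # Placeholder body for illustration only.
    return filename, transcript
```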

@@ -0,0 +1,94 @@
"""Client-end for the ASR demo."""
Contributor

Please use `#` for the comment at the top of the file; you can refer to the comment style in layers.py. Same below.

Contributor Author

In layers.py, `#` is only used for the copyright declaration. For a file's docstring, `"""` is used instead.

Contributor

@pkuyym pkuyym left a comment

Great!

@@ -16,6 +16,19 @@ export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/li

Please replace `$PADDLE_INSTALL_DIR` with your own paddle installation directory.

### Setup for Demo

Please do the following extra installation before run `demo_client.py` to try the realtime ASR demo. However there is no need to install them for the computer running the demo's server-end (`demo_server.py`). For details of running the ASR demo, please refer to the [section](#playing-with-the-asr-demo).
Contributor

run --> running
I think it's better to make the ASR demo a single section, including both setup and running instructions.

Contributor Author

Moved this to the demo section.

@@ -35,6 +35,7 @@ def __init__(self, vocab_size, num_conv_layers, num_rnn_layers,
rnn_layer_size)
self._create_parameters(pretrained_model_path)
self._inferer = None
self._loss_inferer = None
Contributor

Would `self._cost_inferer` be better?

Contributor Author

There is no big difference between loss and cost. I prefer loss since it is more commonly used.

@@ -118,6 +119,24 @@ def event_handler(event):
num_passes=num_passes,
feeding=feeding_dict)

def infer_loss_batch(self, infer_data):
Contributor

Would `infer_batch_cost` be better?

Contributor Author

There is no big difference between loss and cost. I prefer loss since it is more commonly used.

"""Model inference. Infer the ctc loss for a batch of speech
utterances.

:param infer_data: List of utterances to infer, with each utterance a
Contributor

with each utterance --> each utterance consists of

Contributor Author

--> with each utterance consisting of ....


If you would like to start the server and the client on two different machines, please use `--host_ip` and `--host_port` to specify the actual IP address and port, for both `demo_server.py` and `demo_client.py`.

Notice that `demo_client.py` should be started on your local computer with microphone hardware, while `demo_server.py` can be started on any remote server or on the same local computer. The IP address and port should be properly set for server-client communication.
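Since the client and server may run on different machines, a quick reachability check before starting `demo_client.py` can save debugging time. A minimal sketch (the helper name is ours, not part of the PR):

```python
import socket

def server_reachable(host_ip, host_port, timeout=3.0):
    """Return True if a TCP connection to (host_ip, host_port) can be opened."""
    try:
        with socket.create_connection((host_ip, host_port), timeout=timeout):
            return True
    except OSError:
        return False
```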
Contributor

Maybe we should point out that network access to the remote server needs to be ensured.

Contributor Author

Done.

@@ -83,6 +83,23 @@ def __init__(self,
self._rng = random.Random(random_seed)
self._epoch = 0

def process_utterance(self, filename, transcript):
Contributor

transcript -> transcription, they have different meanings.

Contributor Author

Both "transcription" and "transcript" can refer to the text content of speech. I use "transcription" in the docs and "transcript" for code variables (since it is shorter).

elif len(data_list) > 0:
# Connect to server and send data
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((args.host_ip, args.host_port))
Contributor

I think it's better to connect once up front and reuse the connection, rather than connecting every time a message is sent.

Contributor Author

Opening a connection only costs a few milliseconds. Besides, opening an independent connection for each utterance simplifies the code (otherwise, on the server side, we would have to handle multiple utterances with a while-loop in a single handle() call).
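The connection-per-utterance design described here can be sketched as follows: the server reads one utterance until the client closes its write side, replies, and the handler returns. All names and the echo-style reply are illustrative, not the PR's actual code:

```python
import socket
import socketserver
import threading

class UtteranceHandler(socketserver.BaseRequestHandler):
    """Serves exactly one utterance per connection: read until EOF, then reply."""

    def handle(self):
        chunks = []
        while True:
            chunk = self.request.recv(1024)
            if not chunk:  # client closed its write side: utterance is complete
                break
            chunks.append(chunk)
        payload = b"".join(chunks)
        # A real server would run speech recognition here; we echo the byte count.
        self.request.sendall(str(len(payload)).encode())

def send_utterance(payload):
    """Open a fresh connection, send one utterance, and return the reply."""
    server = socketserver.TCPServer(("127.0.0.1", 0), UtteranceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect(server.server_address)
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # signal end-of-utterance to the server
        reply = sock.recv(1024)
        sock.close()
        return reply
    finally:
        server.shutdown()
        server.server_close()
```

Because each connection carries exactly one utterance, the end of the stream itself delimits the message, so no length header or framing protocol is needed.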

type=float,
help="The cutoff probability of pruning"
"in beam search. (default: %(default)f)")
args = parser.parse_args()
Contributor

There are too many arguments duplicated from infer.py and evaluate.py; maybe we can refactor this part later. Marking it here.

Contributor Author

Yes. Let's discuss it later.


def handle(self):
# receive data through TCP socket
chunk = self.request.recv(1024)
Contributor

I think it's better to make 1024 an optional argument.

Contributor Author

No need for that. The chunk size does not matter.

Contributor

@pkuyym pkuyym left a comment

LGTM

@xinghai-sun xinghai-sun merged commit 1da8f7a into PaddlePaddle:develop Aug 7, 2017