Paddle cloud web features design #378
Conversation
Thanks for this design!
You might want to check the English writing using Grammarly.com.
doc/design/web.md (Outdated)

> ## Account Management
>
> I'll skip this section because it is a design that almost every website need.
If this document is for the Production Team's reference, we should at least give an example web site here.
In my mind, we do need a mockup for this page. At the least, once a user logs in, s/he must be able to see all his/her jobs listed, and s/he should be able to click each job to see that job's dashboard.
doc/design/web.md (Outdated)

> ## Jupiter Notebook
>
> Start a ReplicaSet using image `docker.paddlepaddle.org/book` in kubernetes cluster and add an ingress endpoint when user first enters the notebook page.
kubernetes => Kubernetes
doc/design/web.md (Outdated)

> ## Jupiter Notebook
>
> Start a ReplicaSet using image `docker.paddlepaddle.org/book` in kubernetes cluster and add an ingress endpoint when user first enters the notebook page.
ReplicaSet needs a URL as its reference (a link to the Kubernetes documentation).
doc/design/web.md (Outdated)

> ## Jupiter Notebook
>
> Start a ReplicaSet using image `docker.paddlepaddle.org/book` in kubernetes cluster and add an ingress endpoint when user first enters the notebook page.
ingress => Ingress
doc/design/web.md (Outdated)

```python
sess = paddle.framework.remote_session(
    topology=block,
```
What does this example program mean? Is it intended to run a block? I am not sure our API design could generate a block that is assignable to the `topology` parameter. Basically, our API is designed to generate a `ProgramDesc` protobuf message that includes a repeated field of `BlockDesc` messages, as described in https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md.
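To make the `ProgramDesc`/`BlockDesc` relationship concrete, here is a minimal plain-Python sketch of the two messages; the field names are assumptions for illustration, not the real protobuf schema:

```python
class BlockDesc:
    """Stand-in for one BlockDesc message; field names are illustrative."""
    def __init__(self, idx, parent_idx=-1):
        self.idx = idx                # index of this block in the program
        self.parent_idx = parent_idx  # -1 means no parent (the global block)
        self.ops = []                 # operator descriptions
        self.vars = []                # variable descriptions

class ProgramDesc:
    """Stand-in for the ProgramDesc message: a repeated field of blocks."""
    def __init__(self):
        self.blocks = []

    def append_block(self, parent_idx=-1):
        block = BlockDesc(len(self.blocks), parent_idx)
        self.blocks.append(block)
        return block

prog = ProgramDesc()
root = prog.append_block()                     # block 0: the global block
sub = prog.append_block(parent_idx=root.idx)   # e.g. a loop body block
```

Under this view, a remote session would receive the whole `ProgramDesc` rather than a single block.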
Thanks, I will just use the pseudo code from https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/refactor/session.md.
doc/design/web.md (Outdated)

> After this, there will be a job description and perfomance monitoring pages able to view at "Job Dashboard"
>
> ## Job Dashboard
Which program would serve this job dashboard? As it is per-job, it seems that the master process of a job should serve it. If so, it could be part of the PaddleBoard.
No, the job dashboard will list all jobs for the current user. The job dashboard is just one web page that simply calls the Kubernetes API to get the job list.
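As a rough sketch of that idea (the `owner` label name is a hypothetical convention the submission service would set), the dashboard backend could list the current user's jobs through the Kubernetes `batch/v1` REST API:

```python
import json
import urllib.request

def jobs_url(api_server, namespace, user):
    # batch/v1 Jobs endpoint, filtered by a hypothetical "owner" label
    # attached to each job at submission time.
    return ("%s/apis/batch/v1/namespaces/%s/jobs?labelSelector=owner%%3D%s"
            % (api_server, namespace, user))

def list_user_jobs(api_server, namespace, user, token):
    req = urllib.request.Request(
        jobs_url(api_server, namespace, user),
        headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["metadata"]["name"] for item in body.get("items", [])]
```

The web page would then render the returned job names (plus status fields) as the table rows.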
> - Upload/Download page
> - file sharing page
>
> ## Paddle Board
I didn't expect `draw_board` function calls in user programs. I am not sure how configurable TensorBoard is, but in my mind, PaddleBoard just needs to be able to present outputs from Evaluator operators, aggregated/accumulated over minibatches.
If we do not insert function calls into user programs, we need to automatically find out which variables represent the cost and the evaluator operator by default, and draw their values on the web page. I'm not sure how to do that for now.
Here is a short example of how TensorBoard configures metrics using `tf.summary`: the user explicitly specifies the values to output for drawing.
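To illustrate the pattern without depending on TensorFlow (a stand-in sketch, not PaddleBoard's or TensorBoard's actual API): the user names each scalar to plot, and values are appended to files on shared storage that the dashboard page can poll and redraw.

```python
import json
import os

class ScalarSummaryWriter:
    """Minimal tf.summary-style writer: one JSON-lines file per metric."""
    def __init__(self, logdir):
        os.makedirs(logdir, exist_ok=True)
        self.logdir = logdir

    def add_scalar(self, name, step, value):
        # Append one (step, value) point; the web page tails this file
        # to refresh the graph.
        path = os.path.join(self.logdir, name + ".jsonl")
        with open(path, "a") as f:
            f.write(json.dumps({"step": step, "value": value}) + "\n")
```

In a training loop the user would call, e.g., `writer.add_scalar("cost", step, cost_value)` once per minibatch.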
> Calling `draw_board` will output graph files on the distributed storage, and then the web page can load the data and refresh the graph.
>
> ## Serving
It seems that a serving job is different from a training job in that the former doesn't have a master process. If so, each process in a serving job needs to be able to present its own metrics, and there is no chance for them to present a PaddleBoard?
I'm not sure what metrics to display when running inference (serving): the neural network configuration may not define cost functions, and there's no label to evaluate the result against. Metrics like QPS (queries per second) are more like "monitoring", not PaddleBoard.
doc/design/web.md (Outdated)

> 1. inference network configuration in `.proto` format, or user can also define the network in Python in the webpage.
> 1. number of CPU/GPU resource in total to use for serving the model, the more resource there is, the more concurrent calls can be served.
>
> After cliking the "Langch" button, a "kubernetes deployment" will be created to serve the model. The current serving instances will be listed at the current page.
Where is the "Launch" button?
Updated following comments.
> - Account Management
>   - Registration, send email to inform if registration succeeded
>   - Account Login/Logout
>   - Password changing, find back
I think we don't need "find back"; maybe change it to "resetting", since we would probably only store a hashed password.
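A minimal sketch of why resetting is the only option: with salted PBKDF2 from the standard-library `hashlib`, only the salt and digest are stored, so the plain-text password cannot be recovered, it can only be replaced.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # PBKDF2 with a per-user random salt; only (salt, digest) is stored.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return salt, digest

def verify_password(password, salt, digest):
    # Constant-time comparison to avoid timing side channels.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return hmac.compare_digest(candidate, digest)
```

The iteration count here is only an example; it should be tuned to the deployment's hardware.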
> - Registration, send email to inform if registration succeeded
> - Account Login/Logout
> - Password changing, find back
> - Download SSL keys
What are the SSL keys for? I thought authentication is currently done via token?
> - Datasets
>   - Public Dataset viewing
>   - Upload/Download private datasets
>   - Share datasets
Maybe this needs to be more specific: does it mean the dataset can be shared with anyone by a link, or just set visible to a certain group (similar to Unix file read permissions)?
> ## Account Management
>
> Account management page is designed to satisfy multi-tenant use cases. One account should have a unique account ID for login, and this account owns one access key to one unique [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) cluster. Multiple users can log in to this account ID and operate jobs and data files. The only "master user" can do modifications like increase quota or manage account settings.
"Log in to this account ID", or log in to one's own account which belongs to the group? If so, "master user" could be "group owner".
```python
my_metric = my_metric_graph(output, label)
my_metric_value = output

draw_board(cost, evaluator)
```
I think `draw_board` should take only one variable that returns a scalar, and an optional name. E.g., `draw_board(evaluator, "evaluate result")`.
> 1. model `tar.gz` files to the cloud.
> 1. inference network configuration in `.proto` format or user can also define the network in Python in the web page.
> 1. number of CPU/GPU resource in total to use for serving the model, the more resource there is, the more concurrent calls can be served.
Should we change "number of CPU/GPU resource" to the number of instances plus CPU/Mem/GPU per instance? Otherwise it's hard for us to figure out how many instances to run (we don't know the model's properties or the user's serving requirements).
Agree with @helinwang. Additionally, we can calculate the total resource usage and display it on the web site.
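A small sketch of that calculation (parameter names are illustrative): the total usage shown on the site is just the per-instance request multiplied by the instance count the user fills in on the launch dialogue.

```python
def total_serving_resources(num_instances, cpu_per_instance,
                            mem_gb_per_instance, gpu_per_instance=0):
    # Total resource footprint to display, derived from the per-instance
    # request entered in the launch dialogue.
    return {
        "cpu": num_instances * cpu_per_instance,
        "memory_gb": num_instances * mem_gb_per_instance,
        "gpu": num_instances * gpu_per_instance,
    }
```

For example, 4 instances at 2 CPU / 8 GB / 1 GPU each total 8 CPU, 32 GB, and 4 GPUs.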
> - Performance Monitoring
> - Quota Monitoring
> - Datasets
>   - Public Dataset viewing
Dataset => dataset
> - Serving
>   - Submit serving instances
>   - Deactivate serving
>   - Serving performance monitoring
I think we also need a feature: scale serving instances. We can use an HPA (Horizontal Pod Autoscaler) to implement auto-scaling.
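For illustration, an `autoscaling/v1` HorizontalPodAutoscaler manifest for a serving Deployment could be generated like this (the Deployment name and thresholds are examples; field names follow the Kubernetes API):

```python
def serving_hpa(name, min_replicas, max_replicas, target_cpu_percent):
    # Build an autoscaling/v1 HPA manifest that scales the serving
    # Deployment of the same name based on average CPU utilization.
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "targetCPUUtilizationPercentage": target_cpu_percent,
        },
    }

hpa = serving_hpa("paddle-serving", 1, 10, 80)
```

The web backend would POST this manifest to the cluster when the user enables auto-scaling for a serving job.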
> <img src="pictures/notebook.png" width="500px" align="center">
>
> Users can write a program in python in the web page and save their programs, which will be saved at cloud storage. Users also can run a script like below to submit a cluster training job:
python => Python
> A web page containing a table to list jobs satisfying user's filters. The user can only list jobs that were submitted by themselves.
>
> | jobname | start time | age | success | fails | actions |
Maybe we need more information in the job list, such as `PS_READY`, `PS_TOTAL`, `TRAINER_READY`, `TRAINER_TOTAL`.
> Datasets and Models are quite the same, both like a simple file management and sharing service.
>
> - file listing and viewing page
file => File
> ## Datasets and Models
>
> Datasets and Models are quite the same, both like a simple file management and sharing service.
Maybe we can add more information about the file sharing service, such as whether files can be shared between users, between namespaces, or just via a public link?
> Click the "Launch" button in this web page will pop up a modal dialogue to configure the job:
>
> 1. model `tar.gz` files to the cloud.
Capitalize the first letter, the same as below.
"model `tar.gz` files to the cloud"
=>
"The path of model files with suffix `tar.gz` on the cloud."
> 1. model `tar.gz` files to the cloud.
> 1. inference network configuration in `.proto` format or user can also define the network in Python in the web page.
> 1. number of CPU/GPU resource in total to use for serving the model, the more resource there is, the more concurrent calls can be served.
Agree with @helinwang. Additionally, we can calculate the total resource usage and display it on the web site.
Closing, will reopen if we are going to do this work.
Fix #377