Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding the gateway_kernel_init_sequence_diagram.png to the docs #1039

Conversation

telamonian
Copy link
Contributor

I've been working on and off the last couple of weeks on a sequence diagram showing all of the steps that get triggered when a frontend requests a kernel init via a gateway. The latest version is TeX based and open source friendly, so I figured I'd contribute it to the JEG docs. I've include the .tex source and a corresponding Makefile. This way, if any corrections or updates need to be made, it should be easy to do so (well, you'll still need a texlive environment).

Not sure if all of the details are correct, and I have no idea where exactly to put it in the docs, so I'm opening this PR to start a dialogue about that.

- also including the .tex source and a Makefile, in case any of the
sequence diagram details need to be corrected or updated in the future
@kevin-bates
Copy link
Member

Hi Max - this is great and will be useful.

There's a lot that happens when the browser issues the request to /api/sessions and I wonder if we should include the gateway package in here. I think we could show that api/sessions in jupyter_server will wind up issuing a POST to /api/kernels on the Gateway, which then, essentially creates a namespaced pod.

I wouldn't be opposed to generalizing things in terms of process proxies (later to be provisioners) rather than having this specific to Kubernetes. I suppose there's an entire sequence diagram there for the launch and discovery portions as well.

Regarding location in the docs, I happen to have the hood up on the docs as we speak and I'm doing a complete reorganization/overhaul to be role-based (users, operators, developers, contributors). That effort is just starting on this branch: https://github.com/kevin-bates/enterprise_gateway/tree/reorg-docs. This screenshot gives an idea:
image

I'm wondering if this would go in the Contributors section (near the System Architecture discussion) or as part of the Troubleshooting section (currently in the Operators section) although I'm thinking there might be a Troubleshooting section in each role or have a general Troubleshooting Guide where different topics are equated to one of the four roles - since different roles overlap. At any rate, it might be best to hold this merge until the new organization is in place.

Regarding the makefile, since it's all in the docs hierarchy, please add the necessary targets into docs/Makefile. We can then publish "forwarding targets" in the main Makefile at the root. I have not changed docs/Makefile very much so that shouldn't cause merge heartburn.

At any rate, this is useful and appreciated! Thank you.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@telamonian - this is great - thank you!

I've made a few comments, one of which pertains to the a possible merge headache, so it might be best to drop that commit unless I'm missing the inclusion of this diagram in those changes.

We can work on placement once we get the diagram (and its build) settled. I hope by then I might have the docs reworked and we could then update to that.

and the underlying kernel. In addition to the 5 ports, is an IP address, a key, and a signature scheme
indicator used to interpret the key. These eight pieces of information are conveyed to the kernel via a
json file, known as the connection file.
The first component is a set of five zero-MQ ports used to convey the Jupyter protocol between the Notebook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any changes in this file besides trailing spaces? If not, I suggest we discard these for now since the docs are getting completely reorganized and this will present unnecessary merge hassles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the one actual change is here:

![Kernel Process Lanuch Sequence Diagram](images/gateway_kernel_init_sequence_diagram.png)

i can revert the whitespace

@@ -0,0 +1,13 @@
fname=gateway_kernel_init_sequence_diagram
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we drive this from docs/Makefile? I think I recently added a new target for installing requirements and perhaps we add a build_tex_images or tex target there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I figured that was kind of brittle since the odds of a valid texlive (with all the needed packages) existing by default in the doc build env has to be low.

I did just find tectonic. It apparently could be installed as part of the doc build conda env, is a self contained texlive, and pulls needed packages automagically. Seems a bit like overkill for one .png, but it'd probably work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just be sure to leave build instructions somewhere (perhaps commented out in docs/Makefile?).

\begin{sdblock}{kernel init}{}
\begin{call}{browser}{https POST api/sessions}{server}{api/sessions response}
\begin{call}{server}{https POST api/sessions}{gateway}{api/sessions response}
\begin{call}{gateway}{CoreV1API.create\_namespaced\_pod}{kernel}{kernel connection info\footnote{see https://github.com/jupyter/enterprise\_gateway/pull/877}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to make this generic, we could have something like:

Suggested change
\begin{call}{gateway}{CoreV1API.create\_namespaced\_pod}{kernel}{kernel connection info\footnote{see https://github.com/jupyter/enterprise\_gateway/pull/877}}
\begin{call}{gateway}{RemoteProcessProxy.launch\_process}{kernel}{kernel connection info\footnote{see https://github.com/jupyter/enterprise\_gateway/pull/877}}

There's also a discovery step performed by the process-proxy immediately following the response of the connection info. This is what actually determines the IP to use for the zmq ports (in addition to ensuring the kernel is "visible" via the resource manager's API).

I suspect this would need to add a "bar" for the resource manager in order to show this. Reverse engineering this, I suppose this would add something like...

       \newinst[5]{rm}{:resource manager}
...
       \begin{call}{gateway}{RemoteProcessProxy.confirm\_remote\_startup}{rm}{status and host info}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a discovery step performed by the process-proxy immediately following the response of the connection info. This is what actually determines the IP to use for the zmq ports (in addition to ensuring the kernel is "visible" via the resource manager's API).

I suspect this would need to add a "bar" for the resource manager in order to show this. Reverse engineering this, I suppose this would add something like...

       \newinst[5]{rm}{:resource manager}
...
       \begin{call}{gateway}{RemoteProcessProxy.confirm\_remote\_startup}{rm}{status and host info}

@kevin you're going to have to explain this one a little better. Does information also end up flowing back the frontend from this, or just to the gateway? What's a resource manager in this context? For eg the k8s case, is the resource manager the k8s API itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does information also end up flowing back the frontend from this, or just to the gateway?

No. This phase is only between the remote-process-proxy (EG) and the RM.

What's a resource manager in this context? For eg the k8s case, is the resource manager the k8s API itself?

Yes, for K8s it's the k8s API - and issued by the KubernetesProcessProxy. Same thing for Hadoop YARN (via the YarnClusterProcessProxy). We use a yarn python API that issues REST queries against the YARN RM. Both of these return state and host information. We then set the IP in the connection information from this host info.

telamonian and others added 2 commits February 10, 2022 19:30
@kevin-bates
Copy link
Member

kevin-bates commented Feb 18, 2022

Hi Max, I played around with sphinxcontrib-mermaid yesterday and came up with this, which can be expressed directly in the markdown:
Screen Shot 2022-02-18 at 8 11 39 AM
The problem right now is that it leaves white space above (and below) the image that is the same "length" as the diagram instructions. I've opened this issue on the repo hoping I'm not doing something correctly.

Here are the instructions. I went ahead and tried to express the resource manager interaction and couch things in more generic terms. If you want to add these into your image, that would be fine. I would, however, like to have these be as dynamic as possible - ultimately, and baking the instructions into the docs seems like a good way to go. Perhaps we could live with the whitespace better if we placed each diagram into its own file and linked to that file until the whitespace issue can be resolved?

This would be wrapped in {mermaid} directive.

    sequenceDiagram
        participant JupyterLab
        participant JupyterServer
        participant EnterpriseGateway
        participant ProcessProxy
        participant Kernel
        participant ResourceManager
        Note left of JupyterLab: fetch kernelspecs
        JupyterLab->>JupyterServer: https GET api/kernelspecs
        JupyterServer->>EnterpriseGateway: https GET api/kernelspecs
        EnterpriseGateway-->>JupyterServer: api/kernelspecs response
        JupyterServer-->>JupyterLab: api/kernelspecs response

        Note left of JupyterLab: kernel init
        JupyterLab->>JupyterServer: https POST api/sessions
        JupyterServer->>EnterpriseGateway: https POST api/kernels
        EnterpriseGateway->>ProcessProxy: launch_process()
        ProcessProxy->>Kernel: launch kernel
        ProcessProxy->>ResourceManager: confirm startup
        Kernel-->>ProcessProxy: connection info
        ResourceManager-->>ProcessProxy: state & host info
        ProcessProxy-->>EnterpriseGateway: complete connection info
        EnterpriseGateway->>Kernel: TCP socket requests
        Kernel-->>EnterpriseGateway: TCP socket handshakes
        EnterpriseGateway-->>JupyterServer: api/kernels response
        JupyterServer-->>JupyterLab: api/sessions response

        JupyterLab->>JupyterServer: ws GET api/kernels
        JupyterServer->>EnterpriseGateway: ws GET api/kernels
        EnterpriseGateway-->>JupyterServer: websocket upgrade response
        JupyterServer-->>JupyterLab: websocket upgrade response

Another issue is that you can see the diagram instructions on refresh. I'm curious if you might have some ideas on what's going on with both the white space and refresh.
DiagramWhitespace

@kevin-bates
Copy link
Member

Note that the aforementioned issue with mermaid sequence diagrams has a reasonable workaround which I've confirmed works.

Rather than continue with this pull request, I'm going to open a new one (co-authored with @telamonian) and go with the mermaid/markdown approach since it seems far less cumbersome from a build perspective.

@kevin-bates
Copy link
Member

Closing in favor of #1132.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants