
Commit

Rewrote doc on GPU support in Kubernetes.
rohitagarwal003 committed Dec 22, 2017
1 parent eda0652 commit 9dc9983
Showing 1 changed file with 159 additions and 110 deletions.
269 changes: 159 additions & 110 deletions docs/tasks/manage-gpus/scheduling-gpus.md

approvers:
title: Schedule GPUs
---

{% capture overview %}

Kubernetes includes **experimental** support for managing NVIDIA GPUs spread
across nodes. Support for NVIDIA GPUs was added in v1.6 and has gone through
multiple backwards-incompatible iterations. This page describes how users can
consume GPUs across different Kubernetes versions and the current limitations.

{% endcapture %}

{% capture steps %}

## v1.6 and v1.7

To enable GPU support in 1.6 and 1.7, a special **alpha** feature gate
`Accelerators` has to be set to true across the system:
`--feature-gates="Accelerators=true"`. It also requires using the Docker
Engine as the container runtime.

Further, the Kubernetes nodes have to be pre-installed with NVIDIA drivers;
Kubelet will not detect NVIDIA GPUs otherwise. If Kubelet fails to expose
NVIDIA GPUs as part of Node Capacity, try re-installing the drivers, and run
`nvidia-docker-plugin` afterwards to confirm that all drivers have been loaded.

When you start Kubernetes components after all the above conditions are true,
Kubernetes will expose `alpha.kubernetes.io/nvidia-gpu` as a schedulable
resource.
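
You can check that the GPUs were actually discovered by inspecting a node's
capacity. This is a minimal sketch; the node name is a placeholder:

```shell
# Capacity should list alpha.kubernetes.io/nvidia-gpu once detection works.
kubectl describe node <node-name> | grep nvidia-gpu
```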

You can consume these GPUs from your containers by requesting
`alpha.kubernetes.io/nvidia-gpu` just like you request `cpu` or `memory`.
However, there are some limitations in how you specify the resource requirements
when using GPUs:
- GPUs are only supposed to be specified in the `limits` section, which means:
  * You can specify GPU `limits` without specifying `requests` because
    Kubernetes will use the limit as the request value by default.
  * You can specify GPU in both `limits` and `requests` but these two values
    must be equal.
  * You cannot specify GPU `requests` without specifying `limits`.
- Containers (and pods) do not share GPUs. There's no overcommitting of GPUs.
- Each container can request one or more GPUs. It is not possible to request a
  fraction of a GPU.

When using `alpha.kubernetes.io/nvidia-gpu` as the resource, you also have to
mount the host directories containing the NVIDIA libraries (libcuda.so,
libnvidia.so, etc.) into the container.

Here's an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1 # requesting 1 GPU
      volumeMounts:
        - name: "nvidia-libraries"
          mountPath: "/usr/local/nvidia/lib64"
  volumes:
    - name: "nvidia-libraries"
      hostPath:
        path: "/usr/lib/nvidia-375"
```
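
Assuming the manifest above is saved locally (the filename is a placeholder,
and the exact output of the image is an assumption), you can run it like this:

```shell
kubectl create -f cuda-vector-add-pod.yaml
# Once the pod completes, the CUDA sample's result shows up in its logs.
kubectl logs cuda-vector-add-pod
```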

The `Accelerators` feature gate and the `alpha.kubernetes.io/nvidia-gpu`
resource work on 1.8 and 1.9 as well. They will be deprecated in 1.10 and
removed in 1.11.

## v1.8 onwards

From 1.8 onwards, the recommended way to consume GPUs is to use [device
plugins](/docs/concepts/cluster-administration/device-plugins).

To enable GPU support through device plugins, a special **alpha** feature gate
`DevicePlugins` has to be set to true across the system:
`--feature-gates="DevicePlugins=true"`.
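
How you pass this flag depends on how the kubelet is managed on your nodes. As
a rough sketch, if kubelet flags are read from an environment file (the path
and variable name below are assumptions; adjust them to your setup):

```shell
# Sketch: append the feature gate to the kubelet flags and restart the kubelet.
echo 'KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true' | sudo tee -a /etc/default/kubelet
sudo systemctl restart kubelet
```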

Then you have to install NVIDIA drivers on the nodes and run an NVIDIA GPU
device plugin ([see below](#deploying-nvidia-gpu-device-plugin)).

When the above conditions are true, Kubernetes will expose `nvidia.com/gpu` as
a schedulable resource.
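
One way to confirm that the resource is being advertised is to list what each
node reports as allocatable (a sketch; note the escaped dots in the resource
name):

```shell
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```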

You can consume these GPUs from your containers by requesting
`nvidia.com/gpu` just like you request `cpu` or `memory`.
However, there are some limitations in how you specify the resource requirements
when using GPUs:
- GPUs are only supposed to be specified in the `limits` section, which means:
  * You can specify GPU `limits` without specifying `requests` because
    Kubernetes will use the limit as the request value by default.
  * You can specify GPU in both `limits` and `requests` but these two values
    must be equal.
  * You cannot specify GPU `requests` without specifying `limits`.
- Containers (and pods) do not share GPUs. There's no overcommitting of GPUs.
- Each container can request one or more GPUs. It is not possible to request a
  fraction of a GPU.

Unlike with `alpha.kubernetes.io/nvidia-gpu`, when using `nvidia.com/gpu` as
the resource, you don't have to mount any special directories in your pod
specs. The device plugin is expected to inject them into the container
automatically.

Here's an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
```
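
A pod can also request GPUs from more than one of its containers; a minimal
sketch (the pause image is only a placeholder workload):

```yaml
# All containers of a pod land on the same node, so this pod only schedules
# onto a node with at least 5 allocatable GPUs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container-1
      image: gcr.io/google_containers/pause:2.0
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
    - name: gpu-container-2
      image: gcr.io/google_containers/pause:2.0
      resources:
        limits:
          nvidia.com/gpu: 3 # requesting 3 GPUs
```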

### Deploying NVIDIA GPU device plugin

There are currently two device plugin implementations for NVIDIA GPUs:

#### Official NVIDIA GPU device plugin

The [official NVIDIA GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin)
has the following requirements:
- Kubernetes nodes have to be pre-installed with NVIDIA drivers.
- Kubernetes nodes have to be pre-installed with [nvidia-docker 2.0](https://github.com/NVIDIA/nvidia-docker)
- nvidia-container-runtime configured as the [default runtime](https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime)
  for docker instead of runc.
- NVIDIA drivers ~= 361.93

To deploy the NVIDIA device plugin once your cluster is running and the above
requirements are satisfied:

```
# For Kubernetes v1.8
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml
# For Kubernetes v1.9
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
```
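
To check that the plugin's DaemonSet is up (the namespace and object names come
from the upstream manifests and may change between releases):

```shell
kubectl get daemonsets -n kube-system | grep nvidia
kubectl get pods -n kube-system -o wide | grep nvidia
```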

#### NVIDIA GPU device plugin used by GKE/GCE

The [NVIDIA GPU device plugin used by GKE/GCE](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
doesn't require using nvidia-docker and should work with any CRI-compatible container
runtime. It's supported on [COS](https://cloud.google.com/container-optimized-os/)
and has experimental support for Ubuntu from 1.9 onwards.

On your 1.9 cluster, you can use the following commands to install the NVIDIA drivers and the device plugin:

```
# Install NVIDIA drivers on COS:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/daemonset.yaml

# Install NVIDIA drivers on Ubuntu:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/ubuntu/daemonset.yaml

# Install the device plugin:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.9/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
```
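
Driver installation runs as a DaemonSet and can take a few minutes. As a
sketch, you can watch the NVIDIA pods come up and then confirm that the nodes
start advertising GPUs:

```shell
kubectl get pods -n kube-system | grep -i nvidia
kubectl describe nodes | grep "nvidia.com/gpu"
```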

## Clusters containing different types of NVIDIA GPUs

If your nodes are running different types of NVIDIA GPUs, then you can use [Node
Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
to schedule pods to appropriate GPUs.

For example:

```shell
# Label your nodes with the accelerator type.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
```
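
To double-check that the labels were applied:

```shell
# Shows the accelerator label as an extra column.
kubectl get nodes -L accelerator
```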

Specify the GPU type in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
```

This will ensure that the pod will be scheduled to a node that has the GPU type
you specified.
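
A quick way to confirm the placement (a sketch):

```shell
# The NODE column should show a node labeled accelerator=nvidia-tesla-p100.
kubectl get pod cuda-vector-add -o wide
```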

## Future

- Support for hardware accelerators in Kubernetes is still in alpha.
- Better APIs will be introduced to provision and consume accelerators in a scalable manner.
- Kubernetes will automatically ensure that applications consuming GPUs get the best possible performance.
- Key usability problems like access to CUDA libraries will be addressed.

{% endcapture %}

{% include templates/task.md %}
