
Commit

Rewrote doc on GPU support in Kubernetes.
rohitagarwal003 committed Dec 22, 2017
1 parent eda0652 commit 9dc9983
Showing 1 changed file with 159 additions and 110 deletions.
269 changes: 159 additions & 110 deletions docs/tasks/manage-gpus/scheduling-gpus.md

approvers:
title: Schedule GPUs
---

{% capture overview %}

Kubernetes includes **experimental** support for managing NVIDIA GPUs spread
across nodes. Support for NVIDIA GPUs was added in v1.6 and has gone through
multiple backwards-incompatible iterations. This page describes how users can
consume GPUs across different Kubernetes versions and the current limitations.

{% endcapture %}

{% capture steps %}

## v1.6 and v1.7

To enable GPU support in 1.6 and 1.7, a special **alpha** feature gate
`Accelerators` has to be set to true across the system:
`--feature-gates="Accelerators=true"`. It also requires using the Docker
Engine as the container runtime.

Further, the Kubernetes nodes have to be pre-installed with NVIDIA drivers;
Kubelet will not detect NVIDIA GPUs otherwise. If Kubelet fails to expose
NVIDIA GPUs as part of Node Capacity, try re-installing the drivers, and run
`nvidia-docker-plugin` afterwards to confirm that all drivers have been loaded.

When you start Kubernetes components after all the above conditions are true,
Kubernetes will expose `alpha.kubernetes.io/nvidia-gpu` as a schedulable
resource.
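
You can check that the GPUs were actually discovered by inspecting a node's
capacity. This is a minimal sketch; the node name is a placeholder:

```shell
# Capacity should list alpha.kubernetes.io/nvidia-gpu once detection works.
kubectl describe node <node-name> | grep nvidia-gpu
```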

You can consume these GPUs from your containers by requesting
`alpha.kubernetes.io/nvidia-gpu` just like you request `cpu` or `memory`.
However, there are some limitations in how you specify the resource requirements
when using GPUs:
- GPUs are only supposed to be specified in the `limits` section, which means:
  * You can specify GPU `limits` without specifying `requests` because
    Kubernetes will use the limit as the request value by default.
  * You can specify GPU in both `limits` and `requests` but these two values
    must be equal.
  * You cannot specify GPU `requests` without specifying `limits`.
- Containers (and pods) do not share GPUs. There's no overcommitting of GPUs.
- Each container can request one or more GPUs. It is not possible to request a
  fraction of a GPU.

When using `alpha.kubernetes.io/nvidia-gpu` as the resource, you also have to
mount the host directories containing the NVIDIA libraries (libcuda.so,
libnvidia.so, etc.) into the container.

Here's an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1 # requesting 1 GPU
      volumeMounts:
        - name: "nvidia-libraries"
          mountPath: "/usr/local/nvidia/lib64"
  volumes:
    - name: "nvidia-libraries"
      hostPath:
        path: "/usr/lib/nvidia-375"
```
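
Assuming the manifest above is saved locally (the filename is a placeholder,
and the exact output of the image is an assumption), you can run it like this:

```shell
kubectl create -f cuda-vector-add-pod.yaml
# Once the pod completes, the CUDA sample's result shows up in its logs.
kubectl logs cuda-vector-add-pod
```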

The `Accelerators` feature gate and the `alpha.kubernetes.io/nvidia-gpu`
resource work on 1.8 and 1.9 as well. They will be deprecated in 1.10 and
removed in 1.11.

## v1.8 onwards

From 1.8 onwards, the recommended way to consume GPUs is to use [device
plugins](/docs/concepts/cluster-administration/device-plugins).

To enable GPU support through device plugins, a special **alpha** feature gate
`DevicePlugins` has to be set to true across the system:
`--feature-gates="DevicePlugins=true"`.
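
How you pass this flag depends on how the kubelet is managed on your nodes. As
a rough sketch, if kubelet flags are read from an environment file (the path
and variable name below are assumptions; adjust them to your setup):

```shell
# Sketch: append the feature gate to the kubelet flags and restart the kubelet.
echo 'KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true' | sudo tee -a /etc/default/kubelet
sudo systemctl restart kubelet
```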

Then you have to install NVIDIA drivers on the nodes and run an NVIDIA GPU
device plugin ([see below](#deploying-nvidia-gpu-device-plugin)).

When the above conditions are true, Kubernetes will expose `nvidia.com/gpu` as
a schedulable resource.
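
One way to confirm that the resource is being advertised is to list what each
node reports as allocatable (a sketch; note the escaped dots in the resource
name):

```shell
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```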

You can consume these GPUs from your containers by requesting
`nvidia.com/gpu` just like you request `cpu` or `memory`.
However, there are some limitations in how you specify the resource requirements
when using GPUs:
- GPUs are only supposed to be specified in the `limits` section, which means:
  * You can specify GPU `limits` without specifying `requests` because
    Kubernetes will use the limit as the request value by default.
  * You can specify GPU in both `limits` and `requests` but these two values
    must be equal.
  * You cannot specify GPU `requests` without specifying `limits`.
- Containers (and pods) do not share GPUs. There's no overcommitting of GPUs.
- Each container can request one or more GPUs. It is not possible to request a
  fraction of a GPU.

Unlike with `alpha.kubernetes.io/nvidia-gpu`, when using `nvidia.com/gpu` as
the resource, you don't have to mount any special directories in your pod
specs. The device plugin is expected to inject them into the container
automatically.

Here's an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
```
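
A pod can also request GPUs from more than one of its containers; a minimal
sketch (the pause image is only a placeholder workload):

```yaml
# All containers of a pod land on the same node, so this pod only schedules
# onto a node with at least 5 allocatable GPUs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container-1
      image: gcr.io/google_containers/pause:2.0
      resources:
        limits:
          nvidia.com/gpu: 2 # requesting 2 GPUs
    - name: gpu-container-2
      image: gcr.io/google_containers/pause:2.0
      resources:
        limits:
          nvidia.com/gpu: 3 # requesting 3 GPUs
```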

### Deploying NVIDIA GPU device plugin

There are currently two device plugin implementations for NVIDIA GPUs:

#### Official NVIDIA GPU device plugin

The [official NVIDIA GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin)
has the following requirements:
- Kubernetes nodes have to be pre-installed with NVIDIA drivers.
- Kubernetes nodes have to be pre-installed with [nvidia-docker 2.0](https://github.com/NVIDIA/nvidia-docker)
- nvidia-container-runtime configured as the [default runtime](https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime)
  for docker instead of runc.
- NVIDIA drivers ~= 361.93

To deploy the NVIDIA device plugin once your cluster is running and the above
requirements are satisfied:

```
# For Kubernetes v1.8
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.8/nvidia-device-plugin.yml
# For Kubernetes v1.9
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.9/nvidia-device-plugin.yml
```
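
To check that the plugin's DaemonSet is up (the namespace and object names come
from the upstream manifests and may change between releases):

```shell
kubectl get daemonsets -n kube-system | grep nvidia
kubectl get pods -n kube-system -o wide | grep nvidia
```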

#### NVIDIA GPU device plugin used by GKE/GCE

The [NVIDIA GPU device plugin used by GKE/GCE](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
doesn't require using nvidia-docker and should work with any CRI-compatible container
runtime. It's supported on [COS](https://cloud.google.com/container-optimized-os/)
and has experimental support for Ubuntu from 1.9 onwards.

On your 1.9 cluster, you can use the following commands to install the NVIDIA drivers and the device plugin:

```
# Install NVIDIA drivers on COS:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/daemonset.yaml

# Install NVIDIA drivers on Ubuntu:
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/ubuntu/daemonset.yaml

# Install the device plugin:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.9/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
```
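
Driver installation runs as a DaemonSet and can take a few minutes. As a
sketch, you can watch the NVIDIA pods come up and then confirm that the nodes
start advertising GPUs:

```shell
kubectl get pods -n kube-system | grep -i nvidia
kubectl describe nodes | grep "nvidia.com/gpu"
```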

## Clusters containing different types of NVIDIA GPUs

If your nodes are running different types of NVIDIA GPUs, then you can use [Node
Labels and Node Selectors](/docs/tasks/configure-pod-container/assign-pods-nodes/)
to schedule pods to appropriate GPUs.

For example:

```shell
# Label your nodes with the accelerator type.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
```
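
To double-check that the labels were applied:

```shell
# Shows the accelerator label as an extra column.
kubectl get nodes -L accelerator
```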

Specify the GPU type in the pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "gcr.io/google_containers/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 etc.
```

This will ensure that the pod will be scheduled to a node that has the GPU type
you specified.
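
A quick way to confirm the placement (a sketch):

```shell
# The NODE column should show a node labeled accelerator=nvidia-tesla-p100.
kubectl get pod cuda-vector-add -o wide
```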

## Future

- Support for hardware accelerators in Kubernetes is still in alpha.
- Better APIs will be introduced to provision and consume accelerators in a scalable manner.
- Kubernetes will automatically ensure that applications consuming GPUs get the best possible performance.
- Key usability problems like access to CUDA libraries will be addressed.

{% endcapture %}

{% include templates/task.md %}
